Table 1. Simulated Cell Types
The learned control strategies agree with intuition in these extreme cases. In the base case, Cell type 1, the agent adds beads at the first step and removes them at the second step to protect the cells from overexposure. The intuitive strategy is to add beads in the initial steps so that most of the naïve cells are activated, then remove the beads once most cells are active and let them proliferate. This is what the agent does, using fewer beads after step 5.

Cell type 2 has a lower rate of exhaustion than the base case. Here the agent ramps up the number of beads more quickly and maintains a nearly constant level of exposure until the end of the episode, when a second ramp activates any remaining naïve cells (Figure 4b). Notably, in this case the agent's first steps (the initial ramp) are highly consistent, with no deviation across runs. Afterwards the number of beads varies between runs as the agent adjusts its decisions to convert the remaining naïve cells.

Cell type 3 simulates a cell with a higher rate of natural exhaustion. Because exhaustion applies only to activated cells, an obvious strategy would be to deliberately delay adding beads so that cells are converted close to the end of the episode. However, because regeneration is also high, the region would quickly become crowded with activated cells, so it is important to remove the beads as soon as the optimal number of cells has been activated and then wait for the cells to regenerate. Balancing these two considerations, the best strategy is to add beads during the middle steps and skip the first and last steps. The learned strategy reflects this: the agent skips the first two steps, adds beads in two consecutive steps, then removes all the beads and waits for the cells to increase in number.

Cell type 4 simulates asymmetric regeneration, in which an activated cell can produce both activated and naïve cells.