Table 1. Simulated Cell Types
The learned control strategies agree with intuition for these extreme edge cases. In the base case of Cell type 1, the agent protects the cells from overexposure by adding beads in the first step and removing them in the second. The intuitive strategy would be to add beads in the initial steps so that most of the naïve cells convert, then remove the beads once most cells are activated so that they can proliferate and increase in number. This is what the agent executes, using fewer beads after step 5.

With Cell type 2, which has a lower rate of exhaustion than the base case, the agent ramps up the number of beads more quickly and maintains a near-constant level of exposure until the end, when a second ramp activates any remaining naïve cells (Figure 4b). Notably, in this case the agent's first steps (the initial ramp) are very decisive, with no deviation across runs. Afterwards, the bead number varies as the agent makes decisions as required to convert the remaining naïve cells.

In Cell type 3, we simulate a cell that has a higher rate of natural exhaustion. Because exhaustion applies only to active cells, the obvious strategy would be to deliberately delay adding the beads so that the cells are converted close to the end of the episode.
However, since regeneration will be high, the whole region will become crowded with activated cells, so it is imperative to remove the beads and wait for all of them to regenerate as soon as the optimal number of cells has been activated. Considering both effects, the best strategy would be to add beads in the middle steps and skip the beginning and end steps. This is reflected in the agent's learned strategy: it skips the first two steps, adds beads in two consecutive steps, then removes all the beads and waits for the cells to increase in number.

With Cell type 4, asymmetric regeneration is simulated, in which an activated cell can produce both activated and naïve cells.
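The exposure trade-off discussed for the high-exhaustion cell type can be illustrated with a small sketch. This is a hypothetical toy model, not the simulator used in this work: the three compartments (naïve, active, exhausted), the parameter names, and the assumption that exhaustion scales with bead exposure are all illustrative choices. A bead schedule is a list of exposure levels in [0, 1], one per step, and the hypothetical `naive_daughter` parameter mimics asymmetric regeneration by sending a fraction of newborn cells back to the naïve pool.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CellParams:
    """Hypothetical per-step rates for a toy cell population (not the paper's simulator)."""
    activation: float = 0.5      # fraction of naive cells activated per step at full exposure
    exhaustion: float = 0.2      # fraction of active cells exhausted per step at full exposure (assumed)
    proliferation: float = 0.3   # new cells produced per active cell per step
    naive_daughter: float = 0.0  # fraction of newborn cells that are naive (asymmetric regeneration)


def simulate(schedule: Sequence[float], p: CellParams,
             naive0: float = 100.0) -> Tuple[float, float, float]:
    """Roll out a bead schedule (exposure in [0, 1] per step) and return
    the final (naive, active, exhausted) counts."""
    naive, active, exhausted = naive0, 0.0, 0.0
    for beads in schedule:
        # All flows are computed from the state at the start of the step.
        activated = p.activation * beads * naive
        newly_exhausted = p.exhaustion * beads * active  # only active cells exhaust
        born = p.proliferation * active
        naive += p.naive_daughter * born - activated
        active += activated - newly_exhausted + (1.0 - p.naive_daughter) * born
        exhausted += newly_exhausted
    return naive, active, exhausted


if __name__ == "__main__":
    high_exhaustion = CellParams(exhaustion=0.8)
    pulse = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # add beads early, then remove them
    constant = [1.0] * 6                     # keep beads in for the whole episode
    for name, sched in [("pulse", pulse), ("constant", constant)]:
        n, a, e = simulate(sched, high_exhaustion)
        print(f"{name:8s} naive={n:6.1f} active={a:6.1f} exhausted={e:6.1f}")
```

Under these illustrative rates, the early pulse finishes with far more active cells than constant exposure, because continued exposure keeps exhausting the active pool while removing the beads lets it grow. The specific numbers carry no meaning beyond the toy model; a learned agent would search over such schedules rather than compare two hand-picked ones.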