Figure 4: Change of strategy by the agent using 20 control steps for
different cell types. (a) Simulation process to obtain control strategy
information. (b) Strategy of the agent, visualized as the average number
of beads (y axis) at each control step (x axis). Error bars indicate the
standard deviation of beads used at that control step, a measure of
simulation variability (or of constancy, where there are no bars). The
learning curve accompanies each bar plot, with the same axes as in
Figure 3. Arrows between plots indicate the change in cell type (also
see Table 1).
To convert the newly produced naïve cells, beads are required, but those
same beads cause the activated cells to become exhausted. To navigate
this system, the agent alternately adds and removes beads, and the
overall end score is lower than for the other cell types.
To test the effect of giving the agent more control over the
environment, we repeated the training process with 50 control steps
(interacting with the growth vessel every 3.2 hr instead of 8 hr; see
justification in Supplement 8) for six cell types (Table 1). The base
case behaved the same way as before, with more bead dosing at the
beginning and less toward the end (Figure 5). However, because it has
more frequent control points, the agent skips adding beads at the onset
to account for the small natural exhaustion, adds beads continuously
from the second to the fifth step, and then performs add, remove, or
skip actions depending on the simulated status, with a diminishing
number of beads in subsequent steps. For cell type 2, it adds beads for
more steps at the outset (Figure 5) than before (Figure 4b), and the
strategies for cell types 3 and 4 differ as well. Cell type 5 is
simulated with only regeneration increased from the base case, and the
agent removes beads in the second half to let the activated cells grow
without becoming exhausted. However, when the natural exhaustion is
increased in
cell type 3, the agent faces a dilemma: if it adds beads at the
beginning, the converted cells will be exhausted in the following steps;
if it adds beads at the end, it cannot take advantage of the higher
regeneration rate. Balancing these constraints, the agent adds beads in
the first two steps, removes them in the third, and skips the next 10
steps. It then adds or removes beads depending on the current state.
However, this outcome is less favorable than for the other cell types,
and the run ends with a lower number of potent cells. Finally, for cell
type 6 we increased the rate of natural exhaustion and added asymmetric
regeneration. In this case the agent alternately adds and removes beads
for the first third of the control steps, and then ramps the number of
beads with variability based on the current cell count; again, the
expected outcome (average reward) for this unfortunate cell type is
dependent on chance and lower than for the others.
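
The two interaction intervals quoted above (8 hr for 20 control steps, 3.2 hr for 50) are consistent with a fixed total culture duration of 160 hr. The short sketch below reproduces that arithmetic. The fixed-duration assumption is inferred from the two numbers rather than stated in the text, and the function and constant names are hypothetical.

```python
from typing import List

# Assumption (inferred, not stated in the text): the total culture duration
# stays fixed when the number of control steps changes.
BASE_STEPS = 20           # control steps in the original setting
BASE_INTERVAL_HR = 8.0    # hours between interactions in the original setting
TOTAL_DURATION_HR = BASE_STEPS * BASE_INTERVAL_HR  # 160 hr


def control_interval_hr(n_steps: int) -> float:
    """Hours between agent interactions for a given number of control steps."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return TOTAL_DURATION_HR / n_steps


def control_times_hr(n_steps: int) -> List[float]:
    """Start time (in hours) of each control step, beginning at t = 0."""
    interval = control_interval_hr(n_steps)
    return [i * interval for i in range(n_steps)]


if __name__ == "__main__":
    for steps in (20, 50):
        print(steps, "steps ->", control_interval_hr(steps), "hr per step")
```

Under this assumption, raising the step count changes only how often the agent can act, not how long the culture runs, so differences in strategy between Figure 4 and Figure 5 can be attributed to control granularity.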