Discussion
Here we outline a possible RL-based platform that would enable robotic
arms to precisely deploy or remove activator molecules at specific time
points during T-cell activation, so that a maximum number of activated
cells (i.e., peak therapeutic potential) is reached before the cells
are reinfused into the patient. As a stepping stone, cell growth
parameters were obtained directly from, or inferred from, the
literature to simulate the spatial and temporal stochasticity of CAR
T-cell activation and expansion with reasonable fidelity.
reasonable fidelity. These simulation parameters, like tuning knobs, can
be updated to reflect accurate metrics of cell growth in a plug-and-play
fashion for a given experiment. Before deploying this neural engine
(agent) for a specific experiment, we first pre-train it to understand
properties of the cell type at hand and its physicochemical attributes.
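As a concrete illustration of this plug-and-play parameterization, the growth parameters could be gathered into a single configuration object that is swapped out per experiment; the parameter names and values below are hypothetical placeholders, not the actual entries of Table 1:

```python
from dataclasses import dataclass

@dataclass
class GrowthParams:
    # Hypothetical tuning knobs; replace with measured or literature-derived
    # values for the cell type at hand (cf. Table 1).
    division_time_h: float = 18.0    # mean cell-cycle length, in hours
    division_time_sd_h: float = 4.0  # stochastic spread of the cycle length
    activation_prob: float = 0.3     # per-step chance an exposed cell activates
    death_prob: float = 0.01         # per-step chance of cell death
    lattice_size: int = 64           # side length of the 2D lattice

# Re-parameterizing the simulation for a new experiment is a one-line change:
params = GrowthParams(division_time_h=24.0, activation_prob=0.2)
```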
This should reduce the number of training runs required in the physical
environment. The platform also pinpoints key patient-specific humoral
parameters that should be measured and fed to an in silico model –
thereby making this relevant for personalized, point-of-care
therapeutics. The observables (physical parameters or a snapshot of a
lattice of cells), collected from the physical environment as sensor
output or imaging data, will be continuously fed as input to the RL
agent to devise the best dosing policy that maximizes the number of
activated cells. Continued
research on accurate, non-invasive, real-time measurement techniques to
enumerate cell types during culture will provide faster training
performance. With an improved and realistic simulation and adequate data
on patient-specific cell parameters, one could adjust the simulation
parameters (Table 1) and perform pre-training of the control strategy
in silico before coupling to the physical environment. The
simulation would consequently inform the type of sensors needed for the
physical environment. It would also show how much noise the agent can
accommodate before it fails to learn anything at all. With a large
amount of measurement noise, the agent will likely (a) disregard the
noisy observation parameters (e.g., cell number, cell type, and
potency), and (b) settle on a degenerate policy based only on the
simulation step count.
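This noise tolerance could be probed in silico by corrupting the simulated observations before the agent sees them and re-training at increasing noise levels; below is a minimal sketch, assuming observations arrive as NumPy arrays of cell counts (the counts and noise scales are illustrative, not measured values):

```python
import numpy as np

def noisy_observation(obs, sigma, rng):
    """Corrupt a simulated observation (e.g., counts per cell type) with
    Gaussian measurement noise; clip so counts cannot go negative."""
    return np.clip(obs + rng.normal(0.0, sigma, size=obs.shape), 0.0, None)

rng = np.random.default_rng(seed=0)
true_counts = np.array([120.0, 45.0, 8.0])  # e.g., total, activated, dead
for sigma in (0.0, 5.0, 50.0):
    # Re-train the agent at each noise level; if its return matches that of
    # a policy ignoring observations, it has fallen back on step count alone.
    print(sigma, noisy_observation(true_counts, sigma, rng))
```

Sweeping sigma in this way marks the noise level at which observation-driven control degrades into the step-count-only policy described above.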
That said, one possible reason the agent fails to learn solely from a
discrete image input (Figure 3) is the lack of connection with the
preceding and succeeding time points. Without this temporal context, it
becomes nearly impossible to gauge whether a certain action (dosing)
indeed helped maximize the number of robust cells. To this end, we
expect that exposing the model to short stacks of three to five
consecutive frames, rather than a single isolated frame, might improve
the learning rate and gains – but we leave this as an exercise for the
future.
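Frame stacking of this kind is a standard technique in pixel-based RL and amounts to a thin wrapper that keeps a rolling window of recent lattice snapshots; the interface below is a hypothetical sketch, not the one used in this work:

```python
from collections import deque

import numpy as np

class FrameStack:
    """Keep the last k lattice snapshots so the agent can infer the growth
    dynamics (division, activation) that a single frame cannot convey."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Pad the window with copies of the initial frame.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return np.stack(self.frames)

    def step(self, frame):
        # Append the newest snapshot; the oldest is dropped automatically.
        self.frames.append(frame)
        return np.stack(self.frames)  # shape: (k, height, width)

stack = FrameStack(k=4)
obs = stack.reset(np.zeros((64, 64)))  # empty lattice at t = 0
obs = stack.step(np.ones((64, 64)))    # fully occupied lattice at t = 1
```

Feeding such a (k, height, width) stack to the agent gives it short-term temporal context at negligible memory cost.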
Moving forward, this cell-activation routine guided by RL can be used as
a template for other model-free, stochastic biological applications.
Apart from CAR T-cell activation, this approach holds promise for
unraveling hitherto unknown biological policies adopted by nature, such
as the underlying optimization of cell differentiation and
proliferation. With the
advent of modern generative algorithms and data-driven approaches, we
can hypothesize that it will be possible to create digital twins of cell
culture environments. Additionally, this 2D simulation can be updated to
a 3D environment representing more realistic growth conditions in static
reactors (multilayer growth). Possible further experiments are listed
in Supplement 8. In addition, this work provides a basis to benchmark
Transformer- and DALL-E-based implementations of the same, which are
finding increasing applicability in different domains of biology. A
library of such pre-trained models enacted by robotic arms for precision
dosing would be useful to match the range of cell types handled by
clinics of tomorrow. This will enable effective control of activation
and expansion and get more efficacious therapies to patients faster.