Discussion
Here we describe a possible RL-based platform that would enable robotic arms to deploy or remove activator molecules at precise time points during T-cell activation, so that the maximum number of activated cells (i.e., peak therapeutic potential) is reached before the cells are returned to the patient. As a stepping stone, cell growth parameters were obtained directly from the literature to simulate the spatial and temporal stochasticity of CAR T-cell activation and expansion with reasonable fidelity. These simulation parameters act like tuning knobs and can be updated in a plug-and-play fashion to reflect accurate cell-growth metrics for a given experiment. Before deploying this neural engine (agent) for a specific experiment, we first pre-train it to capture the properties and physiochemical attributes of the cell type at hand; this should reduce the number of training runs required in the physical environment. The platform also pinpoints key patient-specific humoral parameters that should be measured and imputed into an in silico model, making it relevant for personalized, point-of-care therapeutics. Observables from the physical environment (physical parameters or a snapshot of a lattice of cells), collected as sensor output or imaging data, would be continuously fed to the RL agent to devise the dosing policy that maximizes the number of activated cells. Continued research on accurate, non-invasive, real-time measurement techniques to enumerate cell types during culture will improve training performance. With an improved, realistic simulation and adequate data on patient-specific cell parameters, one could adjust the simulation parameters (Table 1) and pre-train the control strategy in silico before coupling it to the physical environment. The simulation would consequently inform the types of sensors needed for the physical environment, and would also show how much noise the agent can accommodate before it fails to learn anything at all.
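The observe-dose-reward loop described above can be sketched as a minimal environment. This is an illustrative toy, not our actual simulator: the class name `DosingEnv`, the cell-conversion and exhaustion rules, and all rate constants are assumptions chosen only to show the interface an RL agent would interact with.

```python
import numpy as np

class DosingEnv:
    """Hypothetical sketch of the dosing control loop: at each step the
    agent observes cell counts and chooses an activator dose. Dynamics
    and constants are illustrative assumptions, not fitted parameters."""

    def __init__(self, n_steps=48, seed=0):
        self.n_steps = n_steps
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.naive, self.activated = 1000.0, 0.0
        return self._obs()

    def _obs(self):
        # Sensor/imaging readout: (step, naive count, activated count).
        return np.array([self.t, self.naive, self.activated])

    def step(self, dose):
        # Assumed dynamics: dose converts naive cells to activated ones,
        # while excess dose exhausts a fraction of activated cells.
        converted = min(self.naive, 50.0 * dose)
        exhausted = 5.0 * max(dose - 1.0, 0.0) * self.activated / 100.0
        self.naive -= converted
        self.activated += converted - exhausted
        self.t += 1
        reward = self.activated          # objective: maximize activated cells
        done = self.t >= self.n_steps
        return self._obs(), reward, done
```

An agent pre-trained in silico against such an interface could later be coupled to the physical environment by replacing `_obs()` with real sensor or imaging readouts.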
With a large amount of measurement noise, the agent will likely (a) disregard the noisy observation parameters (e.g., cell number, cell type, and potency), and (b) settle on a degenerate policy based only on the simulation step count.
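One way to probe this failure mode is to corrupt the simulated readouts with increasing noise and observe where learning degrades. The sketch below assumes multiplicative Gaussian noise on cell counts; the function name and noise model are illustrative assumptions, not part of the original platform.

```python
import numpy as np

def noisy_readout(counts, rel_noise, rng):
    """Apply multiplicative Gaussian noise to simulated cell counts,
    mimicking imprecise sensor/imaging enumeration. rel_noise is an
    assumed relative standard deviation (e.g., 0.05 for 5% noise)."""
    counts = np.asarray(counts, dtype=float)
    noise = rng.normal(1.0, rel_noise, size=counts.shape)
    # Counts cannot be negative, so clip the noisy readout at zero.
    return np.clip(counts * noise, 0.0, None)

# Sweep noise levels to locate where the agent stops learning:
rng = np.random.default_rng(42)
true_counts = np.array([800.0, 150.0, 50.0])  # e.g., naive/activated/exhausted
for sigma in (0.0, 0.05, 0.2, 0.5):
    print(sigma, noisy_readout(true_counts, sigma, rng))
```

Training the agent at each noise level and comparing final returns would indicate the measurement precision the physical sensors must deliver.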
One possible reason the agent fails to learn from a single discrete image input (Figure 3) is the lack of connection to the preceding and succeeding time points; from one frame alone it is nearly impossible to gauge whether a given action (dosing) actually helped to maximize the number of robust cells. We expect that exposing the model to short stacks of three to five consecutive frames, rather than a single disembodied frame, might improve the learning rate and gains; we leave this as future work.
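The frame-stacking idea can be sketched with a small rolling buffer. The class name `FrameStack` and the window size k=4 (within the three-to-five range mentioned above) are assumptions for illustration.

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep a rolling window of the last k lattice snapshots so the agent
    sees short-term dynamics instead of one disembodied frame."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame drops automatically

    def reset(self, frame):
        # Pad the window with the initial frame at the start of an episode.
        for _ in range(self.k):
            self.frames.append(frame)
        return self._stacked()

    def push(self, frame):
        self.frames.append(frame)
        return self._stacked()

    def _stacked(self):
        # Channel-first stack of shape (k, H, W), ready for a conv net input.
        return np.stack(self.frames, axis=0)
```

Each dosing decision would then condition on the stacked observation, letting the network infer growth or decline between consecutive snapshots.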
Moving forward, this RL-guided cell-activation routine can serve as a template for other model-free, stochastic biological applications. Beyond CAR T-cell activation, it holds promise for unraveling hitherto unknown policies adopted by nature, such as the underlying optimization of cell differentiation and proliferation. With the advent of modern generative algorithms and data-driven approaches, we hypothesize that it will be possible to create digital twins of cell culture environments. Additionally, this 2D simulation can be extended to a 3D environment representing more realistic growth conditions in static reactors (multilayer growth). Possible further experiments are listed in Supplement 8. In addition, this work provides a basis to benchmark Transformer- and DALL-E-based implementations, which are finding increasing applicability in different domains of biology. A library of such pre-trained models, enacted by robotic arms for precision dosing, would be useful to match the range of cell types handled by the clinics of tomorrow. This will enable effective control of activation and expansion and get more efficacious therapies to patients faster.
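The proposed 2D-to-3D extension can be sketched by adding a layer axis to the lattice. The state encoding (0 = empty, 1 = naive, 2 = activated), lattice dimensions, and the helper `neighbors_3d` are assumptions for illustration; the actual simulation's update rules would be ported onto this structure.

```python
import numpy as np

# Assumed encoding: 0 = empty, 1 = naive, 2 = activated. A 3D lattice of
# shape (layers, H, W) represents multilayer growth in a static reactor.
rng = np.random.default_rng(0)
lattice_3d = np.zeros((4, 32, 32), dtype=np.int8)
lattice_3d[0] = rng.integers(0, 3, size=(32, 32))  # seed the bottom layer

def neighbors_3d(z, y, x, shape):
    """Yield the 6-connected neighborhood used by division/diffusion
    rules in 3D (the 2D rules would use the 4-connected analogue)."""
    for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        nz, ny, nx = z + dz, y + dy, x + dx
        if 0 <= nz < shape[0] and 0 <= ny < shape[1] and 0 <= nx < shape[2]:
            yield nz, ny, nx
```

Cells at the top of the occupied region would gain an empty upward neighbor to divide into, which is what distinguishes multilayer growth from the 2D monolayer case.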