Abstract
Reinforcement learning (RL), a
subset of machine learning (ML), can potentially optimize and control
biomanufacturing processes, such as the production of therapeutic
cells. Here, the process of CAR-T cell activation by antigen-presenting
beads and the subsequent expansion of the cells is formulated in silico. The
simulation is used as an environment to train RL-agents to dynamically
control the number of beads in culture with the objective of maximizing
the population of robust effector cells at the end of the culture. The
agent periodically decides whether to add an increment of beads or to
remove all beads. The simulation is designed to operate in OpenAI Gym, which
enables testing of different environments, cell types, agent algorithms
and state-inputs to the RL-agent. Agent training is demonstrated with
three different algorithms (PPO, A2C and DQN), each tested with three
different state-input types (tabular, image, mixed); PPO-tabular
performs best for this simulation environment. Using this approach,
training of the RL-agent on different cell types is demonstrated,
resulting in unique control strategies for each type. Sensitivity to
input noise (sensor performance) and to the number of control-step
interventions, and the advantage of pre-trained agents, are also evaluated. Overall, we
present a general computational framework to maximize the population of
robust effector cells in CAR-T cell therapy production.
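
To make the control setup concrete, the following is a minimal sketch of a Gym-style environment with the two actions described above (incremental bead addition and complete removal). It is an illustration only and not the simulation presented here: the class name `ToyBeadCultureEnv`, the extra `HOLD` action, the three-compartment cell model, and every rate constant are assumptions chosen for readability. The class mimics the classic Gym `reset`/`step` interface in plain Python so that it runs without dependencies.

```python
class ToyBeadCultureEnv:
    """Gym-style toy environment. Hypothetical, NOT the authors' simulation."""

    ADD, REMOVE_ALL, HOLD = 0, 1, 2  # HOLD is an assumption, not from the abstract

    def __init__(self, n_steps=10, bead_increment=5):
        self.n_steps = n_steps                # number of control interventions
        self.bead_increment = bead_increment  # beads added per ADD action
        self.reset()

    def reset(self):
        self.t = 0
        self.beads = 0
        self.naive = 100.0      # cells not yet activated
        self.effector = 0.0     # robust effector cells (the objective)
        self.exhausted = 0.0    # over-stimulated cells
        return self._obs()

    def _obs(self):
        # "Tabular" state: a flat tuple of culture quantities.
        return (self.t, self.beads, self.naive, self.effector, self.exhausted)

    def step(self, action):
        if action == self.ADD:
            self.beads += self.bead_increment
        elif action == self.REMOVE_ALL:
            self.beads = 0
        elif action != self.HOLD:
            raise ValueError(f"unknown action: {action}")

        # Assumed toy dynamics: beads activate naive cells but also
        # exhaust effector cells; surviving effectors expand by 20%.
        stim = self.beads / (self.beads + 10.0)
        activated = 0.5 * stim * self.naive
        lost = 0.3 * stim * self.effector
        self.naive -= activated
        self.effector = (self.effector - lost) * 1.2 + activated
        self.exhausted += lost

        self.t += 1
        done = self.t >= self.n_steps
        # Sparse reward: effector population at the end of the culture.
        reward = self.effector if done else 0.0
        return self._obs(), reward, done, {}


def rollout(env, policy):
    """Run one episode; `policy` maps an observation to an action."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

A fixed policy such as `rollout(ToyBeadCultureEnv(), lambda obs: ToyBeadCultureEnv.ADD)` gives a baseline against which a trained agent could be compared. In this sketch the reward is paid only at the final step, mirroring the stated objective of maximizing effector cells at the end of the culture.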