Results
Design of CAR T-cell Activation Simulation
The objective of the simulation is to maximize the number of activated CAR T-cells through dynamic control of bead addition and removal. At each time interval, simulated culture-condition data, in the form of tabulated sensor measurements and/or microscopic images, are provided to the agent. The agent can add more beads, remove all beads, or refrain from acting at that step (Figure 1b). After sufficient training in a similar environment, the agent is expected to choose, based on the input data, the action that best serves the end goal. Before such a control strategy is attempted in a physical environment, the process of bead-based CAR T-cell activation is simulated as an RL environment (Figure 1a).
A 2D surface for cell growth (Figure 2c) is simulated as a continuous \(n \times n\) grid with a spacing of 10 microns, matching the approximate cell diameter. All simulations use a 50 × 50 grid, corresponding to a 500 × 500 micron area; for clarity in observing the cells, a 20 × 20 grid is shown in Figure 2c for demonstration purposes. The simulated expansion area is made continuous (no boundary). All defined parameters for this simulation are described in Table 2. Although attempts were made to associate these parameters with literature values, some assumptions were made. The modular simulation and RL training presented here can be readily updated as more measured values are determined.
A fixed time step of 6 min is used, derived from the approximate time a cell takes to translate one diameter, i.e. to the next grid position (cell velocity is ~2 microns per minute). Other factors that affect cellular migration, such as media viscosity, cell age, and cell size, are neglected in this simplified model. The total simulation covers a 7-day expansion campaign, equivalent to 1600 simulation steps.
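The environment setup described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the names `Action`, `wrap`, and `apply_action` are hypothetical, as is the parameter `n_add` (beads added per action). It further assumes that "no boundary" means the grid wraps around at its edges, that beads are placed at random, and that each grid site holds at most one bead.

```python
import random
from enum import Enum

GRID_SIZE = 50      # 50 x 50 grid, as stated in the text
SPACING_UM = 10     # grid spacing in microns (approximate cell diameter)
STEP_MINUTES = 6    # fixed time step from the text


class Action(Enum):
    """The three actions available to the agent at each step."""
    ADD_BEADS = 0
    REMOVE_ALL_BEADS = 1
    SKIP = 2


def wrap(i, n=GRID_SIZE):
    """Periodic index; assumes 'no boundary' means the grid wraps around."""
    return i % n


def apply_action(beads, action, n_add=10, rng=random):
    """Return the set of bead positions after one agent action.

    n_add is a hypothetical parameter; random placement with at most
    one bead per site is an assumption, not taken from the text.
    """
    if action is Action.REMOVE_ALL_BEADS:
        return set()
    if action is Action.ADD_BEADS:
        updated = set(beads)
        free = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)
                if (r, c) not in updated]
        updated.update(rng.sample(free, min(n_add, len(free))))
        return updated
    return set(beads)  # SKIP: bead layout is unchanged
```

Representing beads as a set of grid positions keeps "remove all beads" trivial and makes the bead-contact check for each cell a constant-time lookup.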
Bead-to-cell contact, bead-to-cell ratio, and confluence are taken into account in the simulation rules because of their role in the efficiency of activation.
At the start of the simulation, the grid is randomly seeded (Figure 2c) with a specified number of naïve T-cells, shown as red cells in the simulation. The following steps are iterated for each cell in the simulation:
Step 1: The cell can move to any of the 8 adjacent grid positions if it satisfies the movement conditions, namely that the chosen position is vacant and that the probability of making a move at that step, determined stochastically, is met (Figure 2a and Supplement 2).
Step 2: If a naïve cell occupies a position where an activation bead (coupled to anti-CD3 and anti-CD28 antibodies) is present, and if certain conditions are met (the probability of conversion at that step, determined stochastically, exceeds a threshold; detailed in Supplement 2), the naïve cell is activated and turns blue in the simulation (Figure 2a).
Step 3: If an already activated cell occupies a position where a bead is present, it is exhausted according to the specified exhaustion rate (Figure 2a).
Step 4: At each time step, each activated cell is also exhausted at a natural, transient exhaustion rate, which is \(\frac{\text{natural exhaustion}}{\text{total timesteps}}\) times smaller than the accelerated exhaustion caused by overexposure to and stimulation by beads (see Table 1 Units and Figure 2b).
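The per-cell update in Steps 1 to 4 can be sketched as below. This is an illustrative reading of the rules, not the implementation described in Supplement 2. The function `step_cell` and the probability keys `move`, `activate`, `bead_exhaust`, and `natural_exhaust` are hypothetical names. The sketch also assumes that exhaustion is a stochastic state change, that a cell changes state at most once per step, and that the grid wraps around at its edges.

```python
import random

NAIVE, ACTIVATED, EXHAUSTED = "naive", "activated", "exhausted"

# The 8 adjacent grid positions (Moore neighbourhood).
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]


def step_cell(pos, state, occupied, beads, n, p, rng):
    """Apply Steps 1-4 to one cell and return (new_pos, new_state).

    occupied: set of positions holding a cell (updated in place on a move)
    beads:    set of positions holding a bead
    n:        grid size (indices wrap around; an assumption)
    p:        dict of hypothetical per-step probabilities
    """
    # Step 1: move to a randomly chosen neighbour if it is vacant
    # and the stochastic movement condition is met.
    dr, dc = rng.choice(NEIGHBOURS)
    target = ((pos[0] + dr) % n, (pos[1] + dc) % n)
    if target not in occupied and rng.random() < p["move"]:
        occupied.discard(pos)
        occupied.add(target)
        pos = target

    on_bead = pos in beads
    if state == NAIVE and on_bead:
        # Step 2: a naive cell in contact with a bead may be activated.
        if rng.random() < p["activate"]:
            state = ACTIVATED
    elif state == ACTIVATED:
        if on_bead and rng.random() < p["bead_exhaust"]:
            # Step 3: accelerated exhaustion from bead contact.
            state = EXHAUSTED
        elif rng.random() < p["natural_exhaust"]:
            # Step 4: slower natural exhaustion at every step.
            state = EXHAUSTED
    return pos, state
```

Passing the random number generator explicitly makes a run reproducible from a seed, which helps when comparing agent policies on identical cell trajectories.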