Evaluating input strategies and algorithms
At each step of an RL episode, the agent's algorithm chooses an action
using an observation snapshot of the environment as input. There are many possible
observation data formats that can be sent to the agent. For example,
bulk measurements could be made by impedimetric (Agilent, Xcelligence)
or permittivity-based sensors (Skroot Lab Inc). Real-time imaging
systems (Sartorius Incucyte) coupled to Artificial Intelligence (AI)-empowered
cell classification tools can identify and quantify cell types
based on morphology. These tools can be used to count naïve and
activated cells and to quantify other cell properties such as age and
robustness. Other data, such as the elapsed time, the quantity of beads in the
system, and the action history, can be obtained from the system itself. All
of these data can be provided to the agent as a list of measured
values; this method is termed the tabular method in this work (Figure 3).
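As a minimal sketch, the tabular observation can be thought of as a flat vector of measured values; the field names and interface below are illustrative assumptions rather than the exact quantities used in this work.

```python
import numpy as np

def tabular_observation(env_state):
    """Assemble the tabular observation as a flat vector of measured values.

    `env_state` and its field names are hypothetical placeholders for
    whatever the simulation or sensor interface actually exposes.
    """
    return np.array([
        env_state["naive_cell_count"],      # naïve cells counted by the classification tool
        env_state["activated_cell_count"],  # activated cells
        env_state["mean_cell_age"],         # example cell property: age
        env_state["mean_cell_robustness"],  # example cell property: robustness
        env_state["elapsed_time"],          # time elapsed in the episode
        env_state["bead_count"],            # quantity of beads in the system
        env_state["last_action"],           # most recent action (action history)
    ], dtype=np.float32)
```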
Another possible observation format is an image, such as one obtained
from high-precision microscopy. In this work we also test whether a
three-channel image of the simulation environment, like that in
Figure 2c, is by itself sufficient to provide the agent with the information
needed to train adequately (Figure 3). The third input format tested is the
fusion of the above two, in which both tabular and image information
are provided to the agent (Figure 3). Here we refer to each agent in
‘algorithm-input’ format; for example, PPO-image refers to an agent trained
with the PPO algorithm on image data. The aim of this analysis is to
demonstrate how agent training depends on the algorithm and the input scheme.
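A minimal sketch of how the three observation formats could be declared, assuming a Gymnasium-style environment interface; the image resolution and the seven-element tabular vector are illustrative assumptions, not the values used in this work.

```python
import numpy as np
from gymnasium import spaces

# Image observation: a three-channel rendering of the environment (cf. Figure 2c).
# The 84x84 resolution is an illustrative choice.
image_space = spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)

# Tabular observation: the vector of measured values described above
# (cell counts, cell properties, elapsed time, bead quantity, action history).
tabular_space = spaces.Box(low=-np.inf, high=np.inf, shape=(7,), dtype=np.float32)

# Fusion observation: both modalities supplied together as a Dict space.
fusion_space = spaces.Dict({"image": image_space, "tabular": tabular_space})
```

Libraries such as Stable-Baselines3 support Dict observation spaces through multi-input policies, so the same algorithm (e.g., PPO or DQN) can be trained on any of the three formats by swapping the observation space.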
The reward the agent receives at the end of each episode is the episodic
reward; at each episode we also plot the average of all previous
episodic rewards (average reward, shown in red in Figure 3). The initial rising
trend of the average reward indicates that the agent is
learning and steadily improving its strategy, whereas flattening
of the average reward indicates that the agent has settled on an
optimized strategy (see PPO-tabular and DQN-tabular in Figure 3).
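A minimal sketch of how this running average could be computed from the logged episodic rewards; the variable names are illustrative.

```python
import numpy as np

def running_average(episodic_rewards):
    """Cumulative mean: the i-th point averages the episodic rewards of episodes 1..i."""
    rewards = np.asarray(episodic_rewards, dtype=np.float64)
    return np.cumsum(rewards) / np.arange(1, len(rewards) + 1)

# Example: a flattening running average suggests the policy has converged.
episodic = [2.0, 3.0, 5.0, 6.0, 6.0]
print(running_average(episodic))  # [2.0, 2.5, 3.33..., 4.0, 4.4]
```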