Figure 6: (a) Learning curve for the agent trained with and without noise, and reward histogram for simulations conducted with the agent trained for 0, 250k and 500k episodes. (b) Agent trained with 20, 50 and 400 timesteps. (c) Number of training episodes required to reach an accuracy of 80%, 90% and 95% by agents pre-trained for 500k steps on cell 1 vs. agents trained on the respective cell types from the beginning. The y-axis shows the number of training runs required on a log base 10 scale.