Figure 6: (a) Learning curve for agents trained with and without noise, and
reward histogram for simulations conducted with agents trained for 0, 250k,
and 500k episodes. (b) Agents trained with 20, 50, and 400 timesteps. (c)
Number of training episodes required to reach an accuracy of 80%, 90%, and
95% by agents pre-trained for 500k steps on cell 1 vs. agents trained on the
respective cell types from the beginning. The y-axis shows the number of
training runs required on a log base 10 scale.