An Online Hyper-volume Action Bounding Approach for Accelerating the
Process of Deep Reinforcement Learning from Multiple Controllers
Abstract
This paper fuses ideas from Reinforcement Learning (RL),
Learning from Demonstration (LfD), and Ensemble Learning into a single
paradigm. Knowledge from a mixture of control algorithms (experts) is
used to constrain the action space of the agent, enabling faster RL
refinement of a control policy by avoiding unnecessary exploratory
actions. The domain-specific knowledge of each expert is exploited;
however, the resulting policy is robust against the errors of
individual experts, since it is refined by an RL reward function
without copying any
particular demonstration. Our method has the potential to supplement
existing RLfD methods when multiple algorithmic approaches are available
to function as experts. We illustrate our method in the context of a
Visual Servoing (VS) task, in which a 7-DoF robot arm is controlled to
maintain a desired pose relative to a target object. We explore four
methods for bounding the actions of the RL agent during training:
bounding with a hypercube and with a convex hull, each via a modified
loss function; ignoring actions that fall outside the convex hull; and
projecting actions onto the convex hull. We compare the training
progress of each
method with and without using the expert demonstrators. Our experiments
show that using the convex hull with a modified loss function
significantly improves training progress. Furthermore, we demonstrate
faster convergence of the VS error while maintaining higher
manipulability of the arm than classical image-based, position-based,
and hybrid-decoupled VS.
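
To make the projection variant concrete, the sketch below is our own
illustration rather than code from the paper: it poses projection of an
agent action onto the convex hull of K expert action proposals as a
small quadratic program over convex combination weights, solved here
with SciPy's SLSQP. All function names, shapes, and the solver choice
are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_hull(action, expert_actions):
    """Project `action` onto the convex hull of `expert_actions`.

    expert_actions: (K, d) array whose rows are K expert action
    proposals; their hull is {V^T w : w >= 0, sum(w) = 1}.
    We solve the small QP  min_w ||V^T w - action||^2  with SLSQP.
    """
    V = np.asarray(expert_actions, dtype=float)
    K = V.shape[0]
    w0 = np.full(K, 1.0 / K)  # start at the hull centroid

    def objective(w):
        diff = V.T @ w - action
        return diff @ diff

    res = minimize(
        objective, w0, method="SLSQP",
        bounds=[(0.0, 1.0)] * K,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return V.T @ res.x

# Example: three 2-D expert actions; an agent action outside the hull
# is mapped to the nearest point on the hull boundary (~[0.5, 0.5]).
experts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(project_onto_hull(np.array([1.5, 1.5]), experts))
```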