Abstract—As reinforcement learning (RL) agents become more prevalent in critical applications across both the public and private sectors, the need to reverse engineer, comprehend, and audit their decisions becomes crucial. While many traditional explainable AI methods can aid in this endeavor, most assume direct access to the agent model, which is unavailable in many reverse engineering settings such as in-house recovery of legacy models or analysis of competitor, adversarial, or other external agents. To address this research gap, this paper presents LfRLD, a framework for reverse engineering RL agents using learning from demonstrations (LfD) approaches. Empirical results demonstrate the proposed framework's potential to generalize, predict, and summarize agent behaviors using only observed demonstrations, and reveal many opportunities for future research. Within the wider scope of AI, this paper's findings have implications for auditable, and therefore trustworthy, AI, aiding applications in areas such as business and finance, criminal justice, cybersecurity and defense, and the Internet of Things (IoT).

Impact Statement—Reinforcement learning agents are becoming increasingly popular in a variety of applications such as chatbots, economics, healthcare, and autonomous driving. Their ability to learn the consequences of their actions through trial-and-error interaction enables them to explore potentially optimal routes with long-term gains despite possible short-term losses. However, over the years, such agents have come under increasing scrutiny because their decisions are seldom interpretable by others. Although recent advances in explainable AI have helped address this challenge, they typically assume access to the agent models, which are often not public. The framework introduced in this paper helps close this research gap. While the proposed technology has shown promising results for reverse engineering agent behaviors using only observed demonstrations, there remain many opportunities for future development. In addition to enhancing agent auditability and trustworthiness, the proposed work also offers an approach for reverse engineering external agents to gain an intelligence or resource advantage over competitors or adversaries.
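As a concrete illustration of the learning-from-demonstrations idea underlying LfRLD, the minimal sketch below clones an external agent's behavior from observed (state, action) pairs alone, i.e., behavioral cloning. The data shapes, the synthetic demonstrations, and the choice of classifier are illustrative assumptions only and are not taken from the LfRLD framework itself.

```python
# Illustrative sketch: behavioral cloning as one learning-from-demonstrations
# (LfD) approach. Given only observed (state, action) pairs from an external
# agent, fit a policy model that predicts the agent's actions in new states.
# All shapes, names, and the classifier are assumptions for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical demonstration data: 4-dimensional observations and discrete
# actions recorded while watching the target agent act.
states = rng.normal(size=(5000, 4))                       # observed states
actions = (states[:, 0] + states[:, 2] > 0).astype(int)   # observed actions

# Hold out some demonstrations to estimate how well the cloned policy
# generalizes to unseen situations.
X_train, X_test, y_train, y_test = train_test_split(
    states, actions, test_size=0.2, random_state=0
)

# Fit a simple policy model that imitates the demonstrated behavior.
cloned_policy = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                              random_state=0)
cloned_policy.fit(X_train, y_train)

# Prediction accuracy on held-out demonstrations gives a rough measure of
# how faithfully the recovered policy reproduces the agent's decisions.
print("held-out action prediction accuracy:",
      cloned_policy.score(X_test, y_test))
```

The held-out accuracy here stands in for the broader goals named in the abstract (generalizing and predicting agent behavior from demonstrations only); summarization and other LfD variants would require additional machinery beyond this sketch.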
Abstract—Reinforcement learning (RL) has become more popular due to promising results in applications such as chatbots, healthcare, and autonomous driving. However, one significant challenge in current RL research is the difficulty of understanding which RL algorithms, if any, are practical for a given use case. Few RL algorithms are rigorously tested, and hence understood, in terms of their practical implications. Although there are a number of performance comparisons in the literature, many use only a few environments and do not consider real-world limitations such as run time and memory usage. Furthermore, many works do not make their code publicly accessible for others to use. This paper addresses this gap by presenting the most comprehensive performance comparison of the practicality of RL algorithms known to date. Specifically, it focuses on discrete, model-free deep RL algorithms and their practicality in real-world problems where efficient implementations are necessary. In total, fourteen RL algorithms were trained on twenty-three environments (468 environment instances), which collectively required 224 GB and 766 days of CPU time to run all experiments, and 1.7 GB to store all models. Overall, the results indicate several shortcomings in RL algorithms' exploration efficiency, memory/sample efficiency, and space/time complexity. Based on these shortcomings, numerous opportunities for future work are identified to improve the capabilities of modern algorithms. This paper's findings will help researchers and practitioners improve and employ RL algorithms in time-sensitive and resource-constrained applications such as economics, cybersecurity, and the Internet of Things (IoT).

Impact Statement—Reinforcement learning (RL) technologies are commonly used in autonomous driving, chatbot, and business analytics applications. They learn how to adapt to unforeseen situations, reducing the load on human drivers, support teams, and analysts. Although there is a variety of theoretical work in the RL literature, very few algorithms are tested and evaluated in ways that facilitate their use in real-life scenarios. The performance comparison introduced in this paper addresses these limitations. The performance analysis framework, re-implemented source code, and findings from this study could increase the adoption of RL technologies and accelerate their development for more real-life applications. Moreover, the open challenges, recommendations, and practical implications identified in this paper could facilitate collaboration and the development of new technologies among researchers and practitioners in industry and academia.
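To illustrate the kind of practicality measurements this comparison relies on (run time, memory usage, and model storage), the sketch below times and profiles a single toy training run. The tabular Q-learning agent, the toy environment, and the measurement utilities are stand-ins chosen for illustration; the paper's actual benchmarking framework, algorithms, and environments are not described in this abstract.

```python
# Illustrative sketch: measuring wall-clock time, peak (Python-level) memory,
# and on-disk model size for one training run. The agent and environment are
# hypothetical stand-ins, not the paper's benchmarked algorithms.
import os
import pickle
import time
import tracemalloc

import numpy as np

N_STATES, N_ACTIONS, EPISODES, HORIZON = 50, 4, 200, 100
rng = np.random.default_rng(0)

def step(state, action):
    """Toy stochastic environment: random next state, simple reward signal."""
    next_state = int(rng.integers(N_STATES))
    reward = 1.0 if action == state % N_ACTIONS else 0.0
    return next_state, reward

tracemalloc.start()
start = time.perf_counter()

# Minimal tabular Q-learning loop standing in for a deep RL algorithm.
q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.99, 0.1
for _ in range(EPISODES):
    s = int(rng.integers(N_STATES))
    for _ in range(HORIZON):
        a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[s].argmax())
        s_next, r = step(s, a)
        q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
        s = s_next

elapsed = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

# On-disk model size, analogous to the per-model storage figures reported above.
with open("toy_model.pkl", "wb") as f:
    pickle.dump(q, f)
model_bytes = os.path.getsize("toy_model.pkl")

print(f"run time: {elapsed:.2f} s | "
      f"peak traced memory: {peak_bytes / 1e6:.2f} MB | "
      f"model size: {model_bytes / 1e3:.1f} KB")
```

Standard-library tools are used here (time.perf_counter and tracemalloc) to keep the sketch dependency-free; a full benchmark would also need process-level memory accounting and repeated runs per algorithm and environment to produce figures comparable to those summarized above.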