Reinforcement learning is a promising approach that can allow machines to acquire knowledge and solve problems without human intervention. However, implementing reinforcement learning algorithms on standard complementary metal-oxide-semiconductor (CMOS)-based platforms constrains performance because of the von Neumann architecture, which increases energy consumption and latency. To address this, we propose an extremely area- and energy-efficient implementation of Monte Carlo learning on a passive resistive random access memory (RRAM) crossbar array that accounts for non-ideal hardware effects such as device-to-device variation, noise, and endurance failure. To illustrate the capabilities of our implementation, we consider the classical cart-pole control problem. Our results indicate that the proposed passive RRAM crossbar-based implementation of Monte Carlo learning not only outperforms prior digital and active 1-transistor-1-RRAM (1T1R) crossbar-based implementations by more than five orders of magnitude in area, but is also robust against spatial and temporal variations and endurance failure of RRAM devices.