Reinforcement learning has achieved considerable success in controlled environments. Its use in real-world applications remains limited, however, because existing methods struggle to guarantee safe system operation, require long experiment times, and depend on a-priori system knowledge and models. To address these limitations, we propose a novel exploration method that integrates a reciprocal Control Barrier Function with an on-line learned Gaussian Process Regression model. For safe system operation, we use the information from the reciprocal Control Barrier Function to limit the step size of the agent’s actions as it approaches the safety boundary. To make the exploration time-efficient, we determine the direction of the agent’s actions from information gain metrics, which are computed from the action-value estimates of the on-line learned Gaussian Process Regression model. We demonstrate the potential of our exploration method in simulation for the efficiency-optimal calibration of a thermal management system for battery electric vehicles. To quantify the benefits in terms of safety, optimality and time-efficiency, we benchmark our exploration method against random and uncertainty-driven exploration methods. For the studied test case, the proposed exploration method satisfies the safety constraint and converges to within 1.25% of the true optimal action while requiring 28% and 18% less experiment time than the random and uncertainty-driven exploration methods, respectively.
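
To make the two ingredients concrete, the following Python sketch shows one possible form of a single exploration step for a scalar action with one upper safety limit. It is an illustration under stated assumptions, not the implementation evaluated in this work. The barrier h(a) = a_max - a, the step-size rule, the squared-exponential kernel, and all function names and parameter values (`max_step`, `info_gain`, `explore_step`, `base_step`, `gain`, `length_scale`, `noise`) are hypothetical. The information gain is taken as the standard Gaussian-noise expression 0.5 log(1 + variance / noise), which may differ from the metrics used in the method itself.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of actions (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def info_gain(x_train, x_query, noise=1e-3):
    """Information gain of one noisy observation at each query point.

    Uses the posterior variance of a zero-mean, unit-variance GP conditioned
    on the inputs visited so far (the variance does not depend on the targets).
    """
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    var = np.clip(var, 0.0, None)  # guard against small negative round-off
    return 0.5 * np.log1p(var / noise)


def max_step(h_value, base_step=0.1, gain=1.0):
    """Step-size limit derived from a reciprocal barrier B = 1 / h.

    h_value > 0 inside the safe set. As h_value -> 0 the barrier grows without
    bound and the allowed step shrinks to zero. With base_step <= gain the
    step is always smaller than h_value, so a step towards the boundary
    cannot cross it in this scalar example.
    """
    if h_value <= 0.0:
        return 0.0
    barrier = 1.0 / h_value
    return base_step / (1.0 + gain * barrier)


def explore_step(a, x_train, a_max):
    """Move by the barrier-limited step in the direction of larger information gain."""
    step = max_step(a_max - a)
    candidates = np.array([a - step, a + step])
    return candidates[np.argmax(info_gain(x_train, candidates))]


if __name__ == "__main__":
    a, a_max = 0.5, 1.0
    visited = np.array([a])
    for _ in range(50):
        a = explore_step(a, visited, a_max)
        visited = np.append(visited, a)
    print("largest visited action:", visited.max(), "limit:", a_max)
```

In this sketch the barrier only shapes the magnitude of the step and the Gaussian Process only chooses its sign, which mirrors the separation of roles described above: safety is handled by the step-size limit and time-efficiency by the information-driven direction.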