Prediction intervals (PIs) offer an effective tool for quantifying uncertainty of loads in distribution systems. The traditional central PIs cannot adapt well to skewed distributions, and their offline training fashion is vulnerable to the unforeseen change in future load patterns. Therefore, we propose an optimal PI estimation approach, which is online and adaptive to different data distributions by adaptively determining symmetric or asymmetric probability proportion pairs for quantiles of PIs’ bounds. It relies on the online learning ability of reinforcement learning (RL) to integrate the two online tasks, i.e., the adaptive selection of probability proportion pairs and quantile predictions, both of which are modeled by neural networks. As such, the quality of quantiles-formed PI can guide the selection process of optimal probability proportion pairs, which forms a closed loop to improve PIs’ quality. Furthermore, to improve the learning efficiency of quantile forecasts, a prioritized experience replay (PER) strategy is proposed for online quantile regression processes. Case studies on both load and net load demonstrate that the proposed method can better adapt to data distribution compared with online central PIs method. Compared with offline-trained methods, it obtains PIs with better quality and is more robust against concept drift.