The purpose of feeder-level energy disaggregation is to decompose the net load measured at the feeder head into its components. This capability is vital for power system utilities, since increased visibility of controllable loads enables demand-side management strategies. However, energy disaggregation at the feeder level is difficult because the high penetration of embedded generation masks the actual demand and the different loads are highly aggregated. In this paper, the solar generation at the grid supply point is first separated from the net load via either an unsupervised upscaling method or a supervised gradient boosting regression tree (GBRT) method. To deal with the uncertainty of the load components, probabilistic energy disaggregation models based on multi-quantile recurrent neural networks, namely a multi-quantile long short-term memory (MQ-LSTM) model and a multi-quantile gated recurrent unit (MQ-GRU) model, are proposed to disaggregate the demand into thermostatically controlled loads (TCLs), non-thermostatically controlled loads (non-TCLs), and non-controllable loads. A variety of relevant information, including feeder measurements, meteorological measurements, and calendar information, is used as the input features of the model. Instead of providing point predictions, the probabilistic model estimates conditional quantiles and provides prediction intervals. A comprehensive case study compares the proposed models with other state-of-the-art models (multi-quantile convolutional neural network (MQ-CNN), quantile gradient boosting regression tree (Q-GBRT), and quantile light gradient boosting machine (Q-LGB)) in terms of training time, reliability, sharpness, and overall performance. The results show that the MQ-LSTM model can estimate reliable and sharp prediction intervals for the target load components. It also achieves the best overall performance among all compared algorithms, with the shortest training time.
Finally, a transfer learning algorithm is proposed to overcome the difficulty of obtaining enough training data: the model is pre-trained on synthetic data generated from a public database and then tested on the local dataset. The results confirm that the proposed energy disaggregation model is transferable and can be applied to other feeders easily.
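
As an illustration of what estimating conditional quantiles and scoring prediction intervals can involve, the following is a minimal sketch, not the implementation used in this work. It assumes the pinball (quantile) loss as the multi-quantile objective, empirical coverage as a proxy for reliability, and mean interval width as a proxy for sharpness. The function names `pinball_loss`, `interval_coverage`, and `mean_interval_width` are hypothetical.

```python
def pinball_loss(y_true, y_pred, quantiles):
    """Average pinball loss over all samples and quantile levels.

    y_true:    list of observed values, length n
    y_pred:    list of n rows, one predicted value per quantile level
    quantiles: list of quantile levels in (0, 1), e.g. [0.05, 0.5, 0.95]
    """
    total, count = 0.0, 0
    for y, preds in zip(y_true, y_pred):
        for q, p in zip(quantiles, preds):
            err = y - p
            # Under-prediction is weighted by q, over-prediction by (1 - q).
            total += max(q * err, (q - 1.0) * err)
            count += 1
    return total / count


def interval_coverage(y_true, lower, upper):
    """Fraction of observations inside [lower, upper] (assumed reliability proxy)."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def mean_interval_width(lower, upper):
    """Average interval width (assumed sharpness proxy; narrower is sharper)."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

Under these assumptions, a 90% prediction interval would be formed from the 0.05 and 0.95 quantile outputs; a reliable model has coverage close to 0.90, and among reliable models the one with the smaller mean width is the sharper.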