We proposed RTP, a composite neural network model that captures knowledge from remote transportation pollution events (RTPEs) to improve local PM2.5 prediction. To the best of our knowledge, this is the first deep learning work to incorporate knowledge from remote pollutants into PM2.5 prediction. RTP consists of two neural network components: a pre-trained base model and the STRI model. The base model captures knowledge from local factors that influence PM2.5 concentrations. STRI captures knowledge from RTPEs by learning the spatial-temporal characteristics of satellite-based aerosol optical depth (AOD) data and weather features from remote areas. Because the full STRI model is large, we divide it into two components to facilitate training and improve results: STRI\_fe, which extracts spatial-temporal features from remote areas, and STRI\_p, which predicts local PM2.5 concentrations from both remote and local features. The results from STRI\_p show that prediction error decreases when remote features are added to the model, demonstrating that STRI does capture knowledge from RTPEs. To characterize the occurrence of RTPEs in northern Taiwan, we also developed an algorithm that classifies PM2.5 concentrations attributable to RTPEs. We applied the STRI model to predict PM2.5 concentrations at two EPA stations located at the northern tip of Taiwan and applied the classification algorithm to the results. Adding remote features improved prediction accuracy at both stations, which demonstrates the impact of RTPEs there.