Manual visual inspection, typically performed by certified inspectors, is still the main form of road pothole detection. This process is, however, not only tedious, time-consuming, and costly, but also dangerous for the inspectors. Furthermore, the detection results are subjective, because they depend entirely on the inspector’s experience. In this paper, we first introduce a disparity (or inverse depth) image processing module, named quasi inverse perspective transformation (QIPT), which makes damaged road areas highly distinguishable. Then, we propose a novel attention aggregation (AA) framework, which improves semantic segmentation networks for road pothole detection by taking advantage of different types of attention modules. Moreover, we develop a novel training set augmentation technique based on adversarial domain adaptation, in which synthetic road RGB images and transformed road disparity (or inverse depth) images are generated to enhance the training of semantic segmentation networks. The experimental results demonstrate that, firstly, the disparity (or inverse depth) images transformed by our QIPT module become more informative; secondly, adversarial domain adaptation not only significantly improves the performance of state-of-the-art semantic segmentation networks, but also accelerates their convergence. In addition, AA-UNet and AA-RTFNet, our best-performing implementations, outperform all other state-of-the-art single-modal and data-fusion networks, respectively, for road pothole detection.