Planet-scale photo geolocalization is the task of estimating the geographic location depicted in an image based purely on its visual features. While deep learning models, particularly convolutional neural networks (CNNs), have significantly advanced this field, understanding the reasoning behind their predictions remains challenging. In this paper, we present a novel method that enhances the explainability of CNN-based geolocalization models by applying Gradient-weighted Class Activation Mapping (Grad-CAM) to several layers and combining the resulting activation maps, rather than applying it only to the final convolutional layer as is typically done. This approach provides a more detailed view of how different image features contribute to the model’s decisions than single-layer Grad-CAM does.
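
As a rough illustration of the multi-layer idea, the sketch below computes a standard Grad-CAM map per layer and then fuses the maps. The per-layer step follows the usual Grad-CAM definition (channel weights from globally averaged gradients, then a ReLU over the weighted sum of activations). Everything about the fusion is an assumption made for illustration and not necessarily the combination rule used in this paper: each map is scaled to [0, 1], upsampled with nearest-neighbour interpolation, and averaged with equal weights. The function names (`grad_cam`, `upsample_nearest`, `combine_layers`) are hypothetical, and plain Python lists stand in for the activations and gradients that would normally be captured from a CNN with framework hooks.

```python
from typing import List, Tuple

Map2D = List[List[float]]   # H x W grid
Tensor3D = List[Map2D]      # K channels, each H x W


def grad_cam(activations: Tensor3D, gradients: Tensor3D) -> Map2D:
    """Standard Grad-CAM for one layer: ReLU(sum_k alpha_k * A^k)."""
    height, width = len(activations[0]), len(activations[0][0])
    # alpha_k: class-score gradient, global-average-pooled per channel
    alphas = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
            for j in range(width)]
           for i in range(height)]
    peak = max(map(max, cam))
    # Assumption: scale to [0, 1] so layers of different magnitude are comparable
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


def upsample_nearest(cam: Map2D, out_h: int, out_w: int) -> Map2D:
    """Nearest-neighbour resize of a 2D map to out_h x out_w."""
    in_h, in_w = len(cam), len(cam[0])
    return [[cam[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def combine_layers(layers: List[Tuple[Tensor3D, Tensor3D]],
                   out_h: int, out_w: int) -> Map2D:
    """Fuse per-layer Grad-CAM maps. Assumed rule: unweighted mean."""
    maps = [upsample_nearest(grad_cam(a, g), out_h, out_w) for a, g in layers]
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(out_w)]
            for i in range(out_h)]


# Toy data: a shallow 2x2 layer and a deep 1x1 layer, one channel each
shallow = ([[[1.0, 0.0], [0.0, 0.0]]], [[[1.0, 1.0], [1.0, 1.0]]])
deep = ([[[2.0]]], [[[0.5]]])
fused = combine_layers([shallow, deep], out_h=2, out_w=2)
print(fused)
```

In this toy case the shallow layer contributes a sharp response in the top-left cell while the deep layer contributes a coarse, uniform response, so the fused map keeps fine spatial detail that a final-layer map alone would not show.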