Xinlei Wang and one other author

Small target detection in visible-light remote sensing imagery is a common yet challenging problem. Because of persistent objective factors such as complex backgrounds, it remains difficult to balance detection accuracy against the number of parameters for densely packed and partially occluded small targets. Existing small target detection methods improve accuracy but neglect the accompanying growth in parameter count. To address these issues, we propose the Higher Precision Small Object Detection Transformer (HPS-DETR), a model optimized for remote sensing imagery that achieves higher precision and real-time capability with fewer parameters. HPS-DETR introduces four key innovations. First, the Faster Convolutional Gated Selection Unit (Faster-CGSU) enhances multi-scale feature extraction by integrating a gating mechanism with partial convolutions, which improves the aggregation of local and global features while reducing redundant parameters. Second, the Attention-based Intra-scale Feature Interaction using Cascade Group Attention (CGA-AIFI) module focuses on critical feature regions and suppresses irrelevant background noise, improving attention diversity. Third, the Cross-Scale Dynamic Feature Fusion Module (CDFF) strengthens small object detection through dynamic scale fusion and precise feature encoding, mitigating feature loss during upsampling and downsampling. Lastly, the Inner-MPDIoU loss function enhances bounding-box regression by prioritizing hard samples, improving accuracy and convergence. Experimental results on the SIMD, DOTA-v1.0, and NWPU VHR-10 datasets show mAP improvements of 2.4%, 1.7%, and 3.1%, respectively, with a 22.11% reduction in parameter count compared with the baseline. The mAPs also improve by 2.5%, 2.2%, and 3.3%, respectively. On the VisDrone2019 and VEDAI datasets, mAP increases by 2.1% and 3.8%, respectively, demonstrating superior small-object detection performance and computational efficiency.
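The abstract names the Inner-MPDIoU loss without giving its formula. The following is a minimal sketch that assumes it follows the additive form proposed for Inner-IoU losses applied to MPDIoU. Under that assumption, the IoU term is computed on auxiliary boxes scaled about their centres by a factor `ratio`, and MPDIoU's corner-distance penalties (top-left and bottom-right) are computed on the original boxes and normalized by the squared image width plus height. The resulting loss is 1 - IoU_inner + d1²/(w² + h²) + d2²/(w² + h²). The `Box` class, the function names and the `ratio` value are hypothetical illustrations, not the HPS-DETR implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical corner-format box: (x1, y1) top-left, (x2, y2) bottom-right, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Plain intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def scale_about_center(box: Box, ratio: float) -> Box:
    """Auxiliary box: same centre, width and height multiplied by `ratio`."""
    cx, cy = (box.x1 + box.x2) / 2, (box.y1 + box.y2) / 2
    hw = (box.x2 - box.x1) * ratio / 2
    hh = (box.y2 - box.y1) * ratio / 2
    return Box(cx - hw, cy - hh, cx + hw, cy + hh)


def inner_iou(pred: Box, gt: Box, ratio: float) -> float:
    """IoU computed on the ratio-scaled auxiliary boxes (Inner-IoU term)."""
    return iou(scale_about_center(pred, ratio), scale_about_center(gt, ratio))


def inner_mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float, ratio: float) -> float:
    """Assumed Inner-MPDIoU loss: MPDIoU loss with its IoU term replaced by Inner-IoU."""
    norm = img_w ** 2 + img_h ** 2
    # Squared distances between matching corners of the ORIGINAL (unscaled) boxes.
    d1_sq = (pred.x1 - gt.x1) ** 2 + (pred.y1 - gt.y1) ** 2
    d2_sq = (pred.x2 - gt.x2) ** 2 + (pred.y2 - gt.y2) ** 2
    return 1.0 - inner_iou(pred, gt, ratio) + d1_sq / norm + d2_sq / norm


if __name__ == "__main__":
    gt = Box(0, 0, 10, 10)
    pred = Box(5, 0, 15, 10)
    # Illustrative values: a 100x100 image and ratio=0.75 (both hypothetical).
    print(inner_mpdiou_loss(pred, gt, img_w=100, img_h=100, ratio=0.75))
```

With `ratio = 1.0` the auxiliary boxes coincide with the originals. The inner term then reduces to ordinary IoU and the function reduces to the standard MPDIoU loss. Values other than 1 change how the IoU term, and therefore its gradient, responds to overlap.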