Multimodal Image Fusion for Object Detection via Dynamic Channel
Adjustment and Multi-Scale Activated Attention Mechanism Network
Abstract
Multimodal image fusion has become crucial for object detection, offering
enhanced feature representations by integrating information from
diverse image modalities, such as RGB and thermal images. Recent
advances in neural networks, including convolutional neural networks
(CNNs) and Transformer-based approaches, have achieved substantial
progress in this area. However, existing methods often struggle to fully
integrate complementary information across modalities, particularly in
enabling activated fusion over varying regions and scales. To address
these limitations, we propose the Dynamic Channel Adjustment and
Multi-Scale Activated Attention Mechanism Network (DAMAN). This model
improves inter-modal feature integration and strengthens the capture of
spatial and contextual information. Extensive experiments demonstrate
DAMAN's superior adaptability to objects of varying sizes and its
robustness in complex traffic and industrial environments. Code and
model checkpoints will be released following peer review.