Ruijin LIU - 21DOCS Test Area

[1](#fn-0002) Abstract—Real-time semantic segmentation is one of the most actively researched areas in computer vision, and dual-branch networks have become a popular direction in network architecture design. In this paper, we propose a dual-branch image segmentation network for autonomous driving that integrates spatial and channel attention mechanisms, named "BiAttentionNet". The network aims to balance accuracy and real-time performance by processing the semantic information in high-level features and the detail information in low-level features separately. BiAttentionNet consists of three main parts: the detail branch, the semantic branch, and the proposed attention-guided fusion layer. The detail branch uses the designed PCSD convolution module to extract local and surrounding context features from wide-channel, low-level feature maps. The semantic branch uses an improved lightweight U-Net to extract semantic information from deep, narrow-channel feature maps. Finally, the proposed attention-guided fusion layer fuses the features of the two branches using detail attention and channel attention mechanisms to perform image segmentation in road scenes. Comparative experiments on the Cityscapes dataset against recent mainstream networks, including BiSeNet V2, Fast-SCNN, ConvNeXt, SegNeXt, SegFormer, and CGNet, show that BiAttentionNet achieves the highest accuracy among the compared backbone networks, with an mIoU of 65.89%. These results demonstrate the effectiveness of the proposed BiAttentionNet.
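For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how mean intersection over union is commonly computed from a confusion matrix. It is not the evaluation code used in this paper. The function names `confusion_matrix` and `mean_iou`, the toy labels, and the ignore label of 255 (a common convention for Cityscapes training labels) are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (ground truth, prediction) pairs over flattened pixel labels.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground truth equals ignore_index are skipped (assumed convention).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                                # skip classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and one ignored pixel (hypothetical data).
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
score = mean_iou(cm)
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so `score` is 13/18, or roughly 72.2% when reported as a percentage in the same way as the figure above.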