DUCFNet: Dual U-shaped Cross-modal Fusion Network for Lung Infection
Region Segmentation
Abstract
The further development of medical image segmentation increasingly demands
high-quality datasets. Regrettably, constructing such datasets faces two
major obstacles: the difficulty of acquiring usable medical images and the
financial burden of data annotation. To overcome these difficulties, we
leverage medical text data to compensate for the shortcomings of existing
image datasets. In this work, we propose a dual U-shaped network that
achieves thorough cross-modal fusion of image and text features.
Specifically, one U-shaped branch, named U-CNN, is based on a convolutional
neural network; it mainly extracts global image features and generates the
final prediction. The other branch, named U-ViT, is built from vision
transformer blocks and is responsible for processing the text information
and merging the text features with the image features from U-CNN.
Additionally, we equip the skip connections of U-CNN with a Cross-Attention
Channel Fusion module and a Channel-wise Dual-branch Cross Fusion module,
which help bridge the semantic gaps and further integrate cross-modal
information. Experimental results on two lung infection image datasets of
different modalities (X-ray and CT) show that our method achieves excellent
performance compared with state-of-the-art alternatives.
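
To make the dual-branch idea concrete, the following is a minimal,
self-contained PyTorch sketch of an image-text segmenter that fuses a CNN
image path with a text path via cross-attention at the skip level. It is our
own illustrative code, not the authors' DUCFNet implementation: the module
names (CrossAttentionChannelFusion, DualBranchSegmenter), layer sizes, and
the single cross-attention fusion step are all assumptions.

import torch
import torch.nn as nn

class CrossAttentionChannelFusion(nn.Module):
    """Fuse image features with text features via cross-attention (assumed form)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feat, txt_feat):
        # img_feat: (B, C, H, W) -> tokens (B, H*W, C); txt_feat: (B, L, C)
        b, c, h, w = img_feat.shape
        tokens = img_feat.flatten(2).transpose(1, 2)
        fused, _ = self.attn(query=tokens, key=txt_feat, value=txt_feat)
        tokens = self.norm(tokens + fused)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class DualBranchSegmenter(nn.Module):
    """Toy dual-branch model: a CNN image path plus a text path, fused at the skip."""
    def __init__(self, txt_vocab=1000, dim=64, num_classes=1):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())
        self.txt_embed = nn.Embedding(txt_vocab, dim)
        self.txt_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.fuse = CrossAttentionChannelFusion(dim)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 2, stride=2), nn.ReLU(),
            nn.Conv2d(dim, num_classes, 1))

    def forward(self, image, text_ids):
        img_feat = self.enc(image)                          # (B, dim, H/2, W/2)
        txt_feat = self.txt_enc(self.txt_embed(text_ids))   # (B, L, dim)
        fused = self.fuse(img_feat, txt_feat)               # cross-modal fusion
        return self.dec(fused)                              # segmentation logits

# Usage: a 1-channel 64x64 image paired with a short text report (token ids).
model = DualBranchSegmenter()
mask_logits = model(torch.randn(2, 1, 64, 64), torch.randint(0, 1000, (2, 16)))
print(mask_logits.shape)  # torch.Size([2, 1, 64, 64])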