CIME: Contextual Interaction-based Multimodal Emotion Analysis with
Enhanced Semantic Information
Abstract
As multimodal data proliferate, emotion analysis has advanced through the
integration of diverse informational modalities. This study introduces the
CIME model: Contextual Interaction-based Multimodal Emotion Analysis with
Enhanced Semantic Information. This spatiotemporal interaction network
leverages enhanced semantic information to improve the accuracy and
robustness of emotion analysis along both the semantic and the contextual
dimension. The model combines attention mechanisms with graph convolutional
networks: a cross-attention-based semantic interaction module enriches
textual semantic comprehension, while a graph convolution-based spatial
interaction module models the contextual relationships among speakers.
Together, these components enable the model to mine the latent associations
within multimodal emotional data. In extensive evaluations on public
datasets, including IEMOCAP and MOSEI, the proposed CIME model outperforms
existing methods on multimodal emotion classification. Modality ablation
studies and a comparative analysis of fusion strategies further confirm the
model's effectiveness and adaptability, offering new insights and
methodologies for advancing multimodal emotion analysis. Code supporting
this study is available at
https://github.com/gcp666/CIME.
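
To make the two interaction modules named in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' released implementation (see the repository above for the actual code). The module names, feature dimensions, and the toy adjacency matrix are illustrative assumptions: text features cross-attend to another modality's features, and a single normalized graph-convolution layer propagates context over an utterance graph.

```python
import torch
import torch.nn as nn


class CrossAttentionSemanticInteraction(nn.Module):
    """Hypothetical sketch: text features query another modality's features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # text: (batch, seq_text, dim); other: (batch, seq_other, dim)
        enriched, _ = self.attn(query=text, key=other, value=other)
        return self.norm(text + enriched)  # residual fusion of enriched semantics


class GraphSpatialInteraction(nn.Module):
    """Hypothetical sketch: one GCN layer over a speaker/utterance graph."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (num_utterances, dim); adj: (num_utterances, num_utterances)
        adj = adj + torch.eye(adj.size(0), device=adj.device)      # add self-loops
        deg_inv_sqrt = adj.sum(dim=-1).clamp(min=1e-6).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(norm_adj @ nodes))           # propagate context


if __name__ == "__main__":
    dim = 64
    semantic = CrossAttentionSemanticInteraction(dim)
    spatial = GraphSpatialInteraction(dim)

    text = torch.randn(2, 10, dim)      # two dialogues, ten text tokens each
    audio = torch.randn(2, 20, dim)     # aligned audio features (assumed shape)
    fused = semantic(text, audio)       # (2, 10, dim)

    utterances = fused.mean(dim=1)      # pool to one vector per dialogue (toy step)
    adj = torch.ones(2, 2)              # fully connected toy context graph
    context = spatial(utterances, adj)  # (2, dim)
    print(fused.shape, context.shape)
```

In this sketch the residual add-and-norm after cross-attention keeps the original text semantics while injecting cues from the other modality, and the symmetric normalization of the adjacency matrix is the standard GCN choice for stable context propagation; the actual CIME design may differ.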