Video Human Action Recognition by the Fusion of Significant Motion and Coarse-fine Temporal Granularity Features

Hanhua Cao

doi:10.22541/au.171402583.36303604/v1

Abstract

Video human action recognition is an important research problem. Because of its inherent challenges, such as complex scenes, large changes in spatial scale, and irregular deformation of the targets to be recognized, existing algorithms generally suffer from large parameter counts and high computational costs. This paper proposes a new computing framework based on lightweight deep learning models. A time-domain Fourier transform is used to generate motion salience maps that highlight human motion areas during feature extraction. Intra-segment and inter-segment feature differences at different time scales are then used to obtain action information at different temporal granularities, so that human actions with varying spatio-temporal scales can be modeled more accurately. Additionally, an action excitation method based on deformable convolution is proposed to address irregular deformation, spatial multi-scale changes, and the loss of low-level information as network depth increases. Experiments are conducted to verify the effectiveness of the proposed algorithm in terms of computational efficiency and accuracy.
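The two feature ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify how the Fourier spectrum is processed or how segments are formed. The sketch assumes that salience is obtained by removing the zero-frequency (static) component along the time axis, and that segments are equal-length, non-overlapping windows. The function names `motion_salience` and `temporal_differences` and their parameters are hypothetical.

```python
import numpy as np


def motion_salience(frames: np.ndarray) -> np.ndarray:
    """Return an (H, W) salience map for a clip of shape (T, H, W).

    Assumption: motion is whatever remains after the temporal DC
    component (the per-pixel mean over time) is removed.
    """
    spectrum = np.fft.fft(frames, axis=0)   # FFT along the time axis
    spectrum[0] = 0                         # drop static content (DC term)
    moving = np.fft.ifft(spectrum, axis=0).real
    energy = np.abs(moving).mean(axis=0)    # average motion magnitude per pixel
    peak = energy.max()
    return energy / peak if peak > 0 else energy


def temporal_differences(features: np.ndarray, num_segments: int):
    """Split (T, C) per-frame features into equal segments and difference them.

    Returns:
      fine:   (S, L-1, C) differences between adjacent frames inside a segment
      coarse: (S-1, C)    differences between adjacent segment means
    """
    num_frames, channels = features.shape
    seg_len = num_frames // num_segments
    # Trailing frames that do not fill a whole segment are discarded.
    segments = features[: seg_len * num_segments].reshape(
        num_segments, seg_len, channels
    )
    fine = segments[:, 1:] - segments[:, :-1]
    seg_means = segments.mean(axis=1)
    coarse = seg_means[1:] - seg_means[:-1]
    return fine, coarse


if __name__ == "__main__":
    # Synthetic clip: constant background with one bright pixel moving right.
    clip = np.full((8, 16, 16), 0.5)
    for t in range(8):
        clip[t, 4, t] = 1.0
    salience = motion_salience(clip)
    print(salience[4, :8].round(2), salience[10, 10].round(2))
```

In this sketch, static pixels receive a salience close to zero, while pixels crossed by the moving point receive high values. The fine differences capture frame-to-frame change within a segment, and the coarse differences capture change between segments.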