Unified multi-stage fusion network for affective video content analysis