Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that adversely affects children's development. Current clinical diagnosis of ASD relies on demographics, descriptions of clinical symptoms, and rating-scale assessments, which are time-consuming and subjective. Previous research has reported atypical visual patterns in children with ASD, suggesting that eye tracking could serve as a tool to help distinguish individuals with ASD from typically developing peers. However, most studies have examined only a single dimension of the visual trajectory, neglecting the comprehensive information contained in eye scan paths. The present study therefore introduces FF-ASDNET, a framework for automated ASD screening that combines eye-tracking technology with deep representation learning and multi-scale feature fusion. Eye movement trajectories were first quantified objectively from multiple scales: Gaze Movements (GM), Spatial Attention Distribution (SAD), and Temporal Visual Information (TVI). A residual network (ResNet) and a rectangular residual convolutional temporal network (RecResTCNN) were then proposed to learn features from these three representations. Finally, the features from the three scales were fused using a convolutional neural network. The proposed design achieved an AUC of 85.96% and an F1 score of 0.85 on an independent dataset, showing that fusing multi-scale representations of eye scan trajectories significantly improves model performance and makes automated ASD screening plausible. In summary, FF-ASDNET is a promising tool for screening individuals with ASD, with potential applications in clinical diagnosis and intervention.
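The late-fusion idea described above (scale-specific encoders for GM, SAD, and TVI whose features are concatenated and classified) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear encoders, all dimensions, and the single logistic classifier head are illustrative stand-ins for the ResNet/RecResTCNN branches and the fusion CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Toy encoder (linear projection + ReLU), standing in for a deep branch."""
    return np.maximum(x @ w, 0.0)

# Illustrative input sizes for the three eye-movement representations.
gm  = rng.standard_normal(64)   # Gaze Movements (GM)
sad = rng.standard_normal(128)  # Spatial Attention Distribution (SAD)
tvi = rng.standard_normal(32)   # Temporal Visual Information (TVI)

# Scale-specific encoders map each representation to a 16-d feature vector.
f_gm  = encode(gm,  rng.standard_normal((64, 16)))
f_sad = encode(sad, rng.standard_normal((128, 16)))
f_tvi = encode(tvi, rng.standard_normal((32, 16)))

# Multi-scale fusion: concatenate the three feature vectors, then apply a
# classifier head (here a single logistic unit in place of the fusion CNN).
fused = np.concatenate([f_gm, f_sad, f_tvi])  # shape (48,)
logit = fused @ rng.standard_normal(48)
p_asd = 1.0 / (1.0 + np.exp(-logit))          # screening probability in (0, 1)
```

In a real system, each branch would be trained jointly with the fusion head so that the three scales contribute complementary information rather than being weighted ad hoc.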