Electroencephalography (EEG) emotion recognition, an important task in Human-Computer Interaction (HCI), has seen major breakthroughs driven by deep learning algorithms. Although applying attention mechanisms to conventional models has improved their performance, most previous research rarely attends to multiple EEG feature dimensions jointly and lacks a compact model with unified attention modules. This study proposes the Joint-Dimension-Aware Transformer (JDAT), a robust model for EEG emotion recognition based on a squeezed Multi-head Self-Attention (MSA) mechanism. Applied to multidimensional features, the adaptive squeezed MSA enables JDAT to attend to diverse EEG information across space, frequency, and time. Under this joint attention, JDAT is sensitive to complex brain activity, such as signal activation, phase-intensity coupling, and resonance. Moreover, its gradually compressed structure contains no recurrent or parallel modules, greatly reducing memory consumption and computational complexity and accelerating inference. The proposed JDAT is evaluated on the DEAP, DREAMER, and SEED datasets, and experimental results show that it outperforms state-of-the-art methods while offering greater flexibility.
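To make the joint-dimension idea more concrete, the sketch below shows one plausible way to apply multi-head self-attention along successive axes of an EEG feature tensor (time, frequency, then space) and mean-pool ("squeeze") each attended axis before the next stage, so the representation is gradually compressed. This is an illustrative assumption-based sketch, not the authors' implementation: the tensor layout, the pooling-based squeeze, the module names (`DimensionAttention`, `JointDimensionSketch`), and all hyperparameters are hypothetical.

```python
# Illustrative sketch of joint attention over multiple EEG feature axes.
# All shapes, names, and the mean-pooling "squeeze" are assumptions,
# not details taken from the JDAT paper.
import torch
import torch.nn as nn


class DimensionAttention(nn.Module):
    """Self-attention over one axis of a (batch, seq, dim) tensor,
    followed by mean-pooling to squeeze that axis away."""

    def __init__(self, embed_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim); attend along seq_len
        attended, _ = self.attn(x, x, x)
        x = self.norm(x + attended)  # residual connection + layer norm
        return x.mean(dim=1)         # squeeze the attended axis


class JointDimensionSketch(nn.Module):
    """Hypothetical stack: attend over time within each (channel, band)
    series, then over frequency bands, then over spatial channels."""

    def __init__(self, embed_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.time_attn = DimensionAttention(embed_dim)
        self.freq_attn = DimensionAttention(embed_dim)
        self.space_attn = DimensionAttention(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, bands, time, embed_dim) -- assumed layout
        b, c, f, t, d = x.shape
        x = self.time_attn(x.reshape(b * c * f, t, d))  # -> (b*c*f, d)
        x = self.freq_attn(x.reshape(b * c, f, d))      # -> (b*c, d)
        x = self.space_attn(x.reshape(b, c, d))         # -> (b, d)
        return self.head(x)                             # emotion logits


if __name__ == "__main__":
    feats = torch.randn(8, 32, 4, 10, 64)  # toy batch of EEG features
    logits = JointDimensionSketch()(feats)
    print(logits.shape)  # torch.Size([8, 2])
```

Because each stage removes one axis after attending over it, the model stays feed-forward end to end, with no recurrent modules, which is consistent with the memory and inference-speed benefits claimed in the abstract.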