Land cover classification aims to assign each pixel of a high-resolution remote sensing image a planimetric category label (such as vegetation, building, or water). In recent years, many serial learning architectures (in which features are delivered through a single path, as in ResNet, MobileNet, and Segformer) based on Convolutional Neural Networks (CNNs) and attention mechanisms have been widely explored for land cover classification. However, high-resolution remote sensing images typically exhibit abundant textural details, objects of widely varying scale, large intra-class differences, and high inter-class similarity, all of which pose challenges for land cover classification. In this work, we present two pluggable modules to further boost serial learning architectures. First, to cope with ambiguous boundaries caused by lost details and with fragmented segmentation stemming from scale variance, we propose a combination of channel attention and spatial attention to reconstruct multi-scale features. Second, to mitigate the classification errors caused by intra-class variance and inter-class correlation, we build a feature vector for each category and apply a multi-head attention model to capture self-attention dependencies among the categories. The experimental results demonstrate that the proposed modules can be applied to existing serial learning architectures and improve overall accuracy (OA) by 5.64% on the ISPRS Vaihingen 2D dataset (using ResNet50 as the backbone). In addition, compared with other state-of-the-art models, our method achieves similar or even better classification results while offering superior inference performance.
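As a rough illustration of the second module, the NumPy sketch below builds one feature vector per category and then applies multi-head self-attention across those vectors. It is not the authors' implementation. The probability-weighted pooling used to form the category vectors, the random projection matrices standing in for learned weights, and all function names and tensor sizes are assumptions made only for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def category_vectors(features, coarse_logits):
    """Pool pixel features into one vector per category (assumed scheme).

    features:      (N, C) array, one C-dimensional feature per pixel.
    coarse_logits: (N, K) array, a coarse per-pixel score for each of K categories.
    Returns a (K, C) array: for each category, a weighted mean of pixel features,
    with weights given by a softmax over pixels of that category's scores.
    """
    weights = softmax(coarse_logits, axis=0)  # each column sums to 1 over pixels
    return weights.T @ features


def multi_head_self_attention(x, num_heads, rng):
    """Self-attention among K category vectors.

    x: (K, C) array. Random matrices stand in for learned projections.
    Returns the refined (K, C) vectors and the (heads, K, K) attention maps.
    """
    k, c = x.shape
    assert c % num_heads == 0, "channel count must be divisible by num_heads"
    d = c // num_heads
    wq, wk, wv, wo = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))

    def split(t):
        # (K, C) -> (heads, K, d)
        return t.reshape(k, num_heads, d).transpose(1, 0, 2)

    q, key, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ key.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(k, c)  # merge heads back to (K, C)
    return out @ wo, attn


# Hypothetical sizes: 256 pixels, 32 channels, 6 categories, 4 heads.
rng = np.random.default_rng(0)
features = rng.standard_normal((256, 32))
coarse_logits = rng.standard_normal((256, 6))

vectors = category_vectors(features, coarse_logits)
refined, attn = multi_head_self_attention(vectors, num_heads=4, rng=rng)
print(vectors.shape, refined.shape, attn.shape)
```

Each row of an attention map in `attn` shows how strongly one category attends to every other category, which is one way inter-class correlation could be modelled explicitly. In a trained network the projection matrices would be learned, and the refined category vectors would be fed back into the per-pixel classifier.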