Abstract
Hyperspectral images contain rich spatial and spectral information,
which provides a strong basis for distinguishing different land-cover
objects. Hyperspectral image (HSI) classification has therefore become
an active research topic. With the advent of deep learning, convolutional neural
networks (CNNs) have become a popular method for hyperspectral image
classification. However, CNNs extract local features well but struggle
to model long-range dependencies. The Vision Transformer (ViT) is a
recent development that addresses this limitation, but it is less
effective at extracting local features and is computationally
inefficient. To overcome these drawbacks, we propose a hybrid
classification network, named Spatial-Spectral Former (SSF), that
combines the strengths of both CNNs and ViT. The shallow layer employs 3D
convolution to extract local features and reduce data dimensions. The
deep layer employs a spectral-spatial transformer module for global
feature extraction and information enhancement in spectral and spatial
dimensions. Our proposed model achieves promising results on widely used
public HSI datasets compared to other deep learning methods, including
CNN, ViT, and hybrid models.
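
The two-stage design described above (a 3D convolution that extracts
local features and shortens the spectral axis, followed by attention
over spectral and spatial tokens) can be illustrated with a minimal
NumPy sketch. This is not the SSF implementation: the kernel size,
spectral stride, single-head attention, random untrained weights, and
all function names are assumptions chosen for illustration, and no
classification head is included.

```python
import numpy as np

def conv3d_valid(x, kernel, spectral_stride=1):
    """Naive single-channel 3D cross-correlation, no padding.
    x: (bands, height, width); kernel: (kb, kh, kw)."""
    B, H, W = x.shape
    kb, kh, kw = kernel.shape
    out_b = (B - kb) // spectral_stride + 1
    out = np.empty((out_b, H - kh + 1, W - kw + 1))
    for b in range(out_b):
        b0 = b * spectral_stride
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                window = x[b0:b0 + kb, i:i + kh, j:j + kw]
                out[b, i, j] = np.sum(window * kernel)
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, rng):
    """Single-head scaled dot-product self-attention.
    Projections are random (untrained); tokens: (n_tokens, dim)."""
    n, d = tokens.shape
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(d))  # every token attends to all tokens
    return weights @ v, weights

def hybrid_forward_sketch(patch, seed=0):
    """Hypothetical forward pass: local 3D conv, then global attention."""
    rng = np.random.default_rng(seed)
    # Shallow stage (assumed 7x3x3 kernel, spectral stride 2):
    # local feature extraction plus spectral dimension reduction.
    kernel = rng.standard_normal((7, 3, 3)) * 0.1
    feat = conv3d_valid(patch, kernel, spectral_stride=2)
    b, h, w = feat.shape
    # Deep stage: one token per band (spectral) and one per pixel (spatial).
    spectral_tokens = feat.reshape(b, h * w)
    spatial_tokens = feat.reshape(b, h * w).T
    spec_out, spec_w = self_attention(spectral_tokens, rng)
    spat_out, spat_w = self_attention(spatial_tokens, rng)
    return feat, spec_out, spat_out, spec_w, spat_w
```

For a synthetic 30-band 9x9 patch, the convolution yields a 12x7x7
feature volume, which becomes 12 spectral tokens and 49 spatial tokens.
Each attention row spans all tokens, which is the global context that a
convolution with a small kernel cannot provide on its own.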