Neural networks play an important role in satellite image classification, and convolutional neural networks (CNNs) are the most common choice for image classification tasks. In this paper, we explore the influence of spectral bands on image classification using the Vision Transformer (ViT). Convolution is a local operation: a convolution layer typically models only the relationships between neighboring pixels. The transformer, in contrast, is a global operation: a transformer layer can model the relationships between all pixels. This motivated us to use ViT for satellite image classification. The Sentinel-2 EuroSAT image dataset, which consists of 27,000 images in ten classes, is used for the experiments. A ViT model trained on the three-band Red-Green-Blue (RGB) dataset is compared with ViT models trained on RGB combined with Near-Infrared (NIR) and on the full multispectral dataset (13 bands). Experimental results show that combining the NIR band with RGB produces more accurate results than RGB alone, and that the 13-band dataset outperforms both the RGB and the RGB plus NIR datasets.
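To make the three input configurations concrete, the following minimal sketch shows one way the RGB, RGB plus NIR, and 13-band inputs could be sliced from a single multispectral image. It is an illustration under stated assumptions, not the pipeline used in the paper: the channel-first layout, the 64 x 64 patch size, the band positions (B01 at index 0, so that B02, B03, B04, and B08 fall at indices 1, 2, 3, and 7), and the names `BAND_INDEX`, `BAND_SETS`, and `select_bands` are all hypothetical.

```python
import numpy as np

# Assumed 0-based positions of the Sentinel-2 bands in a 13-band,
# channel-first array (B01 at index 0). This ordering is an assumption
# and should be checked against the actual data files.
BAND_INDEX = {"B02": 1, "B03": 2, "B04": 3, "B08": 7}

# Hypothetical names for the three input configurations compared in the text.
BAND_SETS = {
    "rgb": ["B04", "B03", "B02"],             # red, green, blue
    "rgb_nir": ["B04", "B03", "B02", "B08"],  # RGB plus near-infrared
    "all": None,                              # keep all 13 bands
}


def select_bands(image, config):
    """Return the channels of a (13, H, W) image used by `config`."""
    names = BAND_SETS[config]
    if names is None:
        return image
    indices = [BAND_INDEX[name] for name in names]
    return image[indices]  # integer-list indexing on the channel axis


if __name__ == "__main__":
    # Synthetic stand-in for one multispectral patch (assumed 64 x 64).
    image = np.zeros((13, 64, 64), dtype=np.float32)
    for config in BAND_SETS:
        print(config, select_bands(image, config).shape)
```

Under these assumptions, only the number of input channels (3, 4, or 13) differs between the configurations, so a ViT would typically need just its patch-embedding layer adjusted to accept the corresponding channel count.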