The Microsoft Kinect camera can capture depth images of a subject, from which skeletal data can subsequently be obtained for Human Activity Recognition (HAR) in surveillance settings. Several studies have analysed human actions using skeletal data together with other, more complex feature representation extraction methods, and most authors have proposed extracting spatio-temporal information as one such representation. This study therefore extracts spatio-temporal information from skeletal data automatically, using an imaging time series (ITS) method called Recurrence Plots (RP) to transform the skeleton joint coordinates into 2D images. The raw data are preprocessed and partitioned into three-channel matrices (R, G, B) before principal component analysis (PCA) is applied. The generated RP images are then used as input to a Convolutional Neural Network (CNN) that distinguishes between different activities. Benchmarked on the UTD-MHAD dataset, the proposed approach outperforms previous studies, with a maximum accuracy of 92.6%.
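The transformation from joint coordinates to an RP image can be illustrated with a minimal sketch. It is not the pipeline used in this study, whose exact preprocessing is not given in the abstract. It assumes that the three channels correspond to the x, y, and z joint coordinates, that PCA reduces each channel to its first principal component, and that an unthresholded recurrence plot (pairwise absolute distance) is used. The function names `pca_first_component`, `recurrence_plot`, and `skeleton_to_rp_image` are hypothetical.

```python
import numpy as np


def pca_first_component(X):
    """Project the rows of X (frames x joints) onto the first principal axis."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]


def recurrence_plot(series):
    """Unthresholded recurrence plot: |s_i - s_j| for every pair of frames."""
    s = np.asarray(series, dtype=float)
    return np.abs(s[:, None] - s[None, :])


def skeleton_to_rp_image(skeleton):
    """Map a (frames, joints, 3) skeleton sequence to a (frames, frames, 3) uint8 image.

    Assumption: channel k of the image is built from coordinate axis k (x, y, z).
    """
    skeleton = np.asarray(skeleton, dtype=float)
    channels = []
    for axis in range(3):
        series = pca_first_component(skeleton[:, :, axis])
        rp = recurrence_plot(series)
        peak = rp.max()
        # Scale to [0, 255]; a constant series yields an all-zero channel.
        scaled = rp / peak * 255.0 if peak > 0 else rp
        channels.append(scaled.astype(np.uint8))
    return np.stack(channels, axis=-1)


# Random data standing in for a sequence of 40 frames and 20 joints.
rng = np.random.default_rng(0)
image = skeleton_to_rp_image(rng.normal(size=(40, 20, 3)))
print(image.shape, image.dtype)
```

Each channel is symmetric with a zero diagonal, since every frame is at distance zero from itself. An image of this kind could then be resized and passed to a CNN classifier; a thresholded (binary) recurrence plot would be an equally plausible variant.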