Modern systems (e.g., assistive technology and self-driving vehicles) can place significant demands on the user’s working memory (WM), which increases cognitive load (CL) and can adversely impact performance (i.e., elevate the risk of errors). Robust prediction of CL from electroencephalography (EEG) remains a challenge because of small sample sizes, noisy recordings, ineffective data representations, and a lack of robust models. This paper presents a holistic approach to reliable CL prediction. We use EEG data recorded during a modified Sternberg WM task in which four levels of CL are defined by the encoding of 2, 4, 6, and 8 English characters. First, we address the noise and “small sample” problems by generating a large, low-noise dataset using eigenspace-based bootstrap sampling and a generative adversarial network (GAN). Second, we transform the EEG recordings into spatial-spectral images to capture spatial information. Third, we build parameter-optimized convolutional neural network (CNN) models that predict the four levels of CL from single-frequency-band representations (i.e., θ, α, β) and from a stacked representation (i.e., all three bands). To make the models interpretable, we apply Gradient-weighted Class Activation Mapping (Grad-CAM) to localize the brain regions that drive the prediction of CL. Empirical analysis shows that the models trained on the θ, α, β, and stacked representations achieve accuracies of 90%, 89%, 91%, and 94%, respectively. Grad-CAM visualizations show that the prefrontal, cerebellar, frontal, and parietal areas contribute most to the prediction of CL.
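As a rough illustration of the Grad-CAM step, the following is a minimal sketch of the standard Grad-CAM computation, not the implementation used in this work. It assumes that the feature maps of the last convolutional layer and the gradients of the predicted CL class score with respect to those maps have already been extracted from a trained CNN; the function name `grad_cam` and the toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Standard Grad-CAM heatmap from precomputed feature maps and gradients.

    activations[k][i][j]: feature map k of the last convolutional layer.
    gradients[k][i][j]:   gradient of the class score w.r.t. activations[k][i][j].
    Both inputs are assumed to come from a trained CNN (hypothetical here).
    """
    if len(activations) != len(gradients):
        raise ValueError("activations and gradients need the same channel count")

    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average pooling of the gradients over space.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # Weighted sum of the feature maps across channels.
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * feature_map[i][j]

    # ReLU keeps only locations that support the predicted class.
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]

    # Scale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy example: two 2x2 feature maps with constant gradients (made-up values).
toy_activations = [[[1.0, 0.0], [0.0, 2.0]],
                   [[0.0, 3.0], [1.0, 0.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]],
                 [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
```

In practice the resulting heatmap would be upsampled to the size of the input spatial-spectral image and overlaid on it, so that high values can be read against scalp locations.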