Counting the number of people in a room is crucial for running smart buildings efficiently. It helps optimize space use, save energy, enhance security, and ensure occupant comfort. By knowing occupancy levels, businesses can better manage resources and reduce both costs and environmental impact. Radar-based occupancy estimation is gaining attention in the scientific community because it is unobtrusive and avoids the privacy concerns associated with video-based sensors. Prior research has mostly focused on testing the feasibility of correlating time-frequency maps of radar-reflected echoes with the number of people in a room. This paper proposes using a 24-GHz continuous-wave (CW) radar, combined with time-frequency mapping based on the Continuous Wavelet Transform (CWT) and the power spectrum, to estimate the number of occupants. We used the time-frequency mapped scalogram images to train three deep-learning models: DarkNet19, MobileNetV2, and ResNet18. Repeated measurements totaling about 4 hours and 40 minutes were carried out on different days, capturing data from groups of 1 to 7 sedentary occupants. The collected data were segmented using a 10-second window, yielding 1680 images of radar-reflected echoes across the different occupancy levels. Experimental results showed that DarkNet19 outperformed the other networks, achieving an accuracy of 92.7% on the CWT dataset and 92.3% on the power spectrum dataset. These findings suggest that time-frequency mapped images of Doppler radar echoes, combined with deep learning, can be an effective solution for occupant counting in smart building applications.
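
As an illustration of the segmentation and time-frequency mapping steps described above, the following is a minimal NumPy sketch, not the implementation used in this work. Several details are assumptions, since they are not stated here: the sampling rate `fs`, non-overlapping windows, a Hann taper for the power spectrum, and a Morlet mother wavelet with `w0 = 6` for the CWT. The function names `segment_signal`, `power_spectrum`, and `morlet_scalogram` are hypothetical.

```python
import numpy as np


def segment_signal(signal, fs, window_s=10.0):
    """Split a 1-D signal into non-overlapping windows of window_s seconds.

    Assumption: windows do not overlap. Under that assumption, 4 h 40 min
    (16,800 s) of data gives 16,800 / 10 = 1680 windows.
    """
    samples_per_window = int(round(fs * window_s))
    n_windows = len(signal) // samples_per_window
    trimmed = np.asarray(signal)[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window)


def power_spectrum(window, fs):
    """One-sided power spectrum of one window (mean removed, Hann taper)."""
    tapered = (window - window.mean()) * np.hanning(len(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(tapered)) ** 2


def morlet_scalogram(window, fs, freqs_hz, w0=6.0):
    """Magnitude of a Morlet CWT evaluated at the requested frequencies.

    Returns an array of shape (len(freqs_hz), len(window)).
    """
    n = len(window)
    m = n if n % 2 == 1 else n - 1          # odd length keeps the wavelet centred
    t = (np.arange(m) - m // 2) / fs        # wavelet time axis in seconds
    x = window - window.mean()
    rows = []
    for f in freqs_hz:
        s = w0 / (2.0 * np.pi * f)          # scale whose centre frequency is f
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        # The Morlet wavelet is Hermitian, so convolution equals the CWT correlation.
        rows.append(np.abs(np.convolve(x, wavelet, mode="same")) / fs)
    return np.array(rows)


# Usage on a synthetic 2 Hz tone (a stand-in for real radar data).
fs = 50.0                                    # assumed sampling rate in Hz
time = np.arange(int(fs * 30)) / fs
windows = segment_signal(np.sin(2 * np.pi * 2.0 * time), fs)
freqs, power = power_spectrum(windows[0], fs)
scalogram = morlet_scalogram(windows[0], fs, np.array([0.5, 1.0, 2.0, 4.0, 8.0]))
```

Each `scalogram` or `power` array would then be rendered as an image and passed to the classifier. A dedicated wavelet library would normally replace the explicit loop; it is written out here only to keep the sketch self-contained.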