Indoor localization is important for a wide range of applications, including the Internet of Things (IoT), healthcare, and personnel monitoring. To improve indoor localization performance, this study proposes an algorithm that combines Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) in a hybrid model that uses signals from both cellular base stations (BSs) and Wi-Fi access points (APs). In a realistic environment, we evaluate our approach across four scenarios: (1) 2 APs + 2 BSs, (2) 1 AP + 2 BSs, (3) 2 APs, and (4) 2 BSs. We also consider both standing and seated user positions. Our results demonstrate significant improvements over existing methods. Specifically, in the 2 APs + 2 BSs scenario, we achieve an average Euclidean distance error of 1.04 meters, a maximum error of 2.51 meters, and a Root Mean Square Error (RMSE) of 1.15 meters. The corresponding Cumulative Distribution Function (CDF) of the error indicates that the error is below 1.69 meters in 90% of cases. Our hybrid model achieves over 99% accuracy in classifying building floors and over 88% accuracy in identifying user states (e.g., standing or sitting). We validate our approach using smartphone-based indoor localization in a two-story residential building with robust construction materials and double-layered walls.
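
The error metrics reported above (mean Euclidean error, maximum error, RMSE, and the error value at a given CDF level) can be computed from per-sample position estimates roughly as follows. This is a minimal sketch, not the evaluation code used in this study: the function name `localization_error_metrics`, the example coordinates, and the nearest-rank definition of the CDF level are all assumptions made for illustration.

```python
import math


def localization_error_metrics(true_xy, pred_xy, cdf_level=0.90):
    """Summarize 2-D localization errors (hypothetical helper, for illustration).

    true_xy, pred_xy: equal-length sequences of (x, y) positions in meters.
    cdf_level: fraction of samples whose error should fall at or below the
               returned 'cdf_error' (nearest-rank percentile, an assumption).
    """
    if len(true_xy) != len(pred_xy) or not true_xy:
        raise ValueError("need two non-empty sequences of equal length")

    # Per-sample Euclidean distance between ground truth and estimate.
    errors = [math.dist(t, p) for t, p in zip(true_xy, pred_xy)]
    n = len(errors)

    mean_error = sum(errors) / n
    max_error = max(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Empirical CDF: smallest observed error e such that at least
    # cdf_level of the samples have error <= e (nearest-rank method).
    rank = max(math.ceil(cdf_level * n), 1)
    cdf_error = sorted(errors)[rank - 1]

    return {
        "mean_error": mean_error,
        "max_error": max_error,
        "rmse": rmse,
        "cdf_error": cdf_error,
    }


# Made-up positions, not data from this study.
truth = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
estimates = [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0), (0.0, 5.0)]
metrics = localization_error_metrics(truth, estimates, cdf_level=0.5)
```

Because the RMSE squares each error before averaging, it is never smaller than the mean Euclidean error, which is consistent with the 1.15 meter RMSE and 1.04 meter mean error reported above.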