The Internet of Things (IoT) has revolutionized connectivity, linking billions of sensors and devices to the Internet and enabling complex applications. This connectivity generates vast amounts of data, which must be processed efficiently to extract valuable insights. Traditionally, cloud-based Deep Learning (DL) has been employed for this purpose. However, such resource-intensive models are difficult to deploy on resource-constrained edge IoT devices, which necessitates the development of efficient DL models suitable for edge computing. Comprehensive reviews that examine in depth the deployment of DL models on edge devices for real-time IoT applications are lacking. To address this gap, our review focuses on the design, development, and deployment of DL models on edge IoT devices. We explore a range of approaches, from deploying pre-trained shallow models to designing dedicated frameworks for training and deploying DL models directly on Microcontroller Units (MCUs). We also examine the applicability of these energy- and computation-efficient DL models to various IoT applications, including healthcare, smart homes, smart cities, and industrial IoT. We provide a comprehensive analysis of methods for developing efficient DL models and of their deployment in diverse applications, and we discuss technical limitations and future directions in this field. Ultimately, this review aims to serve as a foundational resource, enabling researchers and practitioners in IoT and AI to develop and deploy efficient DL models on edge devices, thereby broadening the scope and impact of IoT applications worldwide.