The rapid addition of diverse Internet-of-Things (IoT) assets to enterprise and operational networks has created significant challenges for network operators in understanding and managing their attack surfaces. Networks can be protected only when all connected assets are accurately mapped and their intended behaviors are well understood. Researchers have attempted to model the network behavior of IoT devices with machine learning algorithms that passively analyze traffic packets and flows. However, existing approaches often focus on a limited subset of the available network traffic information, resulting in models that lack flexibility and generalizability across environments with diverse network compositions, computing resources, and device types and behaviors. We advocate an approach that uses deep learning to comprehensively capture both major and subtle patterns in network traffic, with the ability to adapt to various contexts in the future. Our specific contributions are threefold. (1) We develop a configurable data structure that represents network traffic in a fixed-size matrix, capturing flow metadata, packet direction, timing, raw payloads, and sequences of packets and flows. By analyzing a large dataset of IoT packet traces, we generate our structured traffic data, which we will publicly release, and draw insights into behavioral patterns exhibited by IoT devices; (2) We develop a two-dimensional convolutional neural network (CNN) architecture capable of decoding intra-flow and inter-flow patterns specific to various device types without requiring specialized supervision regarding application-layer protocols. We experiment with three representative strategies for traffic inference and highlight their pros and cons based on four metrics: accuracy, coverage, computational workload, and the need for traffic selection; and (3) We develop a method that combines confidence scores with Shapley values to explain and verify model predictions. When applied to real traffic traces, we demonstrate how the explainability insights derived from our model can help refine the neural network, leading to higher-quality predictions.
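To make the fixed-size matrix representation and the two-dimensional CNN concrete, the sketch below shows one possible encoding of per-packet features (direction, inter-arrival time, and a truncated payload prefix) as rows of a fixed-size matrix, fed to a small 2D convolutional classifier. All dimensions, feature choices, layer sizes, and names (e.g., `flow_to_matrix`, `TrafficCNN`, `MAX_PACKETS`, `PAYLOAD_BYTES`) are illustrative assumptions for exposition, not the configuration evaluated in this work.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not the paper's configuration): each flow
# is summarized by up to MAX_PACKETS packets, and each packet row holds direction,
# log inter-arrival time, and the first PAYLOAD_BYTES payload bytes.
MAX_PACKETS = 32
PAYLOAD_BYTES = 62
ROW_WIDTH = 2 + PAYLOAD_BYTES  # direction + timing + payload prefix

def flow_to_matrix(packets):
    """Encode a list of (direction, inter_arrival, payload_bytes) tuples into a
    fixed-size (MAX_PACKETS x ROW_WIDTH) float matrix, zero-padded or truncated
    so every flow yields the same shape."""
    m = torch.zeros(MAX_PACKETS, ROW_WIDTH)
    for i, (direction, gap, payload) in enumerate(packets[:MAX_PACKETS]):
        m[i, 0] = float(direction)                      # +1 outbound, -1 inbound
        m[i, 1] = torch.log1p(torch.tensor(float(gap))) # compress timing range
        body = torch.tensor(list(payload[:PAYLOAD_BYTES]), dtype=torch.float)
        m[i, 2:2 + body.numel()] = body / 255.0         # normalize raw bytes
    return m

class TrafficCNN(nn.Module):
    """A minimal 2D CNN over the packet-by-feature matrix; kernel sizes and
    channel counts are placeholders chosen for readability."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):  # x: (batch, 1, MAX_PACKETS, ROW_WIDTH)
        return self.classifier(self.features(x).flatten(1))

# Usage on one synthetic flow, classified among 10 hypothetical device types;
# the softmax maximum serves as a simple per-prediction confidence score.
flow = [(+1, 0.0, b"\x16\x03\x01"), (-1, 0.02, b"\x16\x03\x03")]
x = flow_to_matrix(flow).unsqueeze(0).unsqueeze(0)  # add batch and channel dims
logits = TrafficCNN(num_classes=10)(x)
confidence = torch.softmax(logits, dim=1).max().item()
```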