To support the explosive growth of the Internet of Things (IoT), Uplink (UL) grant-free Non-Orthogonal Multiple Access (NOMA) has emerged as a promising technology. It has the potential to offer scalable and low-cost solutions for resource-constrained Massive Machine Type Communication (mMTC) systems. In principle, grant-free NOMA achieves small signaling overhead and low access latency by circumventing the complicated grant-based access procedures that are commonly found in legacy wireless networks. In a UL grant-free system, a complete Multi-User Detection (MUD) algorithm performs not only Active User Detection (AUD) but also Channel Estimation (CE) and Data Detection (DD). By exploiting the naturally sparse user activity in mMTC systems, the MUD problem can be solved using a wide range of Compressive Sensing based algorithms (CS-MUD), although alternative routes have also been explored in the literature. The utility of these algorithms generally rests on assumptions about the channel or on the availability of perfect channel information at the Base Station (BS). How these assumptions can be met in practice is, however, an important concern. In this work, we devise an end-to-end MUD using a Deep Neural Network (DNN) in which we relax these assumptions. We approximate an ensemble of trained DNN-based MUDs using Knowledge Distillation (KD) to enable fast AUD at the BS. Furthermore, using the inter-resource correlation, we estimate the channels of the active users, which is otherwise an ill-posed problem. We carry out an elaborate numerical investigation to validate the efficacy of the proposed approach for UL grant-free NOMA systems.
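
As a rough illustration of how an ensemble of trained detectors might be distilled into a single fast AUD model, the sketch below averages the temperature-softened per-user activity probabilities of several teachers and scores a student against those soft targets with a binary cross-entropy. This is a minimal sketch under stated assumptions, not the method evaluated in this work: the per-user sigmoid output, the `temperature` parameter, and the function names `ensemble_soft_targets` and `distillation_loss` are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened activity probabilities of several teachers.

    teacher_logits: list of per-teacher lists, one logit per candidate user
    (assumed layout, for illustration only).
    """
    num_teachers = len(teacher_logits)
    num_users = len(teacher_logits[0])
    return [
        sum(sigmoid(t[u] / temperature) for t in teacher_logits) / num_teachers
        for u in range(num_users)
    ]


def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Mean binary cross-entropy between the student's softened
    activity probabilities and the ensemble's soft targets."""
    eps = 1e-12  # keeps log() finite when a probability saturates
    total = 0.0
    for z, t in zip(student_logits, soft_targets):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(soft_targets)


# Two hypothetical teachers that both consider user 0 active and user 1 idle.
teachers = [[4.0, -4.0], [2.0, -2.0]]
targets = ensemble_soft_targets(teachers, temperature=1.0)
loss_good = distillation_loss([3.0, -3.0], targets, temperature=1.0)
loss_flat = distillation_loss([0.0, 0.0], targets, temperature=1.0)
```

In this toy setting, a student whose logits agree with the teachers (`loss_good`) incurs a smaller loss than an uninformative student (`loss_flat`). At inference time only the single student would need to run at the BS, which is the usual motivation for distilling an ensemble.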