Wearable EEG applications demand an optimal trade-off between performance and system power consumption. However, high-performing models usually require many features for training and inference, which leads to a high computational and memory budget. In this paper, we present a novel knowledge distillation methodology that reduces the number of EEG channels (and therefore the associated features) without compromising performance. We distill information from a model trained on all channels (the teacher) into a model that uses a reduced set of channels (the student). To this end, we first pre-train a state-of-the-art model on features extracted from all channels. Then, we train a naive model on features extracted from a few task-specific channels, using the soft labels predicted by the teacher model as training targets. As a result, the student model learns to mimic the teacher despite its reduced feature set. We evaluate this methodology on two publicly available datasets: CHB-MIT for epileptic seizure detection and BCI Competition IV-2a for motor-imagery classification. Results show that the proposed channel reduction methodology improves the precision of the seizure detection task by about 8% and the motor-imagery classification accuracy by about 3.6%. Given these consistent results, we conclude that the proposed framework facilitates future lightweight wearable EEG systems without any degradation in performance.
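
The teacher-to-student step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the temperature parameter, the KL-divergence form of the loss, the feature layout (samples, channels, features per channel), and the function names `select_channels` and `distillation_loss` are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def select_channels(features, channel_idx):
    """Keep only the chosen channels and flatten them for the student.

    Assumes `features` has shape (n_samples, n_channels, n_feats_per_channel);
    this layout is hypothetical.
    """
    reduced = np.asarray(features)[:, channel_idx, :]
    return reduced.reshape(reduced.shape[0], -1)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence from the teacher's soft labels to the student.

    The soft labels are the teacher's temperature-softened class
    probabilities; the temperature value is an assumed hyperparameter.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(kl.mean())
```

In such a setup, the teacher would see the full feature tensor while the student would see only `select_channels(features, channel_idx)`, and the student's parameters would be updated to minimise `distillation_loss` against the frozen teacher's outputs. The loss is zero when the student reproduces the teacher's soft labels exactly and positive otherwise.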