Abu Asaduzzaman

Processing large amounts of data with many input features is time-consuming and expensive. In machine learning (ML), the number of input features plays a crucial role in determining the performance of ML models. Studies show that ML has potential for dimensionality reduction. This work proposes a methodology that uses ML to reduce the number of input features and thereby facilitate cost-effective data analysis. Two different water quality prediction datasets from Kaggle are used to run the ML models. First, we use Recursive Feature Elimination with Cross-Validation (RFECV), Permutation Importance (PI), and Random Forest (RF) models to determine the impact of the input features on predicting water quality. Second, we conduct experiments applying seven ML models: RF, Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), and Deep Neural Network (DNN) to predict water quality using the original and reduced datasets. Third, we evaluate the impact of the optimized feature set on the computations and cost of testing water quality. Experimental results show that reducing the number of features from nine to five for Dataset 1 reduces computations by up to 59% and cost by up to 65%. Similarly, reducing the number of features from 20 to 16 for Dataset 2 reduces computations by up to 20% and cost by up to 14%. This study may help mitigate the curse of dimensionality by improving the performance of ML models through better data generalization.
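
A minimal Python sketch of the feature-reduction step described above, combining RFECV, permutation importance, and a Random Forest. The file name (water_potability.csv), the 'Potability' target column, and all hyperparameters are illustrative assumptions based on the commonly used Kaggle water-potability dataset, not the authors' exact setup.

```python
# Hedged sketch: feature reduction with RFECV, permutation importance, and a
# Random Forest. File name, target column, and hyperparameters are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("water_potability.csv").dropna()          # assumed file name
X, y = df.drop(columns=["Potability"]), df["Potability"]   # assumed target column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=42)

# Recursive Feature Elimination with Cross-Validation selects a feature subset.
selector = RFECV(rf, step=1, cv=5, scoring="accuracy").fit(X_tr, y_tr)
selected = X.columns[selector.support_]
print("Selected features:", list(selected))

# Permutation importance on a fitted model gives an independent feature ranking.
rf.fit(X_tr, y_tr)
pi = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=42)
for name, score in sorted(zip(X.columns, pi.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")

# Retrain on the reduced feature set and compare accuracy with the full set.
rf_reduced = RandomForestClassifier(n_estimators=200, random_state=42)
rf_reduced.fit(X_tr[selected], y_tr)
print("Full-feature accuracy:   ", rf.score(X_te, y_te))
print("Reduced-feature accuracy:", rf_reduced.score(X_te[selected], y_te))
```

The same reduced feature set could then be fed to the other six classifiers to compare computation time and cost against the full-feature baseline.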

Md Raihan Uddin

Innovations in wireless and machine-to-machine (M2M) technologies have led to the proliferation of Internet of Things (IoT) devices. In a conventional cloud-based IoT system, the vast amounts of data generated by devices are processed, analyzed, and stored in a central Cloud Server (CS). However, the increasing number of devices and the accompanying growth in device data strain the CS, leading to scalability issues. This results in performance degradation, i.e., longer execution times, higher energy consumption, and lower throughput. Studies show that Collaborative Edge-Cloud Computing (CECC) has the potential to enhance system scalability and performance. In this work, we study and contribute to CECC research by proposing a method to enhance scalability and performance. First, the CS is loaded with device data to near-full utilization. Then, the computations, specifically the device data, are distributed among the Edge Servers (ESs) and the CS, and performance is assessed to obtain the optimal pairing of computations. Finally, additional devices are added and their data are allocated to the CS to assess scalability and performance. An IoT system with 30 devices of five different types, distributed across 10 edges, two ESs, and one CS, is modeled and simulated using VisualSim. Experimental results show that the proposed method enhances the system's ability to process additional device data generated from either 8 devices of types 1, 2, and 3, or 16 type-4 devices, or 32 type-5 devices. It also helps reduce execution time and energy consumption by 74% and 17%, respectively. The method also has the potential to benefit different scheduling algorithms, machine learning, and federated learning technologies.
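
A toy Python sketch of the collaborative edge-cloud idea: part of the device data is handled by the edge servers in parallel with the cloud server, and execution time and energy are compared against a cloud-only baseline. All processing rates, power figures, data volumes, and the offloading fraction below are illustrative assumptions; they are not taken from the paper or from the VisualSim model.

```python
# Toy sketch of the edge-cloud split only; every numeric value is an assumption.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    rate_mbps: float        # assumed processing rate (MB/s)
    power_w: float          # assumed active power draw (W)

def process(server: Server, data_mb: float) -> tuple[float, float]:
    """Return (execution time in s, energy in J) for data_mb of device data."""
    t = data_mb / server.rate_mbps
    return t, t * server.power_w

edge1 = Server("ES-1", rate_mbps=50, power_w=20)
edge2 = Server("ES-2", rate_mbps=50, power_w=20)
cloud = Server("CS", rate_mbps=200, power_w=120)

total_data_mb = 600.0       # assumed aggregate device data per cycle

# Cloud-only baseline: all device data goes to the CS.
t_cloud, e_cloud = process(cloud, total_data_mb)

# Collaborative edge-cloud: a fraction of the data is split between the two
# ESs, only the remainder reaches the CS, and the servers run in parallel.
edge_share = 0.4            # assumed fraction offloaded to the edge
t1, e1 = process(edge1, total_data_mb * edge_share / 2)
t2, e2 = process(edge2, total_data_mb * edge_share / 2)
t3, e3 = process(cloud, total_data_mb * (1 - edge_share))
t_cecc = max(t1, t2, t3)    # parallel execution: latency is the slowest path
e_cecc = e1 + e2 + e3

print(f"Cloud-only: {t_cloud:.2f} s, {e_cloud:.1f} J")
print(f"Edge-cloud: {t_cecc:.2f} s, {e_cecc:.1f} J")
```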