Enhancing Outlier Detection in Air Quality Index Data Using a Stacked Machine Learning Model
  • Abdoul Aziz Diallo,
  • Lawrence Nderu,
  • Bonface Malenje,
  • Gideon Kikuvi
Abdoul Aziz Diallo
Pan African University Institute for Basic Sciences Technology and Innovation

Corresponding Author:[email protected]

Lawrence Nderu
Jomo Kenyatta University of Agriculture and Technology
Bonface Malenje
Jomo Kenyatta University of Agriculture and Technology
Gideon Kikuvi
Jomo Kenyatta University of Agriculture and Technology

Abstract

Air quality is an important component of environmental health, with serious consequences for human health and well-being. The Air Quality Index (AQI) is a widely used metric for assessing air quality across regions and over time. However, AQI data, like many other types of environmental data, can contain outliers: data points that deviate significantly from other observations and indicate exceptionally good or poor air quality. Detecting them is a critical step in identifying and understanding extreme pollution episodes that can have serious environmental and public health consequences. These outliers can arise from a variety of causes, including measurement errors, unusual meteorological conditions, and pollution events. While outliers can provide useful information about such unusual conditions, they can also skew analyses and models if they are not adequately accounted for. This paper describes a hybrid method for detecting outliers in data, using AQI data as the case study. The method is a stacked machine learning model that combines K-means clustering, Random Forest (RF), and a Gradient Boosting Classifier (GBC). K-means is used for initial categorization, followed by RF model training; the RF output is then used as input to the GBC, which generates the final classification. The performance of the stacked model is evaluated and compared with that of single models using accuracy as the metric. The findings show that the proposed technique is effective, achieving an accuracy of 0.99 and demonstrating its potential for outlier detection in data.
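
The three-stage pipeline named in the abstract (K-means, then RF, then GBC) might be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the synthetic readings, the choice of two clusters, the rule that the smaller cluster is the outlier class, the use of out-of-fold RF probabilities as the GBC input, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for AQI readings (NOT the paper's data):
# mostly typical values plus a small group of extreme ones.
typical = rng.normal(loc=80.0, scale=15.0, size=(950, 1))
extreme = rng.normal(loc=400.0, scale=20.0, size=(50, 1))
X = np.vstack([typical, extreme])

# Stage 1: K-means for initial categorization.
# Assumption: two clusters, with the smaller cluster labeled as outliers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
outlier_cluster = np.argmin(np.bincount(kmeans.labels_, minlength=2))
y = (kmeans.labels_ == outlier_cluster).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 2: Random Forest trained on the K-means-derived labels.
# Out-of-fold probabilities are used for the training rows so the GBC
# does not see RF predictions on rows the RF was fitted on.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf_train_out = cross_val_predict(
    rf, X_train, y_train, cv=5, method="predict_proba"
)[:, [1]]
rf.fit(X_train, y_train)
rf_test_out = rf.predict_proba(X_test)[:, [1]]

# Stage 3: Gradient Boosting takes the RF output as its input feature
# and produces the final outlier / non-outlier classification.
gbc = GradientBoostingClassifier(random_state=0)
gbc.fit(rf_train_out, y_train)
y_pred = gbc.predict(rf_test_out)

accuracy = accuracy_score(y_test, y_pred)
print(f"Stacked model accuracy on synthetic data: {accuracy:.3f}")
```

Because the labels here come from K-means on well-separated synthetic values, the resulting accuracy says nothing about the 0.99 reported in the abstract; the sketch only shows how the output of each stage can feed the next.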
15 Aug 2023 Submitted to Engineering Reports
22 Aug 2023 Submission Checks Completed
22 Aug 2023 Assigned to Editor
24 Aug 2023 Review(s) Completed, Editorial Evaluation Pending
11 Sep 2023 Reviewer(s) Assigned
30 Oct 2023 Editorial Decision: Revise Major
23 Feb 2024 Review(s) Completed, Editorial Evaluation Pending
03 Apr 2024 Editorial Decision: Revise Major
07 Apr 2024 2nd Revision Received
16 Apr 2024 Submission Checks Completed
16 Apr 2024 Assigned to Editor
16 Apr 2024 Review(s) Completed, Editorial Evaluation Pending
17 Apr 2024 Reviewer(s) Assigned
08 May 2024 3rd Revision Received