Priyanka Das

and 3 more

Federated learning (FL) is a paradigm for training deep neural network (DNN) models on a selected set of IoT/edge devices owned by clients without sharing local datasets. Devices such as smartphones, notebooks, and autonomous vehicles train a specified global model on their local datasets over multiple rounds orchestrated by a central FL server and share the resulting model weights with the server. The FL server updates the global model by aggregating these weights and transfers the updated model back to the clients for future training rounds. A key aspect of FL is that data ownership rests solely with the clients, who generate the data independently. Furthermore, the quality of the clients' data may differ due to hardware heterogeneity, spatio-temporal disparity, and client preferences and characteristics. As a result, the data used for model training at different devices are not independent and identically distributed (non-IID) and suffer from quantity, label, and feature skew. Heterogeneity in data distribution and quality introduces biases that prevent the global model from achieving the desired convergence. Therefore, there is a need to select clients with relatively more balanced and better-quality data to ensure convergence. This paper proposes IQ-FL, a novel Image Quality-based client selection mechanism for FL with heterogeneous data distribution. We formulate a metric called IId-index that lets the FL server rank clients by image quality and data imbalance, and we propose a scheduling algorithm that engages them in model training while keeping the participation count within a finite upper bound. The metric preserves the desired data privacy and is based on feature-extraction information from the client's DNN model and other metadata. We have implemented IQ-FL on an open-source FL evaluation platform called FedEval and carried out an extensive empirical study.
We compared our approach with three baselines on three publicly available grayscale image datasets. Experimental results demonstrate that IQ-FL selects clients with more balanced and better-quality data, resulting in better accuracy than the chosen baselines, while keeping client participation under a permissible limit.
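The round structure described above (rank clients, select a subset under a participation bound, aggregate their weights) can be illustrated with a minimal sketch. It is not the IQ-FL implementation: the abstract does not give the IId-index formula, so the `scores` below are hypothetical placeholder values. The per-client cap `max_rounds`, the top-`k` rule, and the sample-size-weighted (FedAvg-style) average are likewise assumptions made for illustration, as are the function names `select_clients` and `aggregate`.

```python
from typing import Dict, List


def select_clients(scores: Dict[str, float],
                   participation: Dict[str, int],
                   k: int,
                   max_rounds: int) -> List[str]:
    """Pick up to k highest-scoring clients still under the participation cap."""
    # Assumption: the bound applies per client; the abstract only states
    # that the participation count is kept within a finite upper bound.
    eligible = [c for c in scores if participation.get(c, 0) < max_rounds]
    # Sort by descending score; break ties by client id for determinism.
    ranked = sorted(eligible, key=lambda c: (-scores[c], c))
    chosen = ranked[:k]
    for c in chosen:
        participation[c] = participation.get(c, 0) + 1
    return chosen


def aggregate(weights: Dict[str, List[float]],
              num_samples: Dict[str, int]) -> List[float]:
    """Sample-size-weighted average of flat client weight vectors."""
    # Assumption: FedAvg-style weighting; the abstract does not name
    # the aggregation rule the server uses.
    total = sum(num_samples[c] for c in weights)
    dim = len(next(iter(weights.values())))
    global_w = [0.0] * dim
    for c, w in weights.items():
        share = num_samples[c] / total
        for i, value in enumerate(w):
            global_w[i] += share * value
    return global_w


# Hypothetical scores standing in for the server-side ranking metric.
scores = {"a": 0.9, "b": 0.7, "c": 0.4}
participation: Dict[str, int] = {}

round1 = select_clients(scores, participation, k=2, max_rounds=1)
round2 = select_clients(scores, participation, k=2, max_rounds=1)

# Two clients with flat two-element weight vectors and 10 and 30 samples.
global_w = aggregate({"a": [1.0, 2.0], "b": [3.0, 4.0]}, {"a": 10, "b": 30})
```

With a cap of one round per client, the two top-ranked clients are chosen first and only the remaining client is eligible afterwards, which shows how a participation bound trades ranking quality for broader client coverage.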