In recent years, Deep Neural Networks (DNNs) have been widely used for Human Gesture Recognition (HGR) based on data from inertial sensors, such as accelerometers and gyroscopes, available on smart Internet of Things (IoT) devices. Most recent work on HGR using motion data relies on a centrally gathered dataset, an approach that faces two major challenges: a) the data are originally stored on end users' smart devices, and gathering them in one place is not feasible due to communication limitations, and b) clients are reluctant to share their private data with a central server due to privacy concerns. In this paper, we address these issues and propose a privacy-preserving framework based on Federated Learning (FL) for HGR using motion data, called Motion-based Federated Learning Gesture Recognition (MoFLeuR). Furthermore, we consider different types of data heterogeneity, which degrade the performance of the global model. Accordingly, we propose a communication- and computation-efficient client selection method that chooses clients so as to mitigate the impact of data heterogeneity on the training process. In the proposed framework, clients are not required to share sensitive information about their local datasets with the edge server during the FL process. Simulation results show that the proposed MoFLeuR algorithm improves the performance of the global model under different degrees of data heterogeneity and outperforms the baseline algorithms in terms of accuracy, convergence speed, and communication and computation efficiency.