Various data-driven approaches have been proposed for obtaining channel state information (CSI) in millimeter-wave (mmWave) multiple-input multiple-output (MIMO) wireless networks. Almost all previous works assume that the training and testing channels follow the same distribution, which may not hold in practice. In this paper, we address this challenge by proposing a learning framework that combines a long short-term memory (LSTM) recurrent neural network (RNN) with a deep neural network (DNN) to estimate CSI in a dynamic wireless communication environment. Furthermore, we use federated learning (FL) to train the learning-based channel estimation (CE) model. More specifically, we introduce a two-stage downlink pilot transmission procedure. In the first stage, long-frame-length downlink pilot signals are used to train the proposed RNN-DNN model. In the second stage, users receive shorter-frame-length pilot signals, which are used for CSI estimation. To speed up training of the proposed network, we first generate a pre-trained model and then adapt it using the collected data samples. Simulation results demonstrate that, when the channel distribution is unavailable, the proposed approach significantly outperforms the most recent CE algorithms in terms of both estimation performance and computational complexity.