This work focuses on machine learning modeling and predictive control of nonlinear processes using noisy data. We train long short-term memory (LSTM) networks on data corrupted by two types of noise, Gaussian and non-Gaussian, to obtain the process model used in a model predictive controller (MPC). We first discuss LSTM training with data corrupted by Gaussian noise and demonstrate that the standard LSTM network is capable of capturing the underlying process dynamics. Subsequently, given that the standard LSTM network performs poorly on a noisy dataset from industrial operation (i.e., non-Gaussian noisy data), we propose an LSTM network using the Monte Carlo dropout method to reduce over-fitting to noisy data. Furthermore, we propose an LSTM network using the co-teaching training method to further improve approximation performance when clean data are available from a nonlinear model that captures the nominal process state evolution.
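As a rough illustration of the Monte Carlo dropout idea mentioned above, the sketch below keeps dropout active at prediction time and averages several stochastic forward passes. It is a minimal sketch under stated assumptions, not the network used in this work: a single tanh hidden layer stands in for the LSTM, and the function name `mc_dropout_predict`, the weight matrices, the dropout rate, and the number of samples are all hypothetical choices made for illustration.

```python
import numpy as np


def mc_dropout_predict(x, w_hidden, w_out, drop_rate=0.2, n_samples=100, seed=0):
    """Average n_samples stochastic forward passes with dropout left on.

    Assumption: a one-hidden-layer tanh network replaces the LSTM so the
    example stays short; only the dropout-averaging step is the point here.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    hidden = np.tanh(x @ w_hidden)  # deterministic hidden activations

    outputs = []
    for _ in range(n_samples):
        # Draw a fresh Bernoulli mask for every pass; dividing by `keep`
        # (inverted dropout) leaves the expected activation unchanged.
        mask = rng.random(hidden.shape) < keep
        outputs.append((hidden * mask / keep) @ w_out)

    outputs = np.stack(outputs)
    # The mean is the prediction; the standard deviation is a rough
    # indication of how much the dropout passes disagree.
    return outputs.mean(axis=0), outputs.std(axis=0)
```

With `drop_rate=0.0` every mask is all ones, so the function reduces to the ordinary deterministic forward pass and the reported spread is zero.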