Ci Lin et al.

Traditionally, time series data augmentation has focused primarily on improving the architecture of Generative Adversarial Networks (GANs), with the aim of closely matching the original data distribution while also preserving the original data's dynamic behavior. However, even state-of-the-art GAN models such as TimeGAN fall short in preserving the temporal dynamics of the original time series because they do not account for first-order difference information. To address this limitation, this study proposes a novel process for generating multivariate time series data. The proposed process comprises four essential modules: a) a GAN module for generating multivariate time series data, b) a sampling module for preserving the first-order difference distribution, c) a smoothing module for refining the generated data, and d) an evaluation module that assesses the synthetic time series data using the Kolmogorov-Smirnov test (KS-test), the Hilbert-Schmidt Independence Criterion (HSIC), and other metrics. This comprehensive approach ensures that the synthetic time series data maintains both the distribution and the dynamic behavior of the original data. We extensively discuss the role of the β factor in the modified Metropolis-Hastings algorithm used in the sampling module, which controls how much information is preserved from the original time series. Our experiments reveal that small β values retain periodic information effectively, and that the joint distribution of the first-order differences of the synthetic time series remains consistent when the same β value is applied in the modified Metropolis-Hastings algorithm. However, β has no impact on the partial autocorrelation function, and the output of the sampling module retains the memoryless property of a Markov chain. Therefore, in the smoothing module we apply the exponential moving average (EMA) method to recover the long-term relationships within the original time series, and find that the optimal smoothing factor α is approximately 0.4 to 0.5. Lastly, we employ the synthetic time series data to train a neural network model developed in another work. Our findings indicate that the neural network model trained on synthetic time series data performs comparably to a model trained on the original data.
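
As a rough illustration of the smoothing and evaluation steps described above (not the authors' implementation, whose GAN and modified Metropolis-Hastings modules are omitted here), the following Python sketch applies an exponential moving average with a smoothing factor α = 0.4 and compares the first-order-difference distributions of an original and a synthetic channel with a two-sample KS test. The function names and the toy data are hypothetical stand-ins.

```python
# Hypothetical sketch of the smoothing (EMA) and evaluation (KS-test) modules.
import numpy as np
from scipy.stats import ks_2samp


def ema_smooth(series: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Exponential moving average; alpha ~ 0.4-0.5 per the abstract."""
    smoothed = np.empty_like(series, dtype=float)
    smoothed[0] = series[0]
    for t in range(1, len(series)):
        smoothed[t] = alpha * series[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed


def ks_on_first_differences(original: np.ndarray, synthetic: np.ndarray):
    """Two-sample KS test on the first-order differences of both series."""
    return ks_2samp(np.diff(original), np.diff(synthetic))


# Toy usage on a single channel of a multivariate series (stand-in data only).
rng = np.random.default_rng(0)
original = np.cumsum(rng.normal(size=500))             # placeholder real channel
sampled = original + rng.normal(scale=0.3, size=500)   # placeholder sampler output
synthetic = ema_smooth(sampled, alpha=0.4)
result = ks_on_first_differences(original, synthetic)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```

A small KS statistic (large p-value) on the first-order differences would indicate, in the spirit of the evaluation module described above, that the synthetic series preserves the original dynamics; the HSIC-based check is not shown in this sketch.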
