High-Dimensional Biomarker Panel Joint Distribution Simulations
We evaluated whether GANs could generate the joint distribution of
multiple biomarkers, using the 14 diabetes-relevant biomarkers.
Because a 14-dimensional joint distribution cannot be visualized
directly, we used three dimensionality-reduction approaches, t-SNE
(Figure 3A), UMAP (Figure 3B), and PCA (Figure 3C), to generate
2-dimensional projections of the test and GAN-generated distributions.
For all three approaches, the projected GAN-generated data (teal
circles) were well dispersed within the test data distribution (salmon
circles). This suggests that GANs are a promising approach for
generating high-dimensional biomarker distributions.
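The projection step can be sketched for the PCA case with NumPy alone. This is a minimal illustration, not the study's code: it assumes the test and GAN-generated samples are arrays with one row per subject and one column per biomarker, already on a common scale, and it fits the principal axes on the test data only so that both samples share one 2-dimensional coordinate system. The function name `pca_project` and the random stand-in data are hypothetical.

```python
import numpy as np

def pca_project(reference, other, n_components=2):
    """Project two sample matrices onto the top principal axes of `reference`.

    Rows are subjects, columns are biomarkers. Assumes both matrices are
    already on a common scale (e.g. scaled log-transformed levels).
    """
    mean = reference.mean(axis=0)
    # Right singular vectors of the centered reference data are the PC directions,
    # returned in order of decreasing singular value.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    components = vt[:n_components]  # shape (n_components, n_features)
    # Center both samples with the reference mean so they share one coordinate system.
    return (reference - mean) @ components.T, (other - mean) @ components.T

rng = np.random.default_rng(0)
# Hypothetical stand-ins for 14-biomarker test data and GAN output.
test_data = rng.normal(size=(500, 14))
gan_data = rng.normal(size=(500, 14))

test_2d, gan_2d = pca_project(test_data, gan_data)
print(test_2d.shape, gan_2d.shape)  # (500, 2) (500, 2)
```

Plotting `test_2d` and `gan_2d` in two colors gives a panel analogous to Figure 3C. Fitting the axes on the pooled samples is an equally reasonable choice; standard t-SNE, which learns no reusable mapping for new points, is usually run on the pooled data.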
To further assess the performance of GANs, we visualized the univariate
and bivariate marginal distributions of the high-dimensional joint
distribution using pairs panel plots, which show the univariate
densities along the diagonal, bivariate scatter plots in the lower
triangle, and Spearman correlation coefficients in the upper triangle.
Figure 3D shows the pairs panel plots for the scaled log-transformed
levels of seven biomarkers: urine albumin, urine creatinine, fasting
glucose, insulin, body mass index, glycohemoglobin, and triglyceride.
For all seven biomarkers, the univariate densities of the GAN-generated
data (diagonal of Figure 3D) overlapped extensively with the test data
densities, and the individual density curves were difficult to
distinguish. The bivariate scatter plots also overlapped extensively,
with the GAN-generated data points evenly dispersed among the test data
points in all 21 bivariate plots in Figure 3D.
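The Spearman coefficients in the upper triangle of the pairs panels can also be compared numerically. The sketch below is a hypothetical check, not part of the reported analysis: it assumes strictly positive raw levels with no tied values, applies a log transform followed by per-column standardization as one possible reading of "scaled log-transformed", computes Spearman correlations as Pearson correlations of ranks, and returns the largest absolute difference between the two samples over the 21 distinct pairs of seven biomarkers. All helper names and the simulated log-normal data are illustrative.

```python
import numpy as np
from itertools import combinations

def scaled_log(x):
    """Log-transform positive levels, then center and scale each column.

    Hypothetical preprocessing; the study's exact scaling may differ.
    """
    logged = np.log(x)
    return (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)

def spearman_matrix(x):
    """Spearman correlation matrix as the Pearson correlation of column-wise ranks.

    argsort-of-argsort ranks are exact only when a column has no ties.
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def max_upper_triangle_gap(a, b):
    """Largest |rho_a - rho_b| over all distinct biomarker pairs (the upper triangle)."""
    ra, rb = spearman_matrix(a), spearman_matrix(b)
    pairs = list(combinations(range(a.shape[1]), 2))
    return max(abs(ra[i, j] - rb[i, j]) for i, j in pairs), len(pairs)

rng = np.random.default_rng(1)
cov = 0.5 * np.eye(7) + 0.5  # hypothetical shared correlation structure (0.5 off-diagonal)
# Log-normal stand-ins for raw levels of seven biomarkers in the test and GAN samples.
test_raw = np.exp(rng.multivariate_normal(np.zeros(7), cov, size=1000))
gan_raw = np.exp(rng.multivariate_normal(np.zeros(7), cov, size=1000))

gap, n_pairs = max_upper_triangle_gap(scaled_log(test_raw), scaled_log(gan_raw))
print(f"max |Spearman difference| over {n_pairs} pairs: {gap:.3f}")
```

Because the log transform and standardization are strictly increasing maps, they leave Spearman coefficients unchanged; the scaling matters for the density and scatter panels, not for the upper triangle. With tied values, average ranks (for example from `scipy.stats.rankdata`) would be needed instead of the argsort approach.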
These results suggest that GAN-generated distributions can be useful for
modeling systems of clinical biomarkers.