Abstract
Background Microbiome studies are often limited by a lack of statistical
power due to small sample sizes and a large number of features. This
problem is exacerbated in correlative studies of multi-omic datasets.
Statistical power can be increased by finding and summarizing modules of
correlated observations, a form of dimensionality reduction.
Additionally, modules provide biological insight, because groups of
correlated microbes may interact with one another. Results To address
these challenges, we developed SCNIC: Sparse Cooccurrence Network
Investigation for compositional data. SCNIC is open-source software that
can generate correlation networks and detect and summarize modules of
highly correlated features. Modules can be formed using either the
Louvain Modularity Maximization (LMM) algorithm or a Shared Minimum
Distance (SMD) algorithm that we newly describe here and relate to LMM
using simulated data. Applying SCNIC to two published datasets
increased statistical power and identified microbes that not
only differed across groups but also correlated strongly with each
other, suggesting shared environmental drivers or cooperative
relationships among them. Conclusions SCNIC provides an easy way to
generate correlation networks, identify modules of correlated features
and summarize them for downstream statistical analysis. Although SCNIC
was designed with properties of microbiome data in mind, such as
compositionality and sparsity, it can be applied to a variety of data
types, including metabolomics data, and can be used to integrate multiple
data types. SCNIC allows for the identification of functional microbial
relationships at scale while increasing statistical power through
feature reduction.
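
As an illustration of the general idea of module-based feature reduction described above, the following is a minimal sketch and not SCNIC's implementation or API. It makes several simplifying assumptions: plain Pearson correlation stands in for a compositionally aware correlation measure, connected components of a thresholded correlation network stand in for the LMM and SMD algorithms, and each module is summarized by summing its members' abundances per sample. The function names (`pearson`, `find_modules`, `summarize`), the `min_r` threshold, and the toy table are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def find_modules(table, min_r=0.8):
    """Group features linked by correlations >= min_r.

    `table` maps feature name -> list of per-sample abundances.
    Assumption: modules are connected components of the thresholded
    network, a simpler stand-in for LMM or SMD.
    """
    parent = {f: f for f in table}

    def root(f):
        # Union-find lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(table, 2):
        if pearson(table[a], table[b]) >= min_r:
            parent[root(a)] = root(b)

    groups = {}
    for f in table:
        groups.setdefault(root(f), []).append(f)
    # Only groups with more than one member count as modules.
    return [sorted(g) for g in groups.values() if len(g) > 1]


def summarize(table, modules):
    """Replace each module with the per-sample sum of its members."""
    in_module = {f for m in modules for f in m}
    out = {f: v for f, v in table.items() if f not in in_module}
    for i, m in enumerate(modules):
        out[f"module_{i}"] = [sum(vals) for vals in zip(*(table[f] for f in m))]
    return out


# Hypothetical toy table: three features measured in five samples.
table = {
    "taxon_A": [1, 2, 3, 4, 5],
    "taxon_B": [2, 4, 6, 8, 11],
    "taxon_C": [5, 1, 4, 2, 3],
}
modules = find_modules(table)
reduced = summarize(table, modules)
```

In this toy table, `taxon_A` and `taxon_B` are strongly correlated and collapse into a single summarized feature, while `taxon_C` is left as is, so three features become two before any downstream test is run.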