STRING and KEGG analyses
Protein-protein interaction networks were analyzed using STRING version
11.0 (Szklarczyk et al., 2019; https://string-db.org/). A list of
protein accession numbers consisting of all significantly regulated
proteins and mRNA, as well as those found in the up- and down-regulated
protein clusters based on Genesis output, was entered into the STRING
search function. The search returned lists of significantly enriched
(FDR < 0.05) UniProt keywords and protein domains classified by the
Pfam, InterPro, and SMART databases. Further, protein networks with more
than one edge and containing at least one significantly regulated
protein were identified and searched as a subset of the total group.
From this result, in addition to the enriched keywords and domains,
sub-networks were identified. An MCL inflation factor of 1.3 was chosen
to create seven distinct networks which encompassed all of the
significantly regulated proteins connected to at least one other
protein. Each individual network protein list was then entered as a
separate STRING search to determine enriched keywords and proteins and
to classify networks by cellular function.
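The clustering itself was performed within STRING; purely as an illustration of what the MCL inflation factor controls, the following is a minimal, unoptimized sketch of Markov clustering in Python. The function names, the added self-loops, the convergence tolerance, and the toy edge list are all assumptions made for this example and do not reproduce STRING's implementation or the study's data.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / col_sum if col_sum else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, inflation):
    """Inflation step: element-wise power, then column re-normalization."""
    return normalize_columns([[x ** inflation for x in row] for row in m])


def mcl(edges, n_nodes, inflation=1.3, max_iter=100, tol=1e-9):
    """Cluster an undirected weighted graph; returns sorted lists of node indices.

    edges: iterable of (node_a, node_b, weight); weights are hypothetical
    stand-ins for interaction confidence scores.
    """
    adj = [[0.0] * n_nodes for _ in range(n_nodes)]
    for a, b, w in edges:
        adj[a][b] = adj[b][a] = w
    for i in range(n_nodes):
        adj[i][i] = 1.0  # self-loops, a common MCL convention (assumed here)
    m = normalize_columns(adj)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n_nodes) for j in range(n_nodes))
        m = nxt
        if delta < tol:
            break
    # Rows with a non-zero diagonal act as attractors; the non-zero entries
    # of an attractor row are the members of its cluster.
    clusters = set()
    for i in range(n_nodes):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n_nodes) if m[i][j] > 1e-6))
    return sorted(sorted(c) for c in clusters)


# Toy input: two disconnected triangles (hypothetical nodes 0-5).
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
toy_clusters = mcl(toy_edges, n_nodes=6, inflation=1.3)
```

In general, a larger inflation factor tends to split a network into more, smaller clusters, while a value near 1 yields fewer, coarser ones.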
Over-representation of proteins in known molecular pathways was analyzed
using KEGG Mapper search
(https://www.genome.jp/kegg/tool/map_pathway1.html). First, the
GhostKOALA automatic annotation server (Kanehisa, Sato, & Morishima,
2016) was used to assign matching KEGG numbers to all of the proteins
in the dataset. The complete list of matching KEGG numbers was searched
in KEGG Mapper to establish a baseline representation for each
pathway in the database. Then, lists of significantly up- and
down-regulated proteins were searched, and the proportion of
representation in each pathway (number of proteins in a given pathway
divided by total proteins in the list) was compared against the
proportion for the complete DIA assay library protein list. This approach indicated
pathways with over-representation of significantly regulated proteins.
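The proportion comparison described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names, the protein and pathway identifiers, and the decision to flag any pathway whose regulated-list proportion exceeds the baseline proportion (ratio greater than 1) are assumptions made for the example, since the text does not state a cutoff.

```python
from collections import Counter


def pathway_proportions(protein_to_pathways, proteins):
    """Proportion of a protein list represented in each pathway.

    Computed as (number of proteins in the pathway) / (total proteins in
    the list), following the definition given in the text.
    """
    counts = Counter()
    for protein in proteins:
        # set() guards against a pathway listed twice for one protein
        for pathway in set(protein_to_pathways.get(protein, ())):
            counts[pathway] += 1
    total = len(proteins)
    return {pathway: n / total for pathway, n in counts.items()}


def over_represented(protein_to_pathways, regulated, background):
    """Pathways whose proportion in the regulated list exceeds the baseline.

    Returns {pathway: regulated proportion / baseline proportion}.
    The ratio > 1 criterion is an assumption of this sketch.
    """
    baseline = pathway_proportions(protein_to_pathways, background)
    observed = pathway_proportions(protein_to_pathways, regulated)
    result = {}
    for pathway, prop in observed.items():
        base = baseline.get(pathway, 0.0)
        if base > 0 and prop > base:
            result[pathway] = prop / base
    return result


# Hypothetical toy data: four proteins mapped to two pathways.
toy_mapping = {
    "P1": ["map00010"],
    "P2": ["map00010", "map00020"],
    "P3": ["map00020"],
    "P4": [],
}
toy_background = ["P1", "P2", "P3", "P4"]  # stands in for the full assay library
toy_regulated = ["P1", "P2"]               # stands in for a regulated list

toy_result = over_represented(toy_mapping, toy_regulated, toy_background)
```

In this toy case, map00010 covers 2 of 2 regulated proteins against 2 of 4 in the background, so it is flagged, whereas map00020 has the same proportion in both lists and is not.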