Protein functional network analysis of DIA data
STRING enrichment analysis was performed using abundance data for 2114 detectable proteins out of the 2120 proteins included in the DIA assay library to identify categories that were enriched and depleted following acclimation of fish to BW (Table S1). In addition, a smaller subset of proteins was used for more in-depth network analysis; this subset comprised all proteins from clusters 1 and 6, as well as the 5 remaining significantly regulated proteins that were not in clusters 1 or 6. All significantly regulated mRNAs were also included by using their corresponding protein accession numbers (ACs) (Figure 5). In total, 277 unique protein ACs were queried with STRING for further network and pathway analysis.
From this list, 174 protein ACs from the DIA assay library and 1 protein AC corresponding to a significantly regulated mRNA were mapped to STRING IDs, while the remainder had no STRING ID. The resulting list of 175 STRING IDs was enriched (FDR < 0.05) in kidneys of BW fish for the terms “mitochondrion”, “monooxygenase”, “oxidoreductase”, “NADP”, “Microsome”, “GTP-binding”, “glycolysis”, and “FAD” (Table 1). Protein domain enrichment in kidneys of BW fish was most significant (FDR < 0.002) for aldo/keto reductase, short chain dehydrogenase/reductase SDR, flavin binding monooxygenase, and pyridine nucleotide-disulphide oxidoreductase.
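STRING reports functional enrichment as FDR-corrected p-values. As we understand STRING's documentation, enrichment is assessed with a hypergeometric test and the p-values are corrected with the Benjamini-Hochberg procedure; the Python sketch below illustrates that logic only. The term names, background size, and hit counts are invented for illustration and are not the values behind Table 1.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical numbers: background of N proteins, query of n proteins,
    # and for each term (K annotated in background, k annotated in query).
    N, n = 2000, 175
    terms = {"term_A": (60, 20), "term_B": (40, 5), "term_C": (300, 30)}
    pvals = [hypergeom_sf(k, N, K, n) for K, k in terms.values()]
    for name, q in zip(terms, benjamini_hochberg(pvals)):
        flag = "  (enriched)" if q < 0.05 else ""
        print(f"{name}: FDR = {q:.3g}{flag}")
```

Because every term is tested separately, the FDR correction matters most when many categories are screened at once, as in the keyword and domain lists above.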
Using only the small set of significantly regulated proteins for STRING analysis yielded far fewer enriched categories than using the complete set of 175 STRING IDs or sets consisting of all proteins in clusters 1 (down-regulated) and 6 (up-regulated) (Table 1). Moreover, the significantly regulated proteins did not correspond to any of the protein identifiers enriched in the whole protein set (Table S1). This result demonstrates the added value of performing unbiased clustering and STRING analysis in tandem.
When the list of 175 STRING IDs was consolidated into a Markov Clustering (MCL) network, 119 IDs had at least one edge connecting them to another protein node in this list. At an MCL inflation parameter of 1.3, proteins were separated into 7 networks that included all 23 significant proteins connected to another node by at least one edge (Figure 6). Networks 1 and 3 account for 13 of the significant proteins and include 6 of the 8 most highly regulated proteins in the network map, defined as those with an FC greater than 4 (Figure 7). Network 1 was enriched for the keyword hydrolase and the protein domains glutathione S-transferase, papain cysteine protease, and thioredoxin-like. Network 3 was enriched for the keyword ligase and the protein domains aldo-keto reductase, acetyl-CoA synthetase, acetate-CoA ligase, and NADP-dependent oxidoreductase. Other networks were associated with terms enriched in the overall list, such as cytochrome c oxidase for network 4 and short chain dehydrogenase/reductase SDR for network 2 (Table 1), linking STRING networks with specific cellular responses.
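Markov Clustering (van Dongen's MCL) partitions a graph by alternating expansion (raising a column-stochastic flow matrix to a power) and inflation (raising each entry to a power r, then renormalizing columns). A larger inflation value yields more, smaller clusters, so the relatively low value of 1.3 used here favors fewer, larger networks. The Python sketch below is a generic textbook MCL on a toy adjacency matrix, not STRING's implementation; the edge weights, convergence tolerance, and the rule of assigning each node to a single attractor are assumptions made for illustration.

```python
Matrix = list[list[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency: Matrix, inflation: float = 1.3, expansion: int = 2,
        max_iter: int = 100, tol: float = 1e-9) -> list[list[int]]:
    """Cluster an undirected (optionally weighted) graph with MCL."""
    n = len(adjacency)
    # Add self-loops so every column carries flow.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        # Expansion: flow of a random walk over `expansion` steps.
        for _ in range(expansion - 1):
            m = _matmul(m, prev)
        # Inflation: strengthen strong flows and weaken weak ones.
        m = _normalize_columns([[v ** inflation for v in row] for row in m])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Assign each node (column) to the attractor row holding most of its flow.
    clusters: dict[int, list[int]] = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by one bridge edge (2-3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0.0] * 6 for _ in range(6)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    print(mcl(adj, inflation=1.3))
```

Rerunning the toy example with larger `inflation` values is a quick way to see how this single parameter controls network granularity.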
In a separate analysis, the query list of 277 unique protein ACs was associated with 236 KEGG orthology (KO) identifiers. For the complete list, over-representation of KO identifiers was greatest for KEGG pathways 01220 (Degradation of aromatic compounds), 00625 (Chloroalkane and chloroalkene degradation), and 00982 (Drug metabolism - cytochrome P450). For significantly up-regulated proteins, over-representation was greatest in pathway 00053 (Ascorbate and aldarate metabolism) due to the presence of UDP-glucuronosyltransferase (UGT) and aldehyde dehydrogenase (ALDH). This pathway is of particular interest because it also contains the significantly down-regulated MIOX protein (Figure 8). KEGG over-representation for the full list can be linked predominantly to STRING network 3, which is enriched in KEGG pathways 01220, 00625, and 00053. The greatest over-representation of significantly down-regulated proteins was in KEGG pathway 05100 (Bacterial invasion of epithelial cells), which was also strongly over-represented in network 6 owing to the significantly down-regulated proteins actin-related protein 2/3 complex subunit 1A, integrin-linked kinase, cell division cycle control protein 42, and dynamin-2 isoform X4.
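KEGG over-representation can be summarized as a ratio of observed to expected pathway hits: the fraction of query KOs mapped to a pathway divided by the fraction of background KOs mapped to it. The text does not state which statistic or tool was used, so the Python sketch below only illustrates that ratio; the KO identifiers, pathway labels, and memberships are hypothetical placeholders, not real KEGG annotations. In practice a significance test, such as a hypergeometric test, would normally accompany the ratio.

```python
from collections import Counter


def over_representation(query_kos, background_kos, ko_to_pathways):
    """Rank pathways by the observed/expected ratio of query KO hits.

    ratio = (k / n) / (K / N), where k and n are pathway hits and total KOs
    in the query, and K and N are the same counts in the background.
    """
    background = set(background_kos)
    # Only KOs present in the background can be tested.
    query = set(query_kos) & background
    q_hits = Counter(p for ko in query for p in ko_to_pathways.get(ko, ()))
    b_hits = Counter(p for ko in background for p in ko_to_pathways.get(ko, ()))
    ratios = {
        # Integer products first, then a single division, to limit rounding.
        pathway: (k * len(background)) / (b_hits[pathway] * len(query))
        for pathway, k in q_hits.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy annotation; not real KEGG memberships.
    ko_to_pathways = {
        "KO_1": ["pathway_X"], "KO_2": ["pathway_X", "pathway_Y"],
        "KO_3": ["pathway_Y"], "KO_4": ["pathway_Y"], "KO_5": ["pathway_Z"],
        "KO_6": ["pathway_Z"], "KO_7": ["pathway_Z"], "KO_8": ["pathway_Z"],
    }
    query = ["KO_1", "KO_2", "KO_5"]
    for pathway, ratio in over_representation(query, list(ko_to_pathways),
                                              ko_to_pathways):
        print(f"{pathway}: {ratio:.2f}x expected")
```

Running the same function separately on up-regulated and down-regulated subsets mirrors the comparison made above between pathway 00053 and pathway 05100.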
Key kidney proteins associated with BW acclimation of fish were identified based on statistical significance, degree of FC, correlation with mRNA regulation, and presence in central STRING networks and KEGG pathways. Some of these proteins were represented by only one paralog, whereas for several others one paralog was significantly regulated while the corresponding paralogs had low FC or were regulated in the opposite direction. These proteins include the up-regulated von Willebrand factor A domain-containing (VFA) protein, elongation factor A (EFA), ALDH and UGT, and the down-regulated proteins hemoglobin and DHRS11 (Figure 9).
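The paralog criterion above (one paralog significantly regulated while its counterparts show low FC or the opposite direction) can be expressed as a simple filter. The Python sketch below is illustrative only: the significance cutoff (q < 0.05), the low-FC cutoff (|log2 FC| < 0.58, roughly a 1.5-fold change), the family and accession labels, and all values are hypothetical and were not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Paralog:
    family: str      # hypothetical grouping of paralogs into a protein family
    accession: str   # hypothetical protein accession
    log2_fc: float   # log2 fold change after BW acclimation
    q_value: float   # multiple-testing-adjusted p-value


def discordant_families(paralogs, alpha=0.05, low_fc=0.58):
    """Return families in which a significant paralog has a low-FC or
    oppositely regulated counterpart (cutoffs are illustrative)."""
    by_family: dict[str, list[Paralog]] = {}
    for p in paralogs:
        by_family.setdefault(p.family, []).append(p)
    flagged = []
    for family, members in by_family.items():
        for s in (p for p in members if p.q_value < alpha):
            others = [p for p in members if p is not s]
            # Low FC or opposite sign relative to the significant paralog.
            if any(abs(o.log2_fc) < low_fc or o.log2_fc * s.log2_fc < 0
                   for o in others):
                flagged.append(family)
                break
    return flagged


if __name__ == "__main__":
    data = [
        Paralog("family_A", "acc_A1", 2.1, 0.01),
        Paralog("family_A", "acc_A2", -0.4, 0.60),
        Paralog("family_B", "acc_B1", -1.8, 0.02),
    ]
    print(discordant_families(data))
```

Families with a single paralog are never flagged, which matches the distinction drawn above between proteins represented by one paralog and those with discordant paralogs.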