3.3 Selected markers
3.3.1 Family 10 glycosylhydrolase/fibronectin 3 (GHL10‒FN3)
Inspection of highly contrasting identifications of similar proteins revealed different database identifications for two isoforms denoted primarily as alpha-galactosidase; however, further inspection of the hits showed that the proteins are denoted as family 10 glycosylhydrolase or hypothetical proteins. One isoform (id 253 hit) was identified by D1, D3, D5, D13, D15 and D16–D18 and was detected in three genotypes, ERICs II–IV, while a second isoform (id 455 hit), identified by D2, D4, D6, D7, D8, D10, D11, D12 and D16–D18, was detected in only ERIC II. Both proteins were completely absent in the ERIC I genotype exoprotein fraction. Alignment analysis of the MS/MS-derived peptides that identified the two isoforms confirmed 6 amino acid (aa) substitutions in the identified peptides. The substitutions confirmed by MS are shown in Figure S1 (and Table S1 ‒ column AH, rows 12–13). All the sequences that identified the two isoforms are provided in Tables S2 and S3. Further analysis of all the protein sequences in InterPro showed that the protein isoforms were composed of two domains. The following two domains, with the approximate lengths of the predicted sequence units, were identified: (i) IPR003790 – Glycosyl hydrolase-like 10 (GHL10; ~300 aa length); (ii) IPR036116/IPR003961 – Fibronectin type III (FN3; ~100 aa length). Note that the shorter (or partial) sequences comprised only one domain, GHL10.
3.3.2 Unique hit identified by only D14
D14 uniquely identified 1 hit (id: 475; CP020557.1:False:1915540) that was 884 aa in length (Table S4). A BLAST search in the NCBI database showed that the amino acid sequence identified in hit id: 475 shared 100% identity and 66% query coverage with the 586 aa dihydroxyacetone kinase subunit DhaK [P. larvae] (WP_079940467.1). The remaining part of the sequence, consisting of 298 amino acids, shared 100% identity and 95% query coverage with a 378 aa glycerol dehydrogenase [P. larvae] (gldA; WP_023482866.1; WP_275014237.1). The alignment of the 3 sequences is shown in Figure 2. Further analysis in InterPro showed that the protein was composed of additional domains. The following six results with predicted sequence occupancies for the three structural units in the protein were identified: (i) IPR001670 – ADH_Fe/GldA (13-277); (ii) DhaK_dom – IPR004006 (316-626/305-627), DhaK_1 – IPR012736 (299-626); (iii) DhaL_dom – IPR004007 (672-881/699-880/698-881), DhaL_dom_sf – IPR036117 (662-882/667-884), DhaK_L_YcgS – IPR012737 (667-880). Overall, the results indicated that a new ORF was identified.
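The length and coverage figures reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the 586 aa DhaK reference aligns over its full length to the 884 aa ORF (the text does not state the exact aligned length); the function name `query_coverage` is hypothetical and is not part of any BLAST tool.

```python
# Cross-check of the lengths and query coverage reported for hit id: 475.
# Assumption: the DhaK reference (586 aa) aligns over its full length.

def query_coverage(aligned_len: int, query_len: int) -> float:
    """Percentage of the query sequence covered by the alignment."""
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    return 100.0 * aligned_len / query_len

orf_len = 884    # length of the ORF identified by D14 (from the text)
dhak_len = 586   # length of the DhaK reference (from the text)

remaining_len = orf_len - dhak_len            # part not covered by DhaK
dhak_coverage = query_coverage(dhak_len, orf_len)

print(f"remaining part: {remaining_len} aa")
print(f"DhaK query coverage: {dhak_coverage:.1f}%")
```

Under this assumption, the remainder equals the 298 aa reported for the glycerol dehydrogenase-like part, and the coverage rounds to the reported 66%.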
3.3.3 Unique hit that was not identified by only D14
In contrast to the hit identified by only D14 (see above), we also found a unique hit that was absent in only D14. This hit (id: 154; WP_268569954.1) was a type III-B CRISPR module-associated Cmr3 family protein and was identified in only ERIC IV. Interestingly, our dataset contained a different hit (id: 248; WP_172422807.1) with a similar fasta header, “type III-B CRISPR module RAMP protein Cmr4”, that was also not identified by D14 and was identified in only ERIC IV. Overall, D2, D6, D7, D8, D9, D12, and D14 did not identify this second hit. In this case, successful identification by D3 may be important since D3 corresponds to an ERIC IV assembly.
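The identification patterns described in Sections 3.3.2 and 3.3.3 amount to set comparisons across the 18 database components. The sketch below encodes only the patterns stated in the text; it assumes that every component not listed as failing did identify the hit, and the helper names `unique_to` and `missing_only_from` are hypothetical.

```python
# Presence/absence of hits across database components D1-D18.
# Assumption: components not named as failing are treated as having
# identified the hit.

ALL_DBS = {f"D{i}" for i in range(1, 19)}

identified_by = {
    475: {"D14"},                                   # identified by only D14
    154: ALL_DBS - {"D14"},                         # absent in only D14
    248: ALL_DBS - {"D2", "D6", "D7", "D8", "D9", "D12", "D14"},
}

def unique_to(db: str, table: dict) -> list:
    """Hit ids identified by exactly one component, `db`."""
    return sorted(hit for hit, dbs in table.items() if dbs == {db})

def missing_only_from(db: str, table: dict) -> list:
    """Hit ids identified by every component except `db`."""
    return sorted(hit for hit, dbs in table.items() if ALL_DBS - dbs == {db})

print(unique_to("D14", identified_by))
print(missing_only_from("D14", identified_by))
```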
3.3.4 Collagenase
Two similar collagenases were identified in the dataset. One hit (id: 110), the microbial collagenase ColA (e.g., AVF33142.1) expressed in ERICs II–IV, was identified by D1, D3, D5, D9, D13, D15, and D16–18. However, a different hit (id: 305) for collagenase (e.g., WP_083039955.1) was identified in only ERIC II, and only the D14 (CP020557.1:True:2089426–1117 aa) and D16–18 database components provided the identification. Clustal/BLASTp analysis showed that the two isoforms were very similar but had 4 single amino acid substitutions.
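Counting single amino acid substitutions between two isoforms reduces to comparing aligned positions. The following is a minimal sketch using short, purely illustrative toy sequences (not the collagenase sequences); it assumes that the two sequences have already been aligned to equal length, with `-` marking gaps, and the function name `substitutions` is hypothetical.

```python
# List substitutions between two pre-aligned sequences of equal length.
# The sequences below are toy examples, not data from this study.

def substitutions(aligned_a: str, aligned_b: str) -> list:
    """Return (1-based position, residue in a, residue in b) for mismatches.

    Gap positions ('-') are skipped because they are indels,
    not substitutions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(aligned_a, aligned_b), start=1)
        if a != b and a != "-" and b != "-"
    ]

toy_a = "MKTAYIAK"
toy_b = "MKSAYLAK"
diffs = substitutions(toy_a, toy_b)
print(len(diffs), diffs)
```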
3.3.5 DUF3221 domain-containing protein
A 96 aa DUF3221 domain-containing protein (id: 438), which was not identified by D18, was identified via 4 peptides but only in the ERIC I and IV genotypes. However, protein sequences highly similar to DUF3221 from ERIC II were found in the database. The two DUF3221 domain-containing proteins with the highest identity based on UniProt BLAST (V9W786, identity = 88.7%; V9W919, identity = 78.4%) were present in our search results as 2 separate protein hits. V9W786 (id: 232) was identified in all 4 ERICs; its intensity was very low in ERIC I but similarly high in ERICs II–IV. V9W919 (id: 8) was detected at low intensity in ERIC I but at similarly high intensity in ERICs III and IV.
Another DUF3221 protein was identified under id: 331 (ETK25839.1); it was abundant in ERICs I, III and IV but was not detected in ERIC II. Importantly, this hit was identified by all database components except D9, D10 and D11. Incidentally, this hit had a unique identification pattern and is visualized first on the right in Figure 1.
Furthermore, the DUF3221 hit (id: 311) was abundantly expressed in all ERICs, with abundance increasing with the ERIC number; thus, ERIC IV had the highest abundance. The hit was identified by all 18 database components. Another DUF3221 hit (id: 232) was also expressed in all ERICs; however, although a similarly high abundance was observed in ERICs II–IV, the abundance was low in ERIC I.
3.3.6 DUF3862 domain-containing protein
Another protein with a domain of unknown function that we selected was DUF3862 (e.g., AVF32430.1). A protein hit (id: 66) identified by D1, D3, D5, D8, D13, D15 and D16–18 was detected in ERICs I, III and IV but was completely absent in ERIC II. InterPro domain analysis showed that it belonged to the overlapping homologous superfamily BamE-like (IPR037873), which may indicate similarity to the outer membrane lipoprotein OmpA and/or the beta-lactamase inhibitor BLIP.
3.3.7 InhA
Although immune inhibitor A (InhA) could be identified by all database components, it was again not identified as being expressed in the ERIC I genotype, but it was confirmed as being expressed in ERICs II–IV. Sequence analysis indicated an “EAGGGDLGE” peptide deletion in the ORFs CP019687.1:False:2215783 (D06; isolate ATCC-9545) and CP019651.1:True:1009346 (D07; isolate DSM 7030), which are from ERIC I genome assemblies. Incidentally, a BLASTp search of the peptide “EAGGGDLGE” in the NCBI database showed that this sequence was typical of various InhA proteins, including those of P. larvae.
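Whether a given ORF carries the “EAGGGDLGE” peptide can be checked with a plain substring search. The sketch below uses short toy sequences for illustration only (they are not the InhA ORFs discussed above), and the function name `motif_position` is hypothetical.

```python
# Locate a peptide motif in a protein sequence by exact substring search.
# The two sequences below are toy examples, not data from this study.

MOTIF = "EAGGGDLGE"

def motif_position(sequence: str, motif: str = MOTIF):
    """Return the 1-based start of the first exact match, or None."""
    index = sequence.upper().find(motif)
    return index + 1 if index >= 0 else None

toy_with_motif = "MKT" + MOTIF + "AYI"   # motif present
toy_with_deletion = "MKTAYI"             # motif deleted

for name, seq in [("with motif", toy_with_motif),
                  ("with deletion", toy_with_deletion)]:
    print(name, motif_position(seq))
```

An exact search of this kind detects only a complete, unmodified motif; a sequence alignment would be needed to characterize partial deletions or substitutions within the peptide.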
3.3.8 ABC transporters related to iron–siderophore uptake
One contrasting result in expression was observed for ABC transporter proteins related to iron uptake; however, the protein hits were identified by all 18 database components. Iron-siderophore ABC transporter-like protein (id: 355) was identified in the exoprotein fractions of ERICs III and IV, while iron-hydroxamate ABC transporter substrate-binding protein (id: 156) and iron-uptake system-binding protein FeuA (id: 261) were identified only in ERICs I and II. Note that the proteins participating in the production of the siderophore “bacillibactin” [26] (genes: dhbACEBF; dhbA – V9W311, dhbC – V9W8G5, dhbE – V9W5G3, dhbB – V9W645, dhbF – V9W6T6) (Table 4 in Beims et al. [9]) were not identified in the exoprotein fractions of ERICs I–IV.