3.3 Selected markers
3.3.1 Family 10 glycosylhydrolase/fibronectin 3 (GHL10‒FN3)
The inspection of highly contrasting identifications of similar proteins
showed different database identifications for two isoforms denoted
primarily as alpha-galactosidase; however, further inspection of the
hits showed that the proteins are denoted as family 10 glycosylhydrolase
or hypothetical proteins. One isoform (id 253 hit) was identified by D1,
D3, D5, D13, D15 and D16–D18 in three genotypes, ERICs II–IV, while a
second isoform (id 455 hit), identified by D2, D4, D6, D7, D8, D10, D11,
D12 and D16–D18, was found in only ERIC II.
Both proteins were completely absent in the ERIC I genotype exoprotein
fraction. Alignment analysis of MS/MS-derived peptides that identified
the two isoforms confirmed 6 amino acid (aa) substitutions in the
identified peptides. The substitutions confirmed by MS are shown in
Figure S1 (and Table S1 ‒ column AH, rows 12–13). All the sequences
that identified the two isoforms are listed in Tables S2 and S3. Further
analysis of all the protein sequences in InterPro showed that the
protein isoforms were composed of two domains. The following two domains,
with the approximate lengths of the predicted sequence units, were
identified: (i) IPR003790 – Glycosyl hydrolase-like 10 (GHL10; ~300
aa length); (ii) IPR036116/IPR003961 – Fibronectin type III (FN3;
~100 aa length). Note that the shorter (or partial)
sequences were composed of only one domain, GHL10.
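The split of database components between the two isoforms can be tabulated with simple set operations. The following is a minimal sketch, not part of the original analysis; it assumes that "D16–D18" denotes D16, D17 and D18 inclusive, and the helper names (`components`, `by_number`) are illustrative only.

```python
# Database components reported in the text for each isoform.
# Assumption: "D16–D18" means D16, D17 and D18 inclusive.

def components(*numbers):
    """Build a set of component labels such as {'D1', 'D3'}."""
    return {f"D{n}" for n in numbers}

def by_number(label):
    """Sort key: numeric part of a label like 'D14'."""
    return int(label[1:])

ALL_COMPONENTS = components(*range(1, 19))  # D1 ... D18
id_253 = components(1, 3, 5, 13, 15, 16, 17, 18)
id_455 = components(2, 4, 6, 7, 8, 10, 11, 12, 16, 17, 18)

# Components that identified both isoforms.
shared = sorted(id_253 & id_455, key=by_number)
# Components that identified neither isoform.
neither = sorted(ALL_COMPONENTS - (id_253 | id_455), key=by_number)

print("identified both isoforms:", shared)
print("identified neither isoform:", neither)
```

Under these assumptions, only the components D16–D18 identify both isoforms, and D9 and D14 identify neither.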
3.3.2 Unique hit identified by only D14
D14 uniquely identified 1 hit (id: 475; CP020557.1:False:1915540) that
was 884 aa in length (Table S4). A BLAST search in the NCBI database
showed that the amino acid sequence of hit id: 475 shared 100% identity
and 66% query coverage with the 586 aa dihydroxyacetone kinase subunit
DhaK [P. larvae] (WP_079940467.1). The remaining part of the sequence,
consisting of 298 amino acids, shared 100% identity and 95% query
coverage with a 378 aa glycerol dehydrogenase [P. larvae] (gldA;
WP_023482866.1; WP_275014237.1).
The alignment of the 3 sequences is shown in Figure 2. Further analysis
in InterPro showed that the protein was composed of additional domains.
The following six results with predicted sequence occupancies for the
three structural units in the protein were identified: (i) IPR001670 –
ADH_Fe/GldA (13-277); (ii) IPR004006 – DhaK_dom (316-626/305-627),
IPR012736 – DhaK_1 (299-626); (iii) IPR004007 – DhaL_dom
(672-881/699-880/698-881), IPR036117 – DhaL_dom_sf (662-882/667-884),
IPR012737 – DhaK_L_YcgS (667-880). Overall, the results indicated that a
new ORF was identified.
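The length bookkeeping for hit id: 475 can be checked with a few lines of arithmetic. This is a minimal sketch under two stated assumptions that are not in the original analysis: that query coverage is computed as aligned length divided by query length, and that the full 586 aa DhaK subunit aligns to the C-terminal end of the 884 aa hit.

```python
# Lengths taken from the text (in amino acids).
HIT_LENGTH = 884    # hit id: 475
DHAK_LENGTH = 586   # dihydroxyacetone kinase subunit DhaK

# Part of the hit not covered by DhaK.
remaining = HIT_LENGTH - DHAK_LENGTH

# Assumption: coverage = aligned length / query length,
# with the whole DhaK subunit aligned.
coverage_percent = 100 * DHAK_LENGTH / HIT_LENGTH

# Assumption: DhaK occupies the C-terminal end of the hit,
# so its first residue would sit at this position.
dhak_start = HIT_LENGTH - DHAK_LENGTH + 1

print(f"remaining part: {remaining} aa")
print(f"query coverage: {coverage_percent:.1f}%")
print(f"expected DhaK start position: {dhak_start}")
```

Under these assumptions, the remainder is 298 aa and the coverage rounds to 66%, matching the reported values, and the expected start position (299) coincides with the start of the InterPro DhaK_1 prediction (299-626).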
3.3.3 Unique hit not identified by D14 only
We obtained a result contrasting with the identification by only D14
(see above). The contrasting unique hit (id: 154; WP_268569954.1),
absent in only D14, was a type III-B CRISPR module-associated Cmr3 family
protein. The hit was identified in only ERIC IV. Interestingly, in our
dataset, there was a different hit (id: 248; WP_172422807.1) with a
similar FASTA header, “type III-B CRISPR module RAMP protein Cmr4”,
that was also not identified by D14 and was identified in only ERIC IV.
Overall, D2, D6, D7, D8, D9, D12, and D14 did not identify this second
hit. In this case, successful identification by D3 may be important
since D3 corresponds to the ERIC IV assembly.
3.3.4 Collagenase
Two similar collagenases were identified in the dataset. One hit (id:
110), the microbial collagenase ColA (e.g., AVF33142.1) expressed in
ERICs II–IV, was identified by D1, D3, D5, D9, D13, D15, and D16–18.
However, a different hit (id: 305) for collagenase (e.g.,
WP_083039955.1) was identified in only ERIC II, and only the D14
(CP020557.1:True:2089426–1117 aa) and D16–18 database components
provided the identification. Clustal/BLASTp analysis showed that the two
isoforms were very similar but had 4 single amino acid substitutions.
3.3.5 DUF3221 domain-containing protein
A DUF3221 domain-containing protein (id: 438), 96 aa in length and not
identified by D18, was identified via 4 peptides, but only in the ERIC I
and IV genotypes. However, protein sequences highly similar to DUF3221
from ERIC II were found in the database. The two DUF3221
domain-containing proteins with the highest identity based on UniProt
BLAST (V9W786, identity = 88.7%; V9W919, identity = 78.4%) were present
in our search results as 2 separate protein hits. V9W786 (id: 232) was
identified in all 4 ERICs, with very low intensity in ERIC I but
similarly high intensity in ERICs II–IV. V9W919 (id: 8) was detected at
low intensity in ERIC I but at similarly high intensity in ERICs III and
IV.
Another DUF3221 was identified under id: 331 (ETK25839.1), and it was
abundant in ERICs I, III and IV but not detected in ERIC II. Notably,
this hit was identified by all database components except D9, D10 and
D11. Incidentally, this hit had a unique identification pattern and is
visualized first on the right in Figure 1. Furthermore, the DUF3221 hit
(id: 311) was abundantly expressed in all ERICs, with abundance
increasing with the ERIC number; thus, ERIC IV had the highest
abundance. The hit was identified by all 18 database components. A
DUF3221 hit (id: 232) was also expressed in all ERICs, with similarly
high abundance in ERICs II–IV but low abundance in ERIC I.
3.3.6 DUF3862 domain-containing protein
Another protein with a domain of unknown function that we selected
was DUF3862 (e.g., AVF32430.1). A protein hit (id: 66) identified by D1,
D3, D5, D8, D13, D15 and D16–18 was identified in ERICs I, III and IV
but was completely absent in ERIC II. InterPro domain analysis showed
that it belonged to the overlapping homologous superfamily BamE-like
(IPR037873), which may indicate similarity to the outer membrane
lipoprotein OmpA and/or the beta-lactamase inhibitor BLIP.
3.3.7 InhA
Although immune inhibitor A (InhA) could be identified by all database
components, it was again not identified as being expressed in the ERIC I
genotype, but it was confirmed as being expressed in ERICs II–IV.
Sequence analysis indicated an “EAGGGDLGE” peptide deletion in the ORFs
CP019687.1:False:2215783 (D06; isolate ATCC-9545) and
CP019651.1:True:1009346 (D07; isolate DSM 7030), which are from ERIC I
genome assemblies. Incidentally, a BLASTp search of the peptide
“EAGGGDLGE” in the NCBI database showed that this sequence was typical
of various InhA proteins, including those of P. larvae.
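A check for the presence or absence of the “EAGGGDLGE” peptide in a set of ORF translations can be sketched as a plain substring search. The example below is illustrative only: the two sequences are made-up toy strings, not the actual InhA ORFs, and the function name `has_motif` is hypothetical.

```python
MOTIF = "EAGGGDLGE"  # peptide reported in the text

def has_motif(sequence, motif=MOTIF):
    """Return True if the motif occurs in the protein sequence."""
    return motif in sequence.upper()

# Toy sequences for illustration only (NOT real InhA sequences).
toy_sequences = {
    "toy_with_motif": "MKTLV" + MOTIF + "QRSAW",
    "toy_without_motif": "MKTLVQRSAW",
}

# Map each sequence name to whether the motif is present.
motif_present = {
    name: has_motif(seq) for name, seq in toy_sequences.items()
}

for name, present in motif_present.items():
    print(f"{name}: motif {'present' if present else 'absent'}")
```

In practice, the toy dictionary would be replaced by the translated ORFs from each database component; an exact substring match detects only a complete deletion or an intact motif, so partial matches would still require an alignment.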
3.3.8 ABC transporters related to iron–siderophore uptake
One contrasting result in expression was observed for ABC-transporter
proteins related to iron uptake; however, the protein hits were
identified by all 18 database components. Iron-siderophore ABC
transporter-like protein (id: 355) was identified in the exoprotein
fractions of ERICs III and IV, while iron-hydroxamate ABC transporter
substrate-binding protein (id: 156) and iron-uptake system-binding
protein FeuA (id: 261) were identified only in ERICs I and II. Note that
proteins participating in the production of the siderophore
“bacillibactin” [26] (genes: dhbACEBF; dhbA – V9W311, dhbC – V9W8G5,
dhbE – V9W5G3, dhbB – V9W645, dhbF – V9W6T6) (Table 4 in Beims et al.
[9]) were not identified in the exoprotein fractions of ERICs I–IV.