2.2 Search database MSV000089235
The constructed search database consisted of sequences that were
obtained from P. larvae genomic assemblies currently available in
GenBank. In total, 15 accessions were downloaded on 7th March 2023; they are listed with key details in Table 1.
These genomic assemblies were processed in MaxQuant version 2.2.0.0
[17] via a six-frame translation specific to bacteria with a
threshold of 20 amino acids. Furthermore, we used two reference
databases of P. larvae that were constructed from sequences
downloaded on 7th March from the NCBI database. The first
component consisted of 20,472 RefSeq sequences (denoted RefSeq
or D17), and the second, larger component contained
71,586 unfiltered nonredundant sequences (hereafter denoted NCBIall or
D16). Note that the RefSeq sequences are also contained in NCBIall, so
sequence headers are duplicated between the two databases; however,
marking each database separately allows the individual database
components to be identified, which is required for successful protein
hit identification. Furthermore, to facilitate comparison of the new
evaluations with previous results, we included in the search the
UniProt database used in a previous study [11] for MS/MS
data evaluation of ERIC I–IV exoprotein fractions (hereafter denoted
UniProtprev or D18). Overall, the complex search database consisted of
18 components, whose effectiveness as components of a decoy database for
data mining was further evaluated.
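
The six-frame translation step described above can be illustrated with a minimal Python sketch. It is not the MaxQuant implementation: the function names are hypothetical, and it assumes that the 20 amino acid threshold is a minimum length applied to stop-to-stop segments in each reading frame. The codon assignments are those of the standard code, which the bacterial translation table (NCBI table 11) shares; the two tables differ only in their permitted start codons, which this sketch does not use.

```python
from itertools import product

# Codon-to-amino-acid mapping in NCBI order (bases T, C, A, G).
# Table 11 (bacterial) uses the same assignments as the standard code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna, min_length=20):
    """Return (strand, frame, peptide) tuples for all six reading frames.

    Each frame is split at stop codons, and only segments with at least
    `min_length` residues are kept (assumed meaning of the threshold).
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            for segment in translate(seq[offset:]).split("*"):
                if len(segment) >= min_length:
                    hits.append((strand, offset + 1, segment))
    return hits


if __name__ == "__main__":
    toy_contig = "ATG" + "GCT" * 25 + "TAA"  # made-up sequence for illustration
    for strand, frame, peptide in six_frame_translate(toy_contig):
        print(strand, frame, len(peptide), peptide)
```

In practice, each retained segment would be written to a FASTA file with a header identifying its source assembly, so that hits can be traced back to the individual database components.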