2.2 Search database MSV000089235
The search database was constructed from sequences derived from the P. larvae genomic assemblies currently available in GenBank. In total, 15 accessions were downloaded on 7 March 2023; they are listed with key details in Table 1. These genomic assemblies were processed in MaxQuant version 2.2.0.0 [17] using a bacteria-specific six-frame translation with a threshold of 20 amino acids. In addition, we used two reference databases of P. larvae constructed from sequences downloaded from the NCBI database on 7 March 2023. The first component consisted of 20,472 RefSeq sequences (denoted RefSeq or D17), and the second, larger component contained 71,586 unfiltered nonredundant sequences (denoted NCBIall or D16). Note that the RefSeq sequences are also included in NCBIall, so sequence headers are duplicated between the two; nevertheless, labelling each database with a separate marker makes it possible to identify the individual database components, which is required for successful protein hit identification. Finally, to allow comparison of the new evaluations with previous results, we included in the search a UniProt database that was used in a previous study [11] for MS/MS data evaluation of ERIC I–IV exoprotein fractions (denoted UniProtprev or D18). Overall, the complex search database consisted of 18 components (15 translated genomic assemblies and 3 reference databases), whose effectiveness as components of a decoy database for data mining was further evaluated.
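To illustrate what a six-frame translation with a length threshold involves, the following Python sketch translates a DNA sequence in all six reading frames and keeps only the stop-to-stop segments of at least 20 amino acids. It is not the MaxQuant implementation. The function names, the reading of the threshold as a minimum segment length between stop codons, and the use of the codon assignments of NCBI translation table 11 (which match the standard code) are assumptions made for illustration.

```python
from itertools import product

# Codon-to-amino-acid assignments in TCAG order. NCBI translation table 11
# (bacterial) uses the same assignments as the standard code; it differs
# only in the set of permitted start codons, which this sketch ignores.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_LENGTH = 20  # assumed meaning of the threshold: minimum segment length


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_peptides(dna, min_length=MIN_LENGTH):
    """Return (strand, frame_offset, peptide) for every stop-to-stop
    segment of at least min_length residues in the six reading frames."""
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for segment in protein.split("*"):
                if len(segment) >= min_length:
                    results.append((strand, offset, segment))
    return results


if __name__ == "__main__":
    # Hypothetical toy sequence: a start codon, 25 alanine codons, a stop.
    toy = "ATG" + "GCT" * 25 + "TAA"
    for strand, offset, peptide in six_frame_peptides(toy):
        print(strand, offset, len(peptide), peptide)
```

In a real workflow, each retained segment would be written to a FASTA file with a header that carries the marker of its source assembly, so that hits can be traced back to individual database components as described above.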