Legend: Id. – database ID as designated in this study; Strain – strain description as given in GenBank; Level – current genome status as given in GenBank; Mb – size of the genome assembly in megabases; BioProject – BioProject accession, provided as a hyperlink; Protein seq – number of protein sequences available for the BioProject. *Note that PRJNA362897, with 16,238 proteins, consists of four genome assemblages.
Figure 1. UpSet plot of the proteomic analysis using a wide database. The plot shows the overlap of protein hits identified in particular components of the database. The result indicates large differences in identification among the components.
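For readers unfamiliar with UpSet plots, the counts behind each bar can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the function name `upset_counts`, the component labels, and the protein identifiers are all hypothetical. Each protein hit is assigned to the exact combination of database components in which it was identified, and the hits are then counted per combination.

```python
from collections import Counter


def upset_counts(hits_by_component):
    """Count protein hits per exclusive combination of database components.

    hits_by_component maps a component label to the set of protein hits
    identified with that component (hypothetical input layout).
    """
    membership = {}
    for component, hits in hits_by_component.items():
        for protein in hits:
            # Record every component in which this protein was identified.
            membership.setdefault(protein, set()).add(component)
    # One count per distinct combination; this is the height of one UpSet bar.
    return Counter(frozenset(components) for components in membership.values())


# Hypothetical identifiers, for illustration only.
example = {
    "genome_A": {"P1", "P2", "P3"},
    "genome_B": {"P2", "P3", "P4"},
    "genome_C": {"P3", "P5"},
}

counts = upset_counts(example)
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(combo)), n)
```

Because each hit is counted in exactly one combination, the bar heights sum to the total number of distinct hits; many hits in single-component bars and few in the all-component bar correspond to large differences in identification.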