Materials and Methods
Sampling
In
a previous study, we screened 689 scale insect specimens forWolbachia (Sanaei et al. 2021b). Among samples with positiveWolbachia infection, we selected those whose MLST genes
(Multilocus Sequence Typing, (Baldo et al. 2006)) were successfully
amplified. This included 59
specimens belonging to 29 species and four scale insect families
(Monophlebidae, Pseudococcidae, Coccidae and Eriococcidae).
Of
29 species, 15 are represented by single samples, two are represented by
two samples from the same population, and 12 species are represented by
multiple samples from more than one population. For full details we
refer to File S2. 16 of these scale insects were collected together with
a directly associated insect (including ants, wasps, flies, beetles,
moths), and these were also included. The tight ecological connection
between the scale insect and an associate was established either by
direct observation (e.g., ant-scale insect interactions) or by rearing
both members of the pair in the laboratory conditions (e.g., rearing
parasitoids from the scale insect sample). Based on observation, wasps
and flies are mostly parasites and moth caterpillars are predators of
scale insects. We selected 16 infected pairs: five scale insect-ant,
seven scale insect-wasp, two scale insect-beetle, two scale insect-fly,
and one scale insect-moth pair. We were unable to determine the species
for any of the associates, except for the ants (Technomyrmex
albipes ) and one of the beetles (Neopocadius pilistriatus );
however, we determined their COI barcode.
PCR and sequencing
To be able to detect all Wolbachia strains even in multiple
infected host individuals, we implemented an approach of Illumina
multi-target amplicon sequencing. For this, we used 16S and MLST genes,
which included five housekeeping genes (coxA, fbpA, ftsZ, gatb, hcpA),
as well as the wsp (Wolbachia Surface Protein) gene (Baldo
et al. 2006). Despite some limitation in using MLST (Bleidorn & Gerth
2018), it is still a reliable source in strain determination and
evolutionary history analysis (Wang et al. 2020). For the host genes, we
targeted Cytochrome Oxidase I (COI), 18S and 28S Ribosomal RNA genes.
The host genes were used later to confirm both scale insect and
associate species identity and to build the host phylogeny. As a
requirement for our Illumina sequencing platform, some of the primers
were re-designed to yield products shorter than 500bp length (Table S1).
We also added Illumina specific overhang adapters at the start (5‘) of
the forward and reverse primers (GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG and
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG, respectively). PCR configurations
were applied as originally suggested for each gene (Baldo et al. 2006).
We pooled the amplicons for all ten genes (seven Wolbachia and
three host genes) per sample and sent them to the Australian Centre of
Ecogenomics, ACE (The University of Queensland, Australia). At ACE,
library preparation was performed by dual indexing workflow elaborated
by Teh el al., (2021). At first,
PCR products were amplified by NEBNext® Ultra™ II Q5® Mastermix (New
England Biolabs #M0544). Generated PCR amplicons then were purified
using Agencourt AMPure XP beads (Beckman Coulter). These products were
then indexed with unique 8bp barcodes using the Illumina Nextera XT 384
sample Index Kit A-D (Illumina FC-131-1002) with NEBNext® Ultra™ II Q5®
Mastermix. All the PCRs were conducted in the standard condition.
Indexed amplicons were pooled together in equimolar concentrations and
sequenced on Illumina MiSeq Sequencing System using paired end
sequencing with V3 300bp chemistry. As part of the workflow, the
following controls were applied: 1. Negative amplification control from
a like processed reagent control to monitor for contamination in library
construction 2. Single well empty chamber controls within processing
plates to monitor for cross contamination within the library preparation
3. Negative index positions between runs to monitor for run-to-run bleed
through designated as in line controls. Passing QC of resulting sequence
is determined as 10,000 raw reads per sample prior to data processing
and passing Quality Control metrics in line with Illumina supplied
reagent metrics of overall Q30 for 600bp reads of >70%.
Wolbachia strain
determination
To determine the identity of the Wolbachia strain in our sample,
we developed an R-based (R Core Team 2013) bioinformatics pipeline based
on the DADA2 pipeline (Callahan et al. 2016), which includes a series of
quality controls, trimming and mapping to the references. In addition,
we blasted all generated OTUs against Genbank
(https://www.ncbi.nlm.nih.gov/genbank/) and the Wolbachia MLST
database
(https://pubmlst.org/organisms/Wolbachia-spp).
Strain determination was first
conducted on single infected samples. In the next step, these strains as
well as registered strains in the MLST database were utilized as
references to determine all the strains in multiple infected samples.
Despite our methodology being powerful enough to identify co-infecting
strains, it is limited in its ability to detect intra-host
recombination. The details of the pipeline are explained in File S1
(including Figure S1) and the R scripts are available in File S4.
Wolbachia strains are
commonly determined by their MLST allele. Based on data available on the
MLST database, any genetic variation from one or more MLST alleles of a
given strain (which can be a single nucleotide base) is defined as a
distinct strain (Baldo et al. 2006). This definition of Wolbachiastrains is controversial (Bleidorn & Gerth 2018). Based on MLST genes,
the lowest estimated and suggested pairwise distance amongWolbachia strains is 0.001 (Ilinsky & Kosterin 2017). Therefore,
to avoid promoting subtle variations in MLST as a new strain, we grouped
strains with up to five bases difference across all seven genes (a total
of ~3065bp) into ”strain groups.” (e.g., groupingw Api1.1 and w Api1.2 into w Api1). To constructWolbachia phylogeny, conduct host-parasite congruency tests,
detect potential host shift events and finally run the GAMM model, we
always used strain groups instead of strains.
Reconstruction of
phylogenies
Wolbachia and host genes were aligned in Geneious (Version
11.0.5, Biomatters) using the MAFFT algorithm (Katoh et al. 2002). Each
gene was then trimmed to have identical lengths across samples.
PartitionFinder2 (Lanfear et al. 2017) was used to find the best-fit
partitioning scheme and substitution model for phylogeny estimation
using default parameters. The results were then used as an input for
estimating the Maximum Likelihood tree using RAxML (Stamatakis 2014)
with ”Rapid Bootstrapping and Search for the Best scoring ML” and 1000
bootstrap replicates. As recombination is common among Wolbachiagenes, the branch lengths of the Wolbachia phylogenetic tree were
corrected with ClonalFrameML (Didelot & Wilson 2015) to account for
recombination events.
The MLST profile of all registered strains in the Wolbachia MLST
depository (https://pubmlst.org/organisms/Wolbachia -spp) was
downloaded (on 5th November 2020). As most of the
original Wolbachia MLST gene fragments were longer than the gene
fragments in our study, the imported database was trimmed in Geneious to
match the current study. The phylogenetic tree of all strains, including
the reported strains in the MLST database and those from the current
study, was estimated as above. This tree was used to determine the
position of strains from scale insects within the variousWolbachia supergroups.
The phylogenetic trees of all hosts and Wolbachia strains, as
well as the Wolbachia -host association network, were plotted in R
by using the phytools package (Revell 2012). A 3D interactive bipartite
graph (File S5) was also created using the bipartiteD3 R package (Terry
2019). To test the phylogenetic
congruence between Wolbachia and their hosts, we ran two tests.
First, we performed a Parafit test (Legendre et al. 2002), which
assesses the genetic distance similarity of host and parasite
phylogenies. To this end, we used the parafit function implemented
within the ape R-package (Paradis & Schliep 2019) with the lingoes
correction method for negative eigenvalues and 100,000 permutations.
Second, we adopted the Procrustean Approach (known as PACo) which
assesses the similarity between host and parasite trees by estimation of
Euclidean embeddings derived from distance matrices (Balbuena et al.
2013). This test, which is implemented in PACo R package (Balbuena et
al. 2013), was performed with 100,000 permutations. These two tests
provide statistics to assess the independence of phylogenies by either
rejecting or accepting the null hypothesis that the similarity between
the trees is not higher than expected by chance. All R scripts developed
and used in this study are provided in File S3.
Factors determining host
shifts
An expanded version of a Generalised Additive Mixed Model (GAMM),
originally developed for viral sharing across mammal species (Albery et
al. 2020), was applied by using the mgcv package in R (Wood 2011). This
GAMM allowed us to model a non-linear fit between our explanatory and
response variables and allowed us to more readily account for their
uneven distributions. Specifically, this model examined the probability
of a given pair of scale insect species sharing one or moreWolbachia symbionts, as a function of their phylogenetic and
geographic similarity, with a logistic link function:
Wolbachia (0/1) ~ phylogenetic distance + home
range overlap + geographic distance
Phylogenetic distance was inferred from the Australian scale insect
phylogenetic tree as explained above. To quantify habitat sharing
between scale insect species, we constructed each species’ geographic
range using their observed locations. For all species with 5 or more
samples, we constructed a minimum convex polygon (MCP) in R. The
coordinates for the MCP (File S4) were collected from various sources,
mainly including the LGC collection (Cook Lab, School of Biological
Science, The University of Queensland), ScaleNet (García Morales et al.
2016), the Atlas of Living Australia website (https://www.ala.org.au/),
and several published articles (File S4). For each pair of species, we
calculated the overlap of these polygons as a proportion of both
species’ total range size. We also derived Euclidean distances between
species’ sampling locations by calculating pairwise distances between
species’ centroids. Species with fewer than 5 geographic observations
were not included in the model. We also excluded Coccus
formicarii which was collected from Taiwan and therefore difficult to
fit in the model. A total number of 22 species were included in the GAMM
model.
We fitted phylogenetic distance, home range overlap, and geographic
distance as explanatory variables, and we fitted paired species’
identities as multi-membership random effects to account for variation
in richness and sampling frequency between species (Albery et al. 2020).
To quantify their impact on model fit we examined the change in deviance
information criterion (DIC), where a change in 2 DIC was taken to
represent an improved model. To avoid fitting too many variables in the
model, we sequentially added each pairwise term, retaining the one that
most improved model fit, and then repeating the process with the
remaining variables, until no remaining variables improved the model.
The R scripts are available in File S3.