Keywords
Genomic data, hybridisation, phylogenetic discordance, species
boundaries, island evolution
Introduction
Understanding the mechanisms driving the diversity of life on Earth is a
fundamental question in biology, with speciation being the critical
driving process
(Hernández-Hernández
et al., 2021; Jia et al., 2021; Seehausen et al., 2014). Speciation is
not a single event but a continuum, ranging from minimally
differentiated to entirely distinct species. This continuum is often
reflected in the genetic diversity within and between populations, which
can be further modulated by processes such as incomplete lineage sorting
(ILS) and hybridisation. Systematic studies integrating phenotypic and
molecular data are essential for placing species within an evolutionary
framework. However, the gradual divergence of species through time and
processes such as ILS or gene flow can result in genetic and phenotypic
variation gradients that challenge the reconstruction of accurate
phylogenies and blur species boundaries.
Along with evolutionary processes, methodological and analytical
decisions (e.g., data type and quality and probabilistic framework) may
also contribute to phylogenetic uncertainty and fail to establish
species limits (Steenwyk et
al., 2023). When referring to genomic data, one of the main challenges
in unravelling recently diverged evolutionary lineages is distinguishing
similarity due to gene flow from that of shared ancestry. Detecting
hybrid zones, especially in geographically constrained areas like
islands, often leads to debates about whether observed genetic patterns
reflect ongoing gene flow or result from historical separation followed
by secondary contact (Gompert
& Buerkle, 2016). Overestimating gene flow can lead to erroneous
conclusions regarding the limits and evolutionary relationships among
species, whereas underestimating it can conceal the role of
hybridisation in species diversification. This issue is further
complicated in cases of ILS, where ancestral genetic polymorphisms
persist across multiple species, leading to phylogenetic discordance
among loci (He et
al., 2023; Hernández-Gutiérrez et al., 2022; Herrig et al., 2024).
Detecting and correctly interpreting these signals is vital for
reconstructing and understanding the Tree of Life, as well as for
assessing biodiversity and conservation priorities.
Spiders are among the most diverse animals on Earth and play a
fundamental role as generalist predators in shaping terrestrial
ecosystems (Nyffeler &
Birkhofer, 2017; Wheeler et al., 2017). As in many other organisms,
hybridisation has been documented across multiple species, highlighting
its potential role in driving diversification and evolutionary
processes. For instance, hybrid zones have been identified among funnel
weaver spiders as the Eratigena atrica (C.L. Koch, 1843) group in
Europe (Croucher et al.,
2007; Oxford & Bolzern, 2018); in deeply divergent lineages of the red
devil spider genus Harpactocrates in the Pyrenees
(Bidegaray-Batista et al.,
2016), or in Iberian representatives of the cobweb spider genusTheridion (Domènech et
al., 2020) . Conversely, recent research has demonstrated that multiple
evolutionary processes, such as ancestral hybridisation events and ILS,
have contributed to the emergence of different ecomorphs in the Hawaiian
spiny-leg Tetragnatha spiders
(Cerca, Cotoras, Santander,
et al., 2023).
The wolf spider genus Hogna in the Galapagos Islands exemplifies
the significance of gene flow in evolutionary diversification,
demonstrating that high levels of intra-island introgression have likely
led to the parallel evolution of habitat specialisation across the
archipelago (De Busschere et
al., 2015). The worldwide distributed, species-rich genus Hogna Simon, 1885, includes 236 species (World Spider Catalog, 2024) of
medium- to large-sized spiders. This richness is the result of the usage
of the genus as a dump basket for Lycosinae species with conserved
morphology, mostly because species identification has been plagued by
the absence of distinctive diagnostic characters and low-quality
illustration (Logunov,
2020), thus resulting in a polyphyletic genus
(Crespo et al., 2022;
Piacentini & Ramírez, 2019). The genus has managed to disperse and
diversify in several archipelagos worldwide, including the
Mediterranean, the Caribbean, the Atlantic, and the Pacific
(Gloor et al., 2017). Seven
endemic species of the wolf spider genus Hogna have been
documented in the northeast Atlantic archipelago of Madeira, namelyHogna maderiana (Walckenaer, 1837), H. ingens (Blackwall,
1857), H. blackwalli (Johnson, 1863), H. heeri (Thorell, 1875), H. insularum (Kulczynski, 1899), H.
nonannulata Wunderlich, 1995 and H. isambertoi Crespo, 2022. A
recent study employing molecular and phenotypic data highlighted the
importance of integrative methods in resolving species delimitation and
understanding the endemic species’ biogeographic connections from the
Madeira archipelago (Crespo
et al., 2022). However, the study failed to draw the boundaries and
phylogenetic relationships of two endemic sibling species, namelyH. insularum , which inhabits all the islands, and the large and
colourful H. maderiana , circumscribed to Porto Santo and nearby
islets, where the two species overlap. This pair of species defied
classification due to the presence of intermediate phenotypes and the
lack of mitochondrial and nuclear sorting, despite significant
divergences in the DNA barcode gene (cytochrome c oxidase subunit I,COI ) (Crespo et al.,
2022).
In this study, we aim to resolve the taxonomic and evolutionary
conundrum posed by the Madeiran endemic species pair H. maderiana and H. insularum , by implementing a genome-wide approach using
double digest RADseq (ddRADseq). Specifically, we seek to determine
whether these two nominal species represent a case of hybridisation due
to gene flow on the Porto Santo Island or if they represent a single
species exhibiting extreme morphological polymorphism. To achieve this
goal, we used genome-wide data to analyse 61 specimens of H.
maderiana and H. insularum sampled across their known
distribution. Our results shed light on their taxonomic status, help to
understand the historical events leading to the observed genetic pattern
and evaluate the presence of genetic discordance across populations due
to incomplete lineage sorting and/or ongoing gene flow.
Material and Methods
Sampling.
We sampled 61 specimens from the previous study of Crespo et al. (2022),
representing the known range of the focal species in the Madeira
Islands. These specimens were distributed as follows: 27 H.
insularum , 32 H. maderiana, and two unidentified immatures. They
included eight Hogna insularum from Deserta Grande, seven
from Madeira, three from Bugio, nine from Porto Santo, and nearby islets
(three from Ilhéu de Cima and one from Ilhéu de Ferro). Additionally, we
collected 32 H. maderiana from Porto Santo (eight from
Ilhéu de Ferro). Finally, we included two immatures from Porto Santo
that could not be unambiguously assigned to any of the two species
(Figure 1). We also included one H. nonannulata specimen from
Madeira, seven H. ingens specimens from Deserta Grande, and one
from Madeira as outgroups (Table S1).
Haplotype network.
To analyse and visualise the relationships among mitochondrial DNA
sequences within H. maderiana and H. insularum, we used
the 93 COI sequences of 676 base pairs (bp) available from the previous
study (Crespo et al., 2022)
(Table S1). We estimated the haplotype network for the COI matrix using
the randomised minimum spanning tree method
(Paradis, 2018). We analyse
and visualise the data with the help of the R packages ape
(Paradis & Schliep, 2019)
and pegas (Paradis,
2010).
ddRADseq library preparation and sequencing
For the double-digest restriction-site associated DNA (ddRADseq) library
preparation, we used Speedtools Tissue DNA Extraction Kit (Biotools,
Madrid, Spain) or DNeasy Blood & Tissue Kit, (Qiagen, Valencia, CA,
USA) to extract and purify total genomic DNA, with minor modifications
in the cell lysis time (overnight incubation). We estimated the amount
of gDNA with Qubit dsDNA HS assay (Invitrogen, Carlsbad, CA, USA).
Genomic DNA was individually barcoded and processed in-house using the
ddRADseq procedure (Peterson
et al., 2012). For ddRADseq library preparation: 300 ng of genomic DNA
was digested in a reaction consisting of 0.5 μL high-fidelity
restriction enzymes MseI (10,000 units ml-1,
New England Biolabs, UK) , 0.5 μL EcoRI-HF (20,000 units ml-1, New England Biolabs) and 2 μl of
CutSmart Buffer (New England Biolabs). It was then incubated for 4 hours
at 37°C and enzymes were deactivated at 4ºC. We performed a purification
step with AMPure XP 1.6X magnetic beads (Beckman Coulter, Inc., Brea,
CA, USA) with a final elution in 40 μl. We measured the concentration of
purified and digested DNA with Qubit dsDNA HS assay (Invitrogen) and
used this value to normalise the DNA. We pooled them into three
libraries of 24 samples each based on concentration measurements (Table
S1). The resulting fragments were ligated to custom-made 24 P1 and one
P2 adapters containing sample-specific 7-base-pair barcodes and primer
annealing sites (Table S2). We added the listed buffers and enzymes in
every sample for the ligation step: 4 μl T4 DNA Ligase Buffer (New
England Biolabs), 1.5 μl T4 DNA Ligase
(2,000,000 units ml-1, New England Biolabs), 2 μl P1
adapter (1μM), 2 μl P2 adapter (4μM), 0.5 μl water and 30 μl of DNA
digested-normalised. The ligation process was performed for 2 h at 23°C,
enzymes were deactivated at 65°C for 10 min and the temperature
decreased by 2°C per 90 seconds until reaching 23°C. Then, we pooled all
the ligation products in three tubes and purified them with AMPure XP
1.5X magnetic beads (Beckman Coulter), eluting in 31 μl. We
size-selected the pools or libraries at 300 bp (range 150–400 bp) with
BluePippin (Sage Science, Beverly, MA, USA) using the cassette type
“2% DF Marker V1” and the “tight” option. Finally, we amplified
each library using the Phusion High-Fidelity PCR Kit (New England
Biolabs,) and we conducted PCRs with an initial denaturation for 30 s at
98 °C, 10 cycles (10 s at 98 °C, 30 s at 65 °C, 30 s at 72 °C) and a
final extension for 10 min at 72 °C. We cleaned and size-selected the
resulting libraries to remove possible remaining adapters and primers
using AMPure beads 0.8X (Beckman Coulter). We pooled the three libraries
at equal concentrations and paired-end sequenced on an Illumina NovaSeq
PE150 at Novogene Co.
ddRADseq locus assembly, filtering, and outlier detection
We demultiplexed and assembled paired-end reads into de novo loci
using the ipyrad 0.9.83
(Eaton & Overcast, 2020).
Raw reads were quality-trimmed to eliminate low-quality sequences, reads
with uncalled bases, and reads without complete barcode or restriction
cut site. We used the following parameters in ipyrad:
“pairddrad” data type for double-digest RADseq; AATTC and TAA for the
restriction overhang. We used default parameters for base quality (5),
minimum (6), and maximum depth of reads in stacks (10,000). Reads
retained were further quality-filtered to convert base calls with a
Phred score below 20 into Ns and discard reads with more than two Ns.
Within sample clustering, we used the 0.85 threshold. We set the maximum
number of allowable barcode mismatches to 0 to ensure that each read was
correctly assigned to its identifying barcode. Filtering for
adapters/primers was set to 2, the strictest setting, to remove any
barcodes that may have been left over from the demultiplexing and could
interfere with downstream clustering. We set the minimum length of reads
to 35 and the maximum alleles per site in the consensus sequences to
two. We used the default settings for the maximum number of uncalled
bases and heterozygotes in consensus sequences (0.05). Finally, we used
default settings for the maximum number of single nucleotide
polymorphisms (SNPs) per locus (0.2) and
indels
per locus (8). We assembled two different datasets, each differing in
the number of species: dataset a) all specimens from the fourHogna species that were part of the study and dataset b) only
specimens from H. insularum and H. maderiana .
Population genomic structure and genetic diversity
To examine the genetic structure among the specimens in dataset b), we
performed a discriminant analysis of principal components (DAPC) (Table
S1). The DAPC method combines discriminant and principal component
analysis to maximise the genetic variability between groups and minimise
the variability within each group
(Jombart et al., 2010). The
study was performed with the R package adegenet 2.1.10
(Jombart & Ahmed, 2011). We
used the “xvalDapc” function to find the optimal number of principal
components (PCs) to be retained and “find.clusters” to find the
optimal number of clusters based on the Bayesian information criterion
(BIC). The acquired values were utilised to do a DAPC on the data using
the “dapc” function. We employed the “scatter” function to display
the graphs.
To infer the population genetic structure and detect potential admixture
among groups, we used structure 2.3
(Pritchard et al., 2000)
under the admixture model based on one SNP per locus, for dataset b)
(Table S1). We predefined the number of genetic clusters (K) from one to
five. We conducted ten replicates for each K, executing 200,000
iterations and a burn-in of 10,000. Graphs from the 10 runs at each
value of K were aggregated using clumpak
(Kopelman et al., 2015) and
the probable number of ancestral clusters was selected based on ΔK and
Pritchard from Structure Harvester 0.6.94
(Earl & vonHoldt, 2012;
Evanno et al., 2005).
In addition, we employed fineradstructure
(Malinsky et al., 2018), a
model-based method developed explicitly for evaluating extensive SNP
datasets, to assess the genetic structure patterns while identifying the
sources of ancestry in each population. This software provides high
resolution for inferring recent shared ancestry by utilising haplotype
linkage information and prioritising the most recent common ancestry
among the studied individuals. It generates a co-ancestry matrix,
summarising the dataset’s closest neighbour haplotype relationships.
More precisely, we employed the script finerad_input.py, which is part
of the fineRADstructure-tools package available at
https://github.com/edgardomortiz/fineRADstructure-tools,
to transform the ”.alleles” output from ipyrad into the
appropriate input format for fineRADstructure. As part of the conversion
process, we specifically filtered the dataset to include only loci that
were not linked (using the default parameter) and had a minimum sample
size of eight (–minsample 8). Individuals included in the analyses
are summarized in Table S1. Subsequently, we created the co-ancestry
matrix using the RADpainter module of fineradstructure. We
employed the default parameters of 100,000 Markov chain Monte Carlo
(MCMC) iterations, with a burn-in of 100,000 iterations and sampling
taking place every 1000 iterations, to conduct the analysis. A tree was
built using 10,000 iterations of the hill-climbing algorithm. We
visualised the results using the scripts fineradstructureplot.r
and finestructurelimrary.r, which may be accessed at
http://cichlid.gurdon.cam.ac.uk/fineRADstructure.html.
Since we identified a genetic group with a signal of putative admixture,
we investigated whether there is evidence of potential gene flow or
incomplete lineage sorting (ILS) between the different genetic groups.
To achieve this, we used Patterson’s D-statistics, also known as the
abba-baba test
(Durand et al., 2011),
implemented in dsuite 0.1
(Malinsky et al., 2021). The
D-statistics analyses correlations in allele frequencies across closely
related species since they share a significant amount of genetic
variation due to common ancestry and ILS. This algorithm computes the
D-statistic by considering a 4-taxon fixed phylogeny: (((P1, P2), P3),
O) where O is an outgroup with the ancestral allele “A”. The “A”
allele and derived allele “B” should follow the pattern BBAA (P1 and
P2 share the allele “B”); the pattern ABBA refers to P2 and P3 sharing
the derived allele “B” and BABA refers to P1 and P3 sharing the
derived allele. The allele patterns ABBA and BABA, which differ from the
species tree pattern BBAA, can provide evidence of introgression when
there is a disproportion between the two patterns, usually an excess of
them. ILS is the most plausible explanation if these two patterns occur
in equal frequency. We performed the four-taxon D-statistic test on two
different comparisons with dataset a: a) taking into account the three
genetic groups detected by the genetic structure analyses (H. insularum from Madeira, Desertas, and Bugio, H. insularum from Porto Santo and H. maderiana) and H. ingens as
outgroup; b) considering the three genetic groups across the different
islands, with H. ingens as outgroup (Table S1).
Demographic analyses
Based on the genetic structure analysis results, we conducted a
demographic modelling analysis to further elucidate between incomplete
lineage sorting or introgression as a mechanism to explain the recovered
evolutionary patterns within the Hogna species complex. We used
two approaches based on the unfolded site frequency spectrum (SFS). On
the one hand, using fastsimcoal2 (fsc28)
(Marchi et al., 2024), we
tested five demographic scenarios based on the multidimensional joint
site frequency spectrum (SFS). We used easySFS
(https://github.com/isaacovercast/easySFS)
to generate a folded multidimensional SFS. We downsampled our dataset to
a smaller number: four H. maderiana individuals, four H.
insularum from Madeira, and three H. insularum from Porto Santo.
Briefly, we simulated five demographic models (Figure S1). We first
considered two scenarios without gene flow between lineages, and hence
that admixture was due to ancestral polymorphism: model A) whereH. insularum split into two lineages, one from Porto Santo and
one from Madeira; and model B) where H. maderiana and H.
insularum from Porto Santo shared a common ancestor. The following
scenario (model C) considered recent gene flow between the two species
from Porto Santo (H. insularum and H. maderiana ). The
fourth scenario (model D) considered that H. insularum from Porto
Santo resulted from secondary contact between H. maderiana andH. insularum , with no subsequent gene flow. Finally, we tested
the same secondary contact scenario but with subsequent gene flow
between the three lineages (model E). We fixed the effective population
size (Ne ) for H. maderiana to enable the estimation of
other parameters in fastsimcoal2. We calculated Ne fixed
in the model from the level of nucleotide diversity (π) and estimates of
mutation rate per site per generation (μ, becauseN e = (π/4μ). We estimated π for H.
maderiana from polymorphic and non-polymorphic loci using
DnaSP (π = 0.0048). We assumed a mutation rate per site per
generation of 6 × 10-9 as estimated forDrosophila melanogaster (Wang et al., 2023). All
template files (*.tpl) that contain parameters and estimation files
(*.est) that contain unknown parameters for estimation were provided in
Supplementary Material. We ran 100 independent simulations of each model
in fastsimcoal2. Each run comprised 100,000 coalescent
simulations, 50 expectation–conditional maximisation (ECM) cycles, and
a proportion of the initial search in the last cycle of 0.5 (-z0.5). We
used Akaike’s Information Criterion (AIC) to determine the probability
of each model given the observed data and select the best model.
Finally, we calculated each parameter estimate’s confidence intervals
(CI) from the best-supported model by simulating 20 replicates of 100
SFS and re-estimating the parameters each time. We obtained the point
estimates of each demographic parameter for the best-supported model
from the run with the highest composite maximum likelihood score.
On the other hand, we implemented a model-flexible approach using
stairway plot2 (Liu
& Fu, 2015) to infer the demographic history of each lineage
separately and to understand the events that can explain the observed
pattern in fastsimcoal2. We assumed a generation time of 1 year. Median
estimates of Ne and confidence intervals were estimated with the
built-in bootstrap function using 200 subsets of the input data,
ignoring singletons. We graphically represented the results with
ggplot2.
Species tree under the multispecies coalescent model
Finally, we used the Bayesian multispecies coalescent framework of
snapp 2.7.3 (Bryant
et al., 2012) as implemented in beast 2 2.7.3
(Bouckaert et al., 2014) to
estimate a species tree and corroborate the presence of incomplete
lineage sorting. Given that SNAPP does not consider gene flow or
introgression in its coalescent process modelling, the species tree
generated would reflect the most probable evolutionary relationships
among species, assuming all discrepancies between gene trees and the
species tree are attributable to ILS. The dataset consisted of 696
unlinked single nucleotide polymorphisms that were converted into a
biallelic format. These SNPs were derived from the ingroup specimens,
which comprised 50 specimens from Hogna maderiana and Hogna
insularum . The individuals were categorised into three genetic groups:
a) H. maderiana , b) H. insularum from Madeira, Desertas,
and Bugio and c) H. insularum from Porto Santo (Table S1), based
on the results of the genomic structure analyses. The .usnps file
generated by ipyrad was modified and transformed into a
snapp input file. The model was defined with specific parameter
values: the mutation rates u and v were fixed at 1 and not subject to
sampling, the coalescent rate was set to 10 and subject to sampling,
non-polymorphic sites were excluded, and a log-likelihood correction was
applied. snapp utilises a Yule prior, where the parameter
lambda represents the rates at which new species originate. Prior
probabilities: alpha = 1, beta = 250, gamma = 1, lambda = 0.01, no
samples taken. We conducted four separate iterations, each consisting of
a chain length of 100,000 generations. In each iteration, we collected
samples at intervals of 100 generations. We assessed convergence by
ensuring the effective sample size (ESS) was greater than 200. We
determined that the burn-in period was 10% using tracer 1.7.2.
Additionally, we analysed the posterior distribution of trees using
densitree 3.0.2. We utilised LogCombiner to merge the four runs
and generated a maximum clade credibility tree (MCC) using
treeannotator 2.7.3. Finally, we visualised the tree using
itol 6.8 (Letunic &
Bork, 2024).
Results
Haplotype network
The 93 COI sequences yielded 51 haplotypes (Table S3, Figure S2). Among
these haplotypes, 35 were singletons. A taxonomic segregation into the
two species was not readily apparent in the haplotype network. In
several instances, haplotypes of one species were more closely connected
to the ones of the other species. Haplotypes of H. insularum from
Madeira formed an exclusive cluster. They shared the same ancestral
relative that most specimens sampled from Desertas and the closeby Bugio
islet, except for two distantly related haplotypes more closely
associated with Porto Santo ones. Haplotype IX was the most abundant (12
out of 93 individuals), present in Porto Santo and Ilhéu de Ferro, and
restricted to H. maderiana . The second most abundant haplotype
(XXXI) was exclusive to H. insularum from Madeira, Desertas, and
Bugio (Table S3, Figure S2).
SNP ddRADseq data
Sequencing of the ddRADseq library resulted in 164,606,284 paired-end
(PE) reads that contained a maximum of a single error per 1000 bases,
with a mean individual sample coverage of 1,130,758.37 ± 474,932.71
(Table S4). The initial matrix with the four species included 70
individuals (4034 SNPs, 497 unlinked SNPs, and 67.25% of missing data).
Still, after eliminating 11 individuals based on a low number of reads
(below 750,000) or low read quality, our final ddRADseq datasets
included 59 individuals, including two outgroups. After
quality-trimming, we retained for the first dataset a total of
74,772,557 reads (1,267,331.47 ± 372,437.93 individual sample coverage)
(Table S5), which resulted in 7158 SNPs, 803 unlinked SNPs and 67.2%
missing data. For the Hogna insularum and H. maderiana dataset,
we retained a total of 60,779,385 reads (1,215,587.70 ± 314,645.31)
(Table S6), which resulted in 6456 SNPs, 865 unlinked SNPs, and 65,54%
missing data for downstream analyse.
Population genomic structure and genomic diversity
Six PCs and two discriminant eigenvalues were retained during DAPC
analysis to describe the relationship between the samples. Under the BIC
criteria, three clusters were found as the optimal results. The first
cluster contained Hogna maderiana individuals from Porto Santo,
the second cluster gathered all Hogna insularum individuals from
Madeira, Desertas and Bugio (hereafter referred as MDB), whereas the
third comprised those individuals of Hogna insularum from Porto
Santo (hereafter referred as PS). The different clusters were
well-separated and did not overlap (Figure 2a).
structure analyses yielded an “optimal” clustering value for
K=3 according to Evanno’s ΔK criterion and K=2 according to Pritchard
(Figure 2b, Figure S3). The structure results for K=2 revealed
two genetically distinct groups of individuals, one corresponding toH. maderiana and one to H. insularum MDB; however,
individuals of H. insularum PS shared approximately 50% genetic
component of each genetic group. Considering K=3, there was no distinct
genetic structure, with individuals exhibiting different levels of
genetic similarity with a common dominant genetic component, but a
distinct additional component in individuals of H. insularum MDB.
Results for K=1 to K=5 are summarised in Figure S4.
The fineradstructure analysis was mainly in agreement with the
structure and DAPC results. The dendogram and coancestry matrix
resulting from fineradstructure resolved two major groupings in
the data, corresponding to H. insularum and H. maderiana (Figure 2c). Within the H. insularum group, individuals
clustered in two supported subclusters belonging to the individuals from
Porto Santo on one side and the individuals from the remaining islands
on the other. Overall, the co-ancestry level was high within the three
groups, higher in H. insularum MDB. Slightly higher average
values were observed between the H. insularum groups compared to
the coancestry levels between H. insularum PS and H.
maderiana . However, the coancestry values remain relatively uniform and
comparable across the species.
The abba-baba tests supported an incomplete lineage sorting
(ILS) scenario across the three genetic groups. No significant evidence
of gene flow was detected in any comparisons among H. insularum PS, H. insularum MDB, and H. maderiana , as indicated by
the non-significant p-values and relatively low Z-scores (Table
S7). On the other hand, although not directly relevant to the objective
of our analysis, we observed signals of gene flow between populations ofH. insularum from the Madeira and Bugio Islands. Detailed results
of the abba-baba tests are presented in Table S7.
Coalescent-based demographic model
The fastsimcoal2 analysis identified the model of three
independent lineages (H. insularum MDB, H. insularum PS,
and H. maderiana ) with neither past admixture nor contemporary
gene flow as the one with the highest support (model A; see Table S8 and
Figure 3a). According to this model, the lineage of H. insularum PS would constitute a distinct species closely related to H.
insularum MDB. Assuming a 1-year generation time, fastsimcoal2
estimated that lineages diverged from a shared ancestor approximately
divergence between the two H. insularum lineages occurred around
estimated Ne for H. insularum from Porto Santo wasH. insularum from Madeira, Desertas, and Bugio, it was
presentation of all estimated demographic variables and their weighted
averages can be found in Figure 3a and Table S9.
We repeated the demographic analyses using stairway plot,
revealing a consistent pattern in the two lineages inhabiting Porto
Santo Island. Both lineages exhibited a stepwise increase in effective
population size, eventually stabilising at a constant level. On the
other hand, in H. insularum MDB, we identified a rapid population
increase followed by a more recent stabilisation (Figure 3b).
Species tree under the multispecies coalescent model
snapp inferred three incongruent species tree topologies from
the posterior distribution. The species tree topology with the highest
probability (45.93%) indicated a sister group relationship between the
lineages from H. maderiana and H. insularum PS. The
following topology (35.53%) recovered the H. insularum PS and
MDB as sister groups. The third and least supported topology (21.54% of
the gene trees) recovered H. maderiana as sister to H.
insularum MDB (Figure 4). The consensus tree yielded a low support
level (posterior probability of 0.47) for the clade of the two Porto
Santo lineages (Figure S5).
Discussion
Genomic data has become an invaluable tool for unveiling complex
evolutionary scenarios, providing insights into hidden aspects of
population history and recent species divergence. However, current
practice in population genomic studies fail to provide evidence on the
full evolutionary history of a species and, therefore, achieve incorrect
conclusions. Our study focused on two endemic Hogna species from
the Madeira archipelago, exhibiting mitochondrial discordance and fuzzy
morphological limits. We used ddRADseq data, integrating population
genomic structure analyses and coalescent-based demographic modelling to
understand the contribution of incomplete lineage sorting (ILS) and gene
flow to the observed genetic patterns. Our genetic structure analyses
suggested three potential genetic clusters, one for each nominal species
and one compatible with hybridisation between the two species on Porto
Santo Island. However, our demographic models and the D-statistic test
explicitly rejected gene flow between the two lineages. Instead, they
supported a third independent Hogna lineage on Porto Santo
Island. The apparent absence of clear-cut limits between the lineages is
likely explained by unsorted polymorphism, probably due to the recent
divergence times, as some of our results suggest. The contrasting
results between inferences of hybridisation based on population
structure patterns and multi-population coalescent analyses highlight
the evolutionary complexity of Hogna species from Madeira and
underscore the need for careful interpretation of genomic data.