Keywords
Genomic data, hybridisation, phylogenetic discordance, species boundaries, island evolution
Introduction
Understanding the mechanisms driving the diversity of life on Earth is a fundamental question in biology, with speciation being the critical driving process (Hernández-Hernández et al., 2021; Jia et al., 2021; Seehausen et al., 2014). Speciation is not a single event but a continuum, ranging from minimally differentiated to entirely distinct species. This continuum is often reflected in the genetic diversity within and between populations, which can be further modulated by processes such as incomplete lineage sorting (ILS) and hybridisation. Systematic studies integrating phenotypic and molecular data are essential for placing species within an evolutionary framework. However, the gradual divergence of species through time and processes such as ILS or gene flow can result in genetic and phenotypic variation gradients that challenge the reconstruction of accurate phylogenies and blur species boundaries.
Along with evolutionary processes, methodological and analytical decisions (e.g., data type and quality and probabilistic framework) may also contribute to phylogenetic uncertainty and fail to establish species limits (Steenwyk et al., 2023). When referring to genomic data, one of the main challenges in unravelling recently diverged evolutionary lineages is distinguishing similarity due to gene flow from that of shared ancestry. Detecting hybrid zones, especially in geographically constrained areas like islands, often leads to debates about whether observed genetic patterns reflect ongoing gene flow or result from historical separation followed by secondary contact (Gompert & Buerkle, 2016). Overestimating gene flow can lead to erroneous conclusions regarding the limits and evolutionary relationships among species, whereas underestimating it can conceal the role of hybridisation in species diversification. This issue is further complicated in cases of ILS, where ancestral genetic polymorphisms persist across multiple species, leading to phylogenetic discordance among loci (He et al., 2023; Hernández-Gutiérrez et al., 2022; Herrig et al., 2024). Detecting and correctly interpreting these signals is vital for reconstructing and understanding the Tree of Life, as well as for assessing biodiversity and conservation priorities.
Spiders are among the most diverse animals on Earth and play a fundamental role as generalist predators in shaping terrestrial ecosystems (Nyffeler & Birkhofer, 2017; Wheeler et al., 2017). As in many other organisms, hybridisation has been documented across multiple species, highlighting its potential role in driving diversification and evolutionary processes. For instance, hybrid zones have been identified among funnel weaver spiders as the Eratigena atrica (C.L. Koch, 1843) group in Europe (Croucher et al., 2007; Oxford & Bolzern, 2018); in deeply divergent lineages of the red devil spider genus Harpactocrates in the Pyrenees (Bidegaray-Batista et al., 2016), or in Iberian representatives of the cobweb spider genusTheridion (Domènech et al., 2020) . Conversely, recent research has demonstrated that multiple evolutionary processes, such as ancestral hybridisation events and ILS, have contributed to the emergence of different ecomorphs in the Hawaiian spiny-leg Tetragnatha spiders (Cerca, Cotoras, Santander, et al., 2023).
The wolf spider genus Hogna in the Galapagos Islands exemplifies the significance of gene flow in evolutionary diversification, demonstrating that high levels of intra-island introgression have likely led to the parallel evolution of habitat specialisation across the archipelago (De Busschere et al., 2015). The worldwide distributed, species-rich genus Hogna Simon, 1885, includes 236 species (World Spider Catalog, 2024) of medium- to large-sized spiders. This richness is the result of the usage of the genus as a dump basket for Lycosinae species with conserved morphology, mostly because species identification has been plagued by the absence of distinctive diagnostic characters and low-quality illustration (Logunov, 2020), thus resulting in a polyphyletic genus (Crespo et al., 2022; Piacentini & Ramírez, 2019). The genus has managed to disperse and diversify in several archipelagos worldwide, including the Mediterranean, the Caribbean, the Atlantic, and the Pacific (Gloor et al., 2017). Seven endemic species of the wolf spider genus Hogna have been documented in the northeast Atlantic archipelago of Madeira, namelyHogna maderiana (Walckenaer, 1837), H. ingens (Blackwall, 1857), H. blackwalli (Johnson, 1863), H. heeri (Thorell, 1875), H. insularum (Kulczynski, 1899), H. nonannulata Wunderlich, 1995 and H. isambertoi Crespo, 2022. A recent study employing molecular and phenotypic data highlighted the importance of integrative methods in resolving species delimitation and understanding the endemic species’ biogeographic connections from the Madeira archipelago (Crespo et al., 2022). However, the study failed to draw the boundaries and phylogenetic relationships of two endemic sibling species, namelyH. insularum , which inhabits all the islands, and the large and colourful H. maderiana , circumscribed to Porto Santo and nearby islets, where the two species overlap. This pair of species defied classification due to the presence of intermediate phenotypes and the lack of mitochondrial and nuclear sorting, despite significant divergences in the DNA barcode gene (cytochrome c oxidase subunit I,COI ) (Crespo et al., 2022).
In this study, we aim to resolve the taxonomic and evolutionary conundrum posed by the Madeiran endemic species pair H. maderiana and H. insularum , by implementing a genome-wide approach using double digest RADseq (ddRADseq). Specifically, we seek to determine whether these two nominal species represent a case of hybridisation due to gene flow on the Porto Santo Island or if they represent a single species exhibiting extreme morphological polymorphism. To achieve this goal, we used genome-wide data to analyse 61 specimens of H. maderiana and H. insularum sampled across their known distribution. Our results shed light on their taxonomic status, help to understand the historical events leading to the observed genetic pattern and evaluate the presence of genetic discordance across populations due to incomplete lineage sorting and/or ongoing gene flow.
Material and Methods
Sampling.
We sampled 61 specimens from the previous study of Crespo et al. (2022), representing the known range of the focal species in the Madeira Islands. These specimens were distributed as follows: 27 H. insularum , 32 H. maderiana, and two unidentified immatures. They included eight Hogna insularum from Deserta Grande, seven from Madeira, three from Bugio, nine from Porto Santo, and nearby islets (three from Ilhéu de Cima and one from Ilhéu de Ferro). Additionally, we collected 32 H. maderiana from Porto Santo (eight from Ilhéu de Ferro). Finally, we included two immatures from Porto Santo that could not be unambiguously assigned to any of the two species (Figure 1). We also included one H. nonannulata specimen from Madeira, seven H. ingens specimens from Deserta Grande, and one from Madeira as outgroups (Table S1).
Haplotype network.
To analyse and visualise the relationships among mitochondrial DNA sequences within H. maderiana and H. insularum, we used the 93 COI sequences of 676 base pairs (bp) available from the previous study (Crespo et al., 2022) (Table S1). We estimated the haplotype network for the COI matrix using the randomised minimum spanning tree method (Paradis, 2018). We analyse and visualise the data with the help of the R packages ape (Paradis & Schliep, 2019) and pegas (Paradis, 2010).
ddRADseq library preparation and sequencing
For the double-digest restriction-site associated DNA (ddRADseq) library preparation, we used Speedtools Tissue DNA Extraction Kit (Biotools, Madrid, Spain) or DNeasy Blood & Tissue Kit, (Qiagen, Valencia, CA, USA) to extract and purify total genomic DNA, with minor modifications in the cell lysis time (overnight incubation). We estimated the amount of gDNA with Qubit dsDNA HS assay (Invitrogen, Carlsbad, CA, USA). Genomic DNA was individually barcoded and processed in-house using the ddRADseq procedure (Peterson et al., 2012). For ddRADseq library preparation: 300 ng of genomic DNA was digested in a reaction consisting of 0.5 μL high-fidelity restriction enzymes MseI (10,000 units ml-1, New England Biolabs, UK) , 0.5 μL EcoRI-HF (20,000 units ml-1, New England Biolabs) and 2 μl of CutSmart Buffer (New England Biolabs). It was then incubated for 4 hours at 37°C and enzymes were deactivated at 4ºC. We performed a purification step with AMPure XP 1.6X magnetic beads (Beckman Coulter, Inc., Brea, CA, USA) with a final elution in 40 μl. We measured the concentration of purified and digested DNA with Qubit dsDNA HS assay (Invitrogen) and used this value to normalise the DNA. We pooled them into three libraries of 24 samples each based on concentration measurements (Table S1). The resulting fragments were ligated to custom-made 24 P1 and one P2 adapters containing sample-specific 7-base-pair barcodes and primer annealing sites (Table S2). We added the listed buffers and enzymes in every sample for the ligation step: 4 μl T4 DNA Ligase Buffer (New England Biolabs), 1.5 μl T4 DNA Ligase (2,000,000 units ml-1, New England Biolabs), 2 μl P1 adapter (1μM), 2 μl P2 adapter (4μM), 0.5 μl water and 30 μl of DNA digested-normalised. The ligation process was performed for 2 h at 23°C, enzymes were deactivated at 65°C for 10 min and the temperature decreased by 2°C per 90 seconds until reaching 23°C. Then, we pooled all the ligation products in three tubes and purified them with AMPure XP 1.5X magnetic beads (Beckman Coulter), eluting in 31 μl. We size-selected the pools or libraries at 300 bp (range 150–400 bp) with BluePippin (Sage Science, Beverly, MA, USA) using the cassette type “2% DF Marker V1” and the “tight” option. Finally, we amplified each library using the Phusion High-Fidelity PCR Kit (New England Biolabs,) and we conducted PCRs with an initial denaturation for 30 s at 98 °C, 10 cycles (10 s at 98 °C, 30 s at 65 °C, 30 s at 72 °C) and a final extension for 10 min at 72 °C. We cleaned and size-selected the resulting libraries to remove possible remaining adapters and primers using AMPure beads 0.8X (Beckman Coulter). We pooled the three libraries at equal concentrations and paired-end sequenced on an Illumina NovaSeq PE150 at Novogene Co.
ddRADseq locus assembly, filtering, and outlier detection
We demultiplexed and assembled paired-end reads into de novo loci using the ipyrad 0.9.83 (Eaton & Overcast, 2020). Raw reads were quality-trimmed to eliminate low-quality sequences, reads with uncalled bases, and reads without complete barcode or restriction cut site. We used the following parameters in ipyrad: “pairddrad” data type for double-digest RADseq; AATTC and TAA for the restriction overhang. We used default parameters for base quality (5), minimum (6), and maximum depth of reads in stacks (10,000). Reads retained were further quality-filtered to convert base calls with a Phred score below 20 into Ns and discard reads with more than two Ns. Within sample clustering, we used the 0.85 threshold. We set the maximum number of allowable barcode mismatches to 0 to ensure that each read was correctly assigned to its identifying barcode. Filtering for adapters/primers was set to 2, the strictest setting, to remove any barcodes that may have been left over from the demultiplexing and could interfere with downstream clustering. We set the minimum length of reads to 35 and the maximum alleles per site in the consensus sequences to two. We used the default settings for the maximum number of uncalled bases and heterozygotes in consensus sequences (0.05). Finally, we used default settings for the maximum number of single nucleotide polymorphisms (SNPs) per locus (0.2) and indels per locus (8). We assembled two different datasets, each differing in the number of species: dataset a) all specimens from the fourHogna species that were part of the study and dataset b) only specimens from H. insularum and H. maderiana .
Population genomic structure and genetic diversity
To examine the genetic structure among the specimens in dataset b), we performed a discriminant analysis of principal components (DAPC) (Table S1). The DAPC method combines discriminant and principal component analysis to maximise the genetic variability between groups and minimise the variability within each group (Jombart et al., 2010). The study was performed with the R package adegenet 2.1.10 (Jombart & Ahmed, 2011). We used the “xvalDapc” function to find the optimal number of principal components (PCs) to be retained and “find.clusters” to find the optimal number of clusters based on the Bayesian information criterion (BIC). The acquired values were utilised to do a DAPC on the data using the “dapc” function. We employed the “scatter” function to display the graphs.
To infer the population genetic structure and detect potential admixture among groups, we used structure 2.3 (Pritchard et al., 2000) under the admixture model based on one SNP per locus, for dataset b) (Table S1). We predefined the number of genetic clusters (K) from one to five. We conducted ten replicates for each K, executing 200,000 iterations and a burn-in of 10,000. Graphs from the 10 runs at each value of K were aggregated using clumpak (Kopelman et al., 2015) and the probable number of ancestral clusters was selected based on ΔK and Pritchard from Structure Harvester 0.6.94 (Earl & vonHoldt, 2012; Evanno et al., 2005).
In addition, we employed fineradstructure (Malinsky et al., 2018), a model-based method developed explicitly for evaluating extensive SNP datasets, to assess the genetic structure patterns while identifying the sources of ancestry in each population. This software provides high resolution for inferring recent shared ancestry by utilising haplotype linkage information and prioritising the most recent common ancestry among the studied individuals. It generates a co-ancestry matrix, summarising the dataset’s closest neighbour haplotype relationships. More precisely, we employed the script finerad_input.py, which is part of the fineRADstructure-tools package available at https://github.com/edgardomortiz/fineRADstructure-tools, to transform the ”.alleles” output from ipyrad into the appropriate input format for fineRADstructure. As part of the conversion process, we specifically filtered the dataset to include only loci that were not linked (using the default parameter) and had a minimum sample size of eight (–minsample 8). Individuals included in the analyses are summarized in Table S1. Subsequently, we created the co-ancestry matrix using the RADpainter module of fineradstructure. We employed the default parameters of 100,000 Markov chain Monte Carlo (MCMC) iterations, with a burn-in of 100,000 iterations and sampling taking place every 1000 iterations, to conduct the analysis. A tree was built using 10,000 iterations of the hill-climbing algorithm. We visualised the results using the scripts fineradstructureplot.r and finestructurelimrary.r, which may be accessed at http://cichlid.gurdon.cam.ac.uk/fineRADstructure.html.
Since we identified a genetic group with a signal of putative admixture, we investigated whether there is evidence of potential gene flow or incomplete lineage sorting (ILS) between the different genetic groups. To achieve this, we used Patterson’s D-statistics, also known as the abba-baba test (Durand et al., 2011), implemented in dsuite 0.1 (Malinsky et al., 2021). The D-statistics analyses correlations in allele frequencies across closely related species since they share a significant amount of genetic variation due to common ancestry and ILS. This algorithm computes the D-statistic by considering a 4-taxon fixed phylogeny: (((P1, P2), P3), O) where O is an outgroup with the ancestral allele “A”. The “A” allele and derived allele “B” should follow the pattern BBAA (P1 and P2 share the allele “B”); the pattern ABBA refers to P2 and P3 sharing the derived allele “B” and BABA refers to P1 and P3 sharing the derived allele. The allele patterns ABBA and BABA, which differ from the species tree pattern BBAA, can provide evidence of introgression when there is a disproportion between the two patterns, usually an excess of them. ILS is the most plausible explanation if these two patterns occur in equal frequency. We performed the four-taxon D-statistic test on two different comparisons with dataset a: a) taking into account the three genetic groups detected by the genetic structure analyses (H. insularum from Madeira, Desertas, and Bugio, H. insularum from Porto Santo and H. maderiana) and H. ingens as outgroup; b) considering the three genetic groups across the different islands, with H. ingens as outgroup (Table S1).
Demographic analyses
Based on the genetic structure analysis results, we conducted a demographic modelling analysis to further elucidate between incomplete lineage sorting or introgression as a mechanism to explain the recovered evolutionary patterns within the Hogna species complex. We used two approaches based on the unfolded site frequency spectrum (SFS). On the one hand, using fastsimcoal2 (fsc28) (Marchi et al., 2024), we tested five demographic scenarios based on the multidimensional joint site frequency spectrum (SFS). We used easySFS (https://github.com/isaacovercast/easySFS) to generate a folded multidimensional SFS. We downsampled our dataset to a smaller number: four H. maderiana individuals, four H. insularum from Madeira, and three H. insularum from Porto Santo. Briefly, we simulated five demographic models (Figure S1). We first considered two scenarios without gene flow between lineages, and hence that admixture was due to ancestral polymorphism: model A) whereH. insularum split into two lineages, one from Porto Santo and one from Madeira; and model B) where H. maderiana and H. insularum from Porto Santo shared a common ancestor. The following scenario (model C) considered recent gene flow between the two species from Porto Santo (H. insularum and H. maderiana ). The fourth scenario (model D) considered that H. insularum from Porto Santo resulted from secondary contact between H. maderiana andH. insularum , with no subsequent gene flow. Finally, we tested the same secondary contact scenario but with subsequent gene flow between the three lineages (model E). We fixed the effective population size (Ne ) for H. maderiana to enable the estimation of other parameters in fastsimcoal2. We calculated Ne fixed in the model from the level of nucleotide diversity (π) and estimates of mutation rate per site per generation (μ, becauseN e = (π/4μ). We estimated π for H. maderiana from polymorphic and non-polymorphic loci using DnaSP (π = 0.0048). We assumed a mutation rate per site per generation of 6 × 10-9 as estimated forDrosophila melanogaster (Wang et al., 2023). All template files (*.tpl) that contain parameters and estimation files (*.est) that contain unknown parameters for estimation were provided in Supplementary Material. We ran 100 independent simulations of each model in fastsimcoal2. Each run comprised 100,000 coalescent simulations, 50 expectation–conditional maximisation (ECM) cycles, and a proportion of the initial search in the last cycle of 0.5 (-z0.5). We used Akaike’s Information Criterion (AIC) to determine the probability of each model given the observed data and select the best model. Finally, we calculated each parameter estimate’s confidence intervals (CI) from the best-supported model by simulating 20 replicates of 100 SFS and re-estimating the parameters each time. We obtained the point estimates of each demographic parameter for the best-supported model from the run with the highest composite maximum likelihood score.
On the other hand, we implemented a model-flexible approach using stairway plot2 (Liu & Fu, 2015) to infer the demographic history of each lineage separately and to understand the events that can explain the observed pattern in fastsimcoal2. We assumed a generation time of 1 year. Median estimates of Ne and confidence intervals were estimated with the built-in bootstrap function using 200 subsets of the input data, ignoring singletons. We graphically represented the results with ggplot2.
Species tree under the multispecies coalescent model
Finally, we used the Bayesian multispecies coalescent framework of snapp 2.7.3 (Bryant et al., 2012) as implemented in beast 2 2.7.3 (Bouckaert et al., 2014) to estimate a species tree and corroborate the presence of incomplete lineage sorting. Given that SNAPP does not consider gene flow or introgression in its coalescent process modelling, the species tree generated would reflect the most probable evolutionary relationships among species, assuming all discrepancies between gene trees and the species tree are attributable to ILS. The dataset consisted of 696 unlinked single nucleotide polymorphisms that were converted into a biallelic format. These SNPs were derived from the ingroup specimens, which comprised 50 specimens from Hogna maderiana and Hogna insularum . The individuals were categorised into three genetic groups: a) H. maderiana , b) H. insularum from Madeira, Desertas, and Bugio and c) H. insularum from Porto Santo (Table S1), based on the results of the genomic structure analyses. The .usnps file generated by ipyrad was modified and transformed into a snapp input file. The model was defined with specific parameter values: the mutation rates u and v were fixed at 1 and not subject to sampling, the coalescent rate was set to 10 and subject to sampling, non-polymorphic sites were excluded, and a log-likelihood correction was applied. snapp utilises a Yule prior, where the parameter lambda represents the rates at which new species originate. Prior probabilities: alpha = 1, beta = 250, gamma = 1, lambda = 0.01, no samples taken. We conducted four separate iterations, each consisting of a chain length of 100,000 generations. In each iteration, we collected samples at intervals of 100 generations. We assessed convergence by ensuring the effective sample size (ESS) was greater than 200. We determined that the burn-in period was 10% using tracer 1.7.2. Additionally, we analysed the posterior distribution of trees using densitree 3.0.2. We utilised LogCombiner to merge the four runs and generated a maximum clade credibility tree (MCC) using treeannotator 2.7.3. Finally, we visualised the tree using itol 6.8 (Letunic & Bork, 2024).
Results
Haplotype network
The 93 COI sequences yielded 51 haplotypes (Table S3, Figure S2). Among these haplotypes, 35 were singletons. A taxonomic segregation into the two species was not readily apparent in the haplotype network. In several instances, haplotypes of one species were more closely connected to the ones of the other species. Haplotypes of H. insularum from Madeira formed an exclusive cluster. They shared the same ancestral relative that most specimens sampled from Desertas and the closeby Bugio islet, except for two distantly related haplotypes more closely associated with Porto Santo ones. Haplotype IX was the most abundant (12 out of 93 individuals), present in Porto Santo and Ilhéu de Ferro, and restricted to H. maderiana . The second most abundant haplotype (XXXI) was exclusive to H. insularum from Madeira, Desertas, and Bugio (Table S3, Figure S2).
SNP ddRADseq data
Sequencing of the ddRADseq library resulted in 164,606,284 paired-end (PE) reads that contained a maximum of a single error per 1000 bases, with a mean individual sample coverage of 1,130,758.37 ± 474,932.71 (Table S4). The initial matrix with the four species included 70 individuals (4034 SNPs, 497 unlinked SNPs, and 67.25% of missing data). Still, after eliminating 11 individuals based on a low number of reads (below 750,000) or low read quality, our final ddRADseq datasets included 59 individuals, including two outgroups. After quality-trimming, we retained for the first dataset a total of 74,772,557 reads (1,267,331.47 ± 372,437.93 individual sample coverage) (Table S5), which resulted in 7158 SNPs, 803 unlinked SNPs and 67.2% missing data. For the Hogna insularum and H. maderiana dataset, we retained a total of 60,779,385 reads (1,215,587.70 ± 314,645.31) (Table S6), which resulted in 6456 SNPs, 865 unlinked SNPs, and 65,54% missing data for downstream analyse.
Population genomic structure and genomic diversity
Six PCs and two discriminant eigenvalues were retained during DAPC analysis to describe the relationship between the samples. Under the BIC criteria, three clusters were found as the optimal results. The first cluster contained Hogna maderiana individuals from Porto Santo, the second cluster gathered all Hogna insularum individuals from Madeira, Desertas and Bugio (hereafter referred as MDB), whereas the third comprised those individuals of Hogna insularum from Porto Santo (hereafter referred as PS). The different clusters were well-separated and did not overlap (Figure 2a).
structure analyses yielded an “optimal” clustering value for K=3 according to Evanno’s ΔK criterion and K=2 according to Pritchard (Figure 2b, Figure S3). The structure results for K=2 revealed two genetically distinct groups of individuals, one corresponding toH. maderiana and one to H. insularum MDB; however, individuals of H. insularum PS shared approximately 50% genetic component of each genetic group. Considering K=3, there was no distinct genetic structure, with individuals exhibiting different levels of genetic similarity with a common dominant genetic component, but a distinct additional component in individuals of H. insularum MDB. Results for K=1 to K=5 are summarised in Figure S4.
The fineradstructure analysis was mainly in agreement with the structure and DAPC results. The dendogram and coancestry matrix resulting from fineradstructure resolved two major groupings in the data, corresponding to H. insularum and H. maderiana (Figure 2c). Within the H. insularum group, individuals clustered in two supported subclusters belonging to the individuals from Porto Santo on one side and the individuals from the remaining islands on the other. Overall, the co-ancestry level was high within the three groups, higher in H. insularum MDB. Slightly higher average values were observed between the H. insularum groups compared to the coancestry levels between H. insularum PS and H. maderiana . However, the coancestry values remain relatively uniform and comparable across the species.
The abba-baba tests supported an incomplete lineage sorting (ILS) scenario across the three genetic groups. No significant evidence of gene flow was detected in any comparisons among H. insularum PS, H. insularum MDB, and H. maderiana , as indicated by the non-significant p-values and relatively low Z-scores (Table S7). On the other hand, although not directly relevant to the objective of our analysis, we observed signals of gene flow between populations ofH. insularum from the Madeira and Bugio Islands. Detailed results of the abba-baba tests are presented in Table S7.
Coalescent-based demographic model
The fastsimcoal2 analysis identified the model of three independent lineages (H. insularum MDB, H. insularum PS, and H. maderiana ) with neither past admixture nor contemporary gene flow as the one with the highest support (model A; see Table S8 and Figure 3a). According to this model, the lineage of H. insularum PS would constitute a distinct species closely related to H. insularum MDB. Assuming a 1-year generation time, fastsimcoal2 estimated that lineages diverged from a shared ancestor approximately divergence between the two H. insularum lineages occurred around estimated Ne for H. insularum from Porto Santo wasH. insularum from Madeira, Desertas, and Bugio, it was presentation of all estimated demographic variables and their weighted averages can be found in Figure 3a and Table S9.
We repeated the demographic analyses using stairway plot, revealing a consistent pattern in the two lineages inhabiting Porto Santo Island. Both lineages exhibited a stepwise increase in effective population size, eventually stabilising at a constant level. On the other hand, in H. insularum MDB, we identified a rapid population increase followed by a more recent stabilisation (Figure 3b).
Species tree under the multispecies coalescent model
snapp inferred three incongruent species tree topologies from the posterior distribution. The species tree topology with the highest probability (45.93%) indicated a sister group relationship between the lineages from H. maderiana and H. insularum PS. The following topology (35.53%) recovered the H. insularum PS and MDB as sister groups. The third and least supported topology (21.54% of the gene trees) recovered H. maderiana as sister to H. insularum MDB (Figure 4). The consensus tree yielded a low support level (posterior probability of 0.47) for the clade of the two Porto Santo lineages (Figure S5).
Discussion
Genomic data has become an invaluable tool for unveiling complex evolutionary scenarios, providing insights into hidden aspects of population history and recent species divergence. However, current practice in population genomic studies fail to provide evidence on the full evolutionary history of a species and, therefore, achieve incorrect conclusions. Our study focused on two endemic Hogna species from the Madeira archipelago, exhibiting mitochondrial discordance and fuzzy morphological limits. We used ddRADseq data, integrating population genomic structure analyses and coalescent-based demographic modelling to understand the contribution of incomplete lineage sorting (ILS) and gene flow to the observed genetic patterns. Our genetic structure analyses suggested three potential genetic clusters, one for each nominal species and one compatible with hybridisation between the two species on Porto Santo Island. However, our demographic models and the D-statistic test explicitly rejected gene flow between the two lineages. Instead, they supported a third independent Hogna lineage on Porto Santo Island. The apparent absence of clear-cut limits between the lineages is likely explained by unsorted polymorphism, probably due to the recent divergence times, as some of our results suggest. The contrasting results between inferences of hybridisation based on population structure patterns and multi-population coalescent analyses highlight the evolutionary complexity of Hogna species from Madeira and underscore the need for careful interpretation of genomic data.