Molecular Ecology Resources - 21DOCS Test Area

by author

by title

by keyword

Creating, curating, and evaluating a mitogenomic reference database to improve region...

Emily Dziedzic

and 9 more

February 27, 2023

Species detection using eDNA is revolutionizing global capacity to monitor biodiversity. However, the lack of regional, vouchered, genomic sequence information—especially sequence information that includes intraspecific variation—creates a bottleneck for management agencies wanting to harness the complete power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend upon regional databases of mitogenomic sequence information to evaluate the effectiveness of such data to detect and identify taxa. We created the Oregon Biodiversity Genome Project to create a database of complete, nearly error-free mitogenomic sequences for all of Oregon’s fishes. We have successfully assembled the complete mitogenomes of 313 specimens of freshwater, anadromous, and estuarine fishes representing 24 families, 55 genera, and 128 species and lineages. Comparative analyses of these sequences illustrate that many regions of the mitogenome are taxonomically informative, that the short (~150 bp) mitochondrial “barcode” regions typically used for eDNA assays do not consistently diagnose for species, and that complete single or multiple genes of the mitogenome are preferable for identifying Oregon’s fishes. This project provides a blueprint for other researchers to follow as they build regional databases, illustrates the taxonomic value and limits of complete mitogenomic sequences, and offers clues as to how current eDNA assays and environmental genomics methods of the future can best leverage this information.

Morphological and taxonomic diversity of mesozooplankton is an important driver of ca...

Margaux Perhirin

and 5 more

February 10, 2023

Mesozooplankton is a very diverse group of small animals ranging in size from 0.2 to 20 mm not able to swim against ocean currents. It is a key component of pelagic ecosystems through its roles in the trophic networks and the biological carbon pump. Traditionally studied through microscopes, recent methods have been however developed to rapidly acquire large amounts of data (morphological, molecular) at the individual scale, making it possible to study mesozooplankton using a trait-based approach. Here, combining quantitative imaging with metabarcoding time-series data obtained in the Sargasso Sea at the Bermuda Atlantic Time-series Study (BATS) site, we showed that organisms’ transparency might be an important trait to also consider regarding mesozooplankton impact on carbon export, contrary to the common assumption that just size is the master trait directing most mesozooplankton-linked processes. Three distinct communities were defined based on taxonomic composition, and succeeded one another throughout the study period, with changing levels of transparency among the community. A co-occurrences’ network was built from metabarcoding data revealing six groups of taxa. These were related to changes in the functioning of the ecosystem and/or in the community’s morphology. The importance of Diel Vertical Migration at BATS was confirmed by the existence of a group made of taxa known to be strong migrators. Finally, we assessed if metabarcoding can provide a quantitative approach to biomass and/or abundance of certain taxa. Knowing more about mesozooplankton diversity and its impact on ecosystem functioning would allow to better represent them in biogeochemical models.

The genome sequence of Samia ricini, a new model species of lepidopteran insect

Jung Lee

and 7 more

April 27, 2020

Samia ricini, a gigantic saturniid moth, has the potential to be a novel lepidopteran model species. Since S. ricini is much more tough and resistant to diseases than the current model species Bombyx mori, the former can be easily reared compared to the latter. In addition, genetic resources available for S. ricini rival or even exceed those for B. mori: at least 26 eco-races of S. ricini are reported and S. ricini can hybridise with wild Samia species, which are distributed throughout Asian countries, and produce fertile progenies. Physiological traits such as food preference, integument colour, larval spot pattern, etc. are different between S. ricini and wild Samia species so that those traits can be the target for forward genetic analysis. In order to facilitate genetic research in S. ricini, we determined the whole genome sequence of S. ricini. The assembled genome of S. ricini was 458 Mb with 155 scaffolds, and the N50 length of the assembly was approximately 21 Mb. 16,702 protein coding genes were predicted in the assembly. Although the gene repertoire of S. ricini was not so different from that of B. mori, some genes, such as chorion genes and fibroin genes, seemed to have specifically evolved in S. ricini.

A decade of de novo transcriptome assembly: Are we there yet?

Martin Hölzer

September 11, 2020

A decade ago, de novo transcriptome assembly evolved as a versatile and powerful approach to make evolutionary assumptions, analyze gene expression, and annotate novel transcripts, in particular, for non-model organisms lacking an appropriate reference genome. Various tools have been developed to generate a transcriptome assembly, and even more computational methods depend on the results of these tools for further downstream analyses. In this issue of Molecular Ecology Resources, Freedman et al. (2020) present a comprehensive analysis of errors in de novo transcriptome assemblies across public data sets and different assembly methods. They focus on two implicit assumptions that are often violated: First, the assembly presents an unbiased view of the transcriptome. Second, the expression estimates derived from the assembly are reasonable, albeit noisy, approximations of the relative frequency of expressed transcripts. They show that appropriate filtering can reduce this bias but can also lead to the loss of a reasonable number of highly expressed transcripts. Thus, to partly alleviate the noise in expression estimates, they propose a new normalization method called length-rescaled CPM. Remarkably, the authors found considerable distortions at the nucleotide level, which leads to an underestimation of diversity in transcriptome assemblies. The study by Freedman et al. clearly shows that we have not yet reached “high-quality” in the field of transcriptome assembly. Above all, it helps researchers be aware of these problems and filter and interpret their transcriptome assembly data appropriately and with caution.

Long- and short-read metabarcoding technologies reveal similar spatio-temporal struct...

Brendan Furneaux

and 4 more

October 22, 2020

Fungi form diverse communities and play essential roles in many terrestrial ecosystems, yet there are methodological challenges in taxonomic and phylogenetic placement of fungi from environmental sequences. To address such challenges we investigated spatio-temporal structure of a fungal community using soil metabarcoding with four different sequencing strategies: short amplicon sequencing of the ITS2 region (300–400\ bp) with Illumina MiSeq, Ion Torrent Ion S5, and PacBio RS II, all from the same PCR library, as well as long amplicon sequencing of the full ITS and partial LSU regions (1200–1600\ bp) with PacBio RS II. Resulting community structure and diversity depended more on statistical method than sequencing technology. The use of long-amplicon sequencing enables construction of a phylogenetic tree from metabarcoding reads, which facilitates taxonomic identification of sequences. However, long reads present issues for denoising algorithms in diverse communities. We present a solution that splits the reads into shorter homologous regions prior to denoising, and then reconstructs the full denoised reads. In the choice between short and long amplicons, we suggest a hybrid approach using short amplicons for sampling breadth and depth, and long amplicons to characterize the local species pool for improved identification and phylogenetic analyses.

Log-ratio analysis of microbiome data with many zeroes is library size dependent

Dennis te Beest

and 3 more

November 30, 2020

Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artifact of the sequencing platform and as a result such data are compositional. To avoid library size dependency, one common way of analyzing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centered log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this paper we show on real data and by simulation that, applied to data that combines these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.

Beyond taxonomy: Validating functional inference approaches in the context of fish-fa...

Olivier Laroche

and 3 more

October 12, 2020

Characterization of microbial assemblages via environmental DNA metabarcoding is increasingly being used in routine monitoring programs due to its sensitivity and cost-effectiveness. Several programs have been developed recently which infer functional profiles from 16S rRNA gene data using hidden-state prediction (HSP) algorithms. These might offer an economic and scalable alter-native to shotgun metagenomics. To date, HSP-based methods have seen limited use for benthic marine surveys and their performance in these environments remains unevaluated. In this study, 16S rRNA metabarcoding was applied to sediment samples collected at 0 and ≥ 1200 m from Norwegian salmon farms, and three metabolic inference approaches (PAPRICA, PICRUSt2 and TAX4FUN2) evaluated against metagenomics and environmental data. While metabarcoding and metagenomics recovered a comparable functional diversity, the taxonomic composition differed be-tween approaches, with genera richness up to 20× higher for metabarcoding. Comparisons between the sensitivity (highest true positive rates) and specificity (lowest true negative rates) of HSP-based programs in detecting functions found in metagenomics data ranged, respectively, from 0.52 and 0.60 to 0.76 and 0.79. However, little correlation was observed between the relative abundance of their specific functions. Functional beta-diversity of HSP-based data was strongly associated with that of metagenomics (r ≥ 0.86 for PAPRICA and TAX4FUN2) and responded similarly to the impact of fish farm activities. Our results demonstrate that although HSP-based metabarcoding approaches provide a slightly different functional profile than metagenomics, partly due to recovering a distinct community, they represent a cost-effective and valuable tool for characterizing and assessing the effects of fish farming on benthic ecosystems.

Disentangling adaptation from drift in bottlenecked and reintroduced populations of A...

Deborah Leigh

and 4 more

February 27, 2021

Identifying local adaptation in bottlenecked species is essential for conservation management. Selection detection methods have an important role in species management plans, assessments of adaptive capacity, and looking for responses to climate change. Yet, the allele frequency changes exploited in selection detection methods are similar to those caused by the strong neutral genetic drift expected during a bottleneck. Consequently, it is often unclear what accuracy selection detection methods have across bottlenecked populations. In this study, simulations were used to explore if signals of selection could be confidently distinguished from genetic drift across 23 bottlenecked and reintroduced populations of Alpine ibex (Capra ibex). The meticulously recorded demographic history of the Alpine ibex was used to generate comprehensive simulated SNP data. The simulated SNPs were then used to benchmark the confidence we could place in outliers identified in empirical Alpine ibex SNP data. Within the simulated dataset, the false positive rates were high for all selection detection methods but fell substantially when two or more methods were combined. True positive rates were consistently low and became negligible with increased stringency. Despite finding many outlier loci in the empirical Alpine ibex SNPs, none could be distinguished from genetic drift-driven false positives. Unfortunately, the low true positive rate also prevents the exclusion of recent local adaptation within the Alpine ibex. The baselines and stringent approach outlined here should be applied to other bottlenecked species to ensure the risk of false positive, or negative, signals of selection are accounted for in conservation management plans.

Opening a next-generation black box: ecological trends for hundreds of species-like t...

Martin Hahn

and 3 more

February 18, 2021

Current knowledge on environmental distribution and taxon richness of free-living bacteria is mainly based on cultivation-independent investigations employing 16S rRNA gene sequencing methods. Yet, 16S rRNA genes are evolutionarily rather conserved, resulting in limited taxonomic and ecological resolutions provided by this marker. We used a faster evolving protein-encoding marker to reveal ecological patterns hidden within a single OTU defined by >99% 16S rRNA sequence similarity. The studied taxon, subcluster PnecC of the genus Polynucleobacter, represents a ubiquitous group of planktonic freshwater bacteria with cosmopolitan distribution, which is very frequently detected by diversity surveys of freshwater systems. Based on genome taxonomy and a large set of genome sequences, a sequence similarity threshold for delineation of species-like taxa could be established. In total, 600 species-like taxa were detected in 99 freshwater habitats scattered across three regions representing a latitudinal range of 3400 km (42°N to 71°N) and a pH gradient of 4.2 to 8.6. Besides the unexpectedly high richness, the increased taxonomic resolution revealed structuring of Polynucleobacter communities by a couple of macroecological trends, which was previously only demonstrated for phylogenetically much broader groups of bacteria. A unexpected pattern was the almost complete compositional separation of Polynucleobacter communities of Ca2+-rich and Ca2+-poor habitats, which strongly resembled the vicariance of plant species on silicate and limestone soils. The presented new cultivation-independent approach opened a window to an incredible, previously unseen diversity, and enables investigations aiming on deeper understanding of how environmental conditions shape bacterial communities and drive evolution of free-living bacteria.

Exhaustive reanalysis of barcode sequences from public repositories highlights ongoin...

Antoine Fort

and 17 more

November 24, 2020

Sea Lettuce (Ulva spp.; Ulvophyceae, Ulvales, Ulvaceae) is an important ecological and economical entity, with a worldwide distribution and is a well-known source of near-shore blooms blighting many coastlines. Species of Ulva are frequently misidentified in public repositories, including herbaria and gene banks, making species identification based on traditional barcoding hazardous. We investigated the species distribution of 295 individual distromatic foliose strains from the North East Atlantic by traditional barcoding or next generation sequencing. We found seven distinct species, and compared our results with all worldwide Ulva spp sequences present in the NCBI database for the three barcodes rbcL, tufA and the ITS1. Our results demonstrate a large degree of species misidentification in the NCBI database. We estimate that 21% of the entries pertaining to foliose species are misannotated. In the extreme case of U. lactuca, 65% of the entries are erroneously labelled specimens of another Ulva species, typically U. fenestrata. In addition, 30% of U. rigida entries are misannotated, U. rigida being relatively rare and often misannotated U. laetevirens. Furthermore, U. armoricana and U. scandinavica present as being synonymous to U. laetevirens. An analysis of the global distribution of registered samples from foliose species also indicates possible geographical isolation for some species, and the absence of U. lactuca from Northern Europe. Altogether, exhaustive taxonomic clarification by aggregation of a library of barcode sequences highlights misannotations, and delivers an improved representation of Ulva species diversity and distribution. This approach could be easily adapted to other taxa.

Hidden diversity of Ctenophora revealed by new mitochondrial COI primers and sequence...

Lynne Christianson

and 3 more

November 30, 2021

[Definitive version of this article may be found here] The mitochondrial gene cytochrome-c-oxidase subunit 1 (COI) is useful in many taxa for phylogenetics, population genetics, metabarcoding, and rapid species identifications. However, the phylum Ctenophora (comb jellies) has historically been difficult to study due to divergent mitochondrial sequences and the corresponding inability to amplify COI with degenerate and standard COI ‘barcoding’ primers. As a result, there are very few COI sequences available for ctenophores, despite over 200 described species in the phylum. Here, we designed new primers and amplified the COI fragment from members of all major groups of ctenophores, including many undescribed species. Phylogenetic analyses of the resulting COI sequences revealed high diversity within many groups that was not evident from more conserved 18S rDNA sequences, in particular among the Lobata. The COI phylogenetic results also revealed unexpected community structure within the genus Bolinopsis, suggested new species within the genus Bathocyroe, and supported the ecological and morphological differences of some species such as Lampocteis cruentiventer and similar lobates (Lampocteis sp. ‘V’ stratified by depth, and ‘A’ differentiated by color). The newly described primers reported herein provide important tools to enable researchers to illuminate the diversity of ctenophores worldwide via quick molecular identifications, improve the ability to analyze environmental DNA by improving reference libraries and amplifications, and enable a new breadth of population genetic studies.

Arabis alpina: a perennial model plant for ecological genomics and life-history evolu...

Stefan Wötzel

and 5 more

March 01, 2021

Many model organisms have obtained a prominent status due to an advantageous combination of their life-history characteristics, genetic properties and also practical considerations. In non-crop plants, Arabidopsis thaliana is the most renowned model and has been used as study system to elucidate numerous biological processes at the molecular level. Once a complete genome sequence was available, research has markedly accelerated and further established A. thaliana as the reference to stimulate studies in other species with different biology. Within the Brassicaceae family, the arctic-alpine perennial Arabis alpina has become a model complementary to A. thaliana to study life-history evolution and ecological genomics in harsh environments. In this review, we provide an overview of the properties that facilitated the rapid emergence of A. alpina as a plant model. We summarize the evolutionary history of A. alpina, including the diversification of its mating system, and discuss recent progress in the molecular dissection of developmental traits that are related to its perennial life history and environmental adaptation. We indicate open questions from which future research might be developed in other Brassicaceae species or more distantly related plant families.

Chromosome-scale assembly and whole-genome sequencing of 266 giant panda roundworms p...

Lei Han

and 27 more

July 01, 2021

Helminth diseases have long been a threat to the health of humans and animals. Roundworms are important organisms for studying parasitic mechanisms, disease transmission and prevention. The study of parasites in the giant panda is of importance for understanding how roundworms adapt to the host. Here, we report a high-quality chromosome-scale genome of Baylisascaris schroederi with a genome size of 253.60 Mb and 19,262 predicted protein-coding genes. We found that gene families related to epidermal chitin synthesis and environmental information processes in the roundworm genome have expanded significantly. Furthermore, we demonstrated unique genes involved in essential amino acid metabolism in the B. schroederi genome, inferred to be essential for the adaptation to the giant panda-specific diet. In addition, under different deworming pressures, we found that four resistance-related genes (glc-1, nrf-6, bre-4 and ced-7) were under strong positive selection in a captive population. Finally, 23 known drug targets and 47 potential drug target proteins (essential homologues linked to lethal phenotypes) were identified. The genome provides a unique reference for inferring the early evolution of roundworms and their adaptation to the host. Population genetic analysis and drug sensitivity prediction provide insights revealing the impact of deworming history on population genetic structure of importance for disease prevention.

Understudied, underrepresented, and unknown: methodological biases that limit detecti...

Nicole Reynolds

and 3 more

April 14, 2021

Metabarcoding is an important tool for understanding fungal communities. The internal transcribed spacer (ITS) rDNA is the accepted fungal barcode but has known problems. The large subunit (LSU) rDNA has also been used to investigate fungal communities but available LSU metabarcoding primers were mostly designed to target Dikarya (Ascomycota + Basidiomycota) with little attention to early diverging fungi (EDF). However, evidence from multiple studies suggests that EDF comprise a large portion of unknown diversity in community sampling. Here we investigate how DNA marker choice and methodological biases impact recovery of EDF from environmental samples. We focused on one EDF lineage, Zoopagomycota, as an example. We evaluated three primer sets (ITS1F/ITS2, LROR/LR3, and LR3 paired with new primer LR22F) to amplify and sequence a Zoopagomycota mock community and a set of 146 environmental samples with Illumina MiSeq. We compared two taxonomy assignment methods and created an LSU reference database compatible with AMPtk software. The two taxonomy assignment methods recovered strikingly different communities of fungi and EDF. Target fragment length variation exacerbated PCR amplification biases and influenced downstream taxonomic assignments, but this effect was greater for EDF than Dikarya. To improve identification of LSU amplicons we performed phylogenetic reconstruction and illustrate the advantages of this critical tool for investigating identified and unidentified sequences. Our results suggest much of the EDF community may be missed or misidentified with “standard” metabarcoding approaches and modified techniques are needed to understand the role of these taxa in a broader ecological context.

Invertebrates for vertebrate biodiversity monitoring: comparisons using three insect...

Aimee Massey

and 9 more

April 29, 2021

Metabarcoding of environmental DNA (eDNA) is now widely used to build diversity profiles from DNA that has been shed by species into the environment. There is substantial interest in the expansion of eDNA approaches for improved detection of terrestrial vertebrates using invertebrate-derived DNA (iDNA) in which hematophagous, sarcophagous, and coprophagous invertebrates sample vertebrate blood, carrion, or feces. Here, we use metabarcoding and multiple iDNA samplers (carrion flies, sandflies, and mosquitos) to profile gamma and alpha diversity in a dry, tropical forest in the southern Amazon. Our main objectives were to (1) compare diversity found with iDNA to camera trapping, which is the conventional method of vertebrate diversity surveillance and (2) compare each of the iDNA samplers to assess the effectiveness, efficiency, and potential biases associated with each sampler. Carrion flies were the most effective sampler, despite the least amount of sampling effort and the fewest number of individuals captured for metabarcoding, in describing vertebrate biodiversity followed by sandflies. Camera traps had the highest median species richness at the site-level but showed strong bias towards carnivore and ungulate species and missed much of the diversity described by iDNA methods. Mosquitos showed a strong feeding preference for humans as did sandflies for armadillos, thus presenting potential utility to further study related to host-vector interactions.

Batch effects in population genomic studies with low-coverage whole genome sequencing...

Runyang Nicolas Lou

and 1 more

August 03, 2021

Over the past few decades, the rapid democratization of high-throughput sequencing and the growing emphasis on open science practices have resulted in an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining datasets to achieve unprecedented sample sizes, spatial coverage, or temporal replication in population genomic studies. However, a common concern is that non-biological differences between datasets may generate batch effects that can confound real biological patterns. Despite general awareness about the risk of batch effects, few studies have examined empirically how they manifest in real datasets, and it remains unclear what factors cause batch effects and how to best detect and mitigate their impact bioinformatically. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a “batch-effect-naive” bioinformatic pipeline, batch effects severely biased our genetic diversity estimates, population structure inference, and selection scan. We then demonstrate that these batch effects resulted from multiple technical differences between our datasets, including the sequencing instrument model/chemistry, read type, read length, DNA degradation level, and sequencing depth, but their impact can be detected and substantially mitigated with simple bioinformatic approaches. We conclude that combining datasets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.

Labile sex chromosomes and a novel candidate sex-determination gene in the Australian...

Alexandra Pavlova

and 7 more

July 22, 2021

Sex-specific ecology has management implications, but rapid sex-chromosome turnover in fishes hinders development of markers to sex monomorphic species. Here, we use annotated genomes and reduced-representation sequencing data for two Australian percichthyids, the Macquarie perch Macquaria australasica and the golden perch M. ambigua, and whole genome resequencing data for 50 Macquarie perch of each sex, to detect sex-linked loci, identify a candidate sex-determining gene and develop an affordable sexing assay. In-silico pool-seq tests of 1,492,004 Macquarie perch SNP loci revealed that a 275-Kb scaffold, containing the transcription factor SOX1b gene, was enriched for gametologous loci. Within this scaffold, 22 loci were sex-linked in a predominantly XY system, with females being homozygous at all 22, and males being heterozygous at two or more. Seven XY-gametologous loci were within a 146-bp region. Being ~38 Kb upstream of SOX1b, it might act as an enhancer controlling SOX1b transcription in the bipotential gonad that drives gonad differentiation. A PCR-RFLP sexing assay, targeting one of the Y-linked SNPs, tested in 66 known-sex Macquarie perch and two individuals of each sex of three confamilial species, and amplicon sequencing of 400 bp encompassing the 146-bp region, revealed that the few sex-linked positions differ between species and between Macquarie perch populations. This indicates sex-chromosome lability in Percichthyidae, also supported by non-homologous scaffolds containing sex-linked loci for Macquarie- and golden perches. The resources developed here will facilitate genomic research in Percichthyidae. Sex-linked markers will be useful for determining genetic sex in some populations and studying sex chromosome turnover.

Machine Learning models identify gene predictors of waggle dance behaviour in honeybe...

Marcell Veiner

and 3 more

September 25, 2021

The molecular characterisation of complex behaviours is a challenging task as a range of different factors are often involved to produce the observed phenotype. An established approach is to look at the overall levels of expression of brain genes – known as ‘neurogenomics’ – to select the best candidates that associate with patterns of interest. This approach has relied so far on a set of powerful statistical tools capable to provide a snapshot of the expression of many thousands of genes that are present in an organism’s genome. However, traditional neurogenomic analyses have some well-known limitations; above all, the limited number of biological replicates compared to the number of genes tested – often referred to as “curse of dimensionality”. Here we implemented a new Machine Learning (ML) approach that can be used as a complement to established methods of transcriptomic analyses. We tested three types of ML models for their performance in the identification of genes associated with honeybee waggle dance. We then intersected the results of these analyses with traditional outputs of differential gene expression analyses and identified two promising candidates for the neural regulation of the waggle dance: the G-protein coupled receptor boss and hnRNP A1, a gene involved in alternative splicing. Overall, our study demonstrates the application of Machine Learning to analyse transcriptomics data and identify genes underlying social behaviour. This approach has great potential for application to a wide range of different scenarios in evolutionary ecology, when investigating the genomic basis for complex phenotypic traits.