Genome sequencing, assembly, and post-processing
We generated 11 whole-genome sequences representing both New World
marten species, including individuals collected in both known hybrid
zones (Kuiu [KUI] Island and the northern Rocky Mountains
[MTX]), two islands that received translocations of M.
americana (Prince of Wales Island, POW; Chichagof Island, CHI), and an
Old World sable (Martes zibellina) as an outgroup (Table 1).
Sequences were generated on an Illumina HiSeq X through the Beijing
Genomics Institute (BGI Americas, Philadelphia, PA, USA) and an Illumina NextSeq 500
through the Molecular Biology Facility at the University of New Mexico.
Sampling was based on previous single- and multi-locus genetic (Dawson et al. 2017; Colella et al. 2018a) and morphological
analyses (Colella et al. 2018b) that defined species limits and
hybrid zone locations through the identification of mixed mitochondrial
and nuclear haplotypes. Liver tissue subsamples were loaned from the
University of New Mexico’s Museum of Southwestern Biology (MSB) and the
Burke Museum at the University of Washington (UWBM). DNA extractions
followed a DNeasy Blood and Tissue Kit (Qiagen, Venlo, The Netherlands)
protocol. Our assembly pipeline followed Colella et al. (2018c).
Read quality was examined using FastQC (Andrews 2010), and adapter
sequences were removed with Trimmomatic v0.33 (Bolger et al. 2014). Sex
chromosomes were removed by excluding those scaffolds from the reference. The
Burrows-Wheeler Aligner (BWA; Li & Durbin 2010) was used to map
reads to the domestic ferret genome (Mustela putorius furo; Peng et al. 2014), and an additional BWA iteration extracted
mitochondrial genomes using the same reference. Final depth of coverage
ranged from 19 to 30X (Table 1). PCR duplicates were removed using Picard v1.9 (MarkDuplicates;
http://broadinstitute.github.io/picard/), and nuclear and mitochondrial
consensus sequences were called using SAMtools (mpileup; Li et al. 2009). Single nucleotide polymorphisms (SNPs) were called
with the Genome Analysis Toolkit (GATK HaplotypeCaller; McKenna et al. 2010) for all North
American marten and again against the M. zibellina outgroup. SNPs
were filtered (Supplemental Information 1) by minimum depth (minDP = 2,
set to one-third the coverage of our lowest-coverage
sample, as recommended for PSMC analyses; Li & Durbin 2011),
genotype quality (minGQ = 30), minimum minor allele frequency (MAF =
0.1), and scaffold size (1 Mb). Private alleles and indels were removed
using VCFtools (Danecek et al. 2011). A MAF of 0.1 removed
singletons (i.e., individual-specific, rare mutations), which are not
informative about allelic overlap among populations, to reduce the
inclusion of potential sequencing errors, which are more common in
lower-coverage genomes. Format conversions (vcf, ped, bed) were conducted in PLINK (Purcell et al. 2007). Missing data were removed
(--max-missing, VCFtools) based on analysis specifications.
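The filtering logic described above can be illustrated with a minimal Python sketch. It is a simplified model of genotype-level depth and quality masking followed by site-level missingness and MAF filters, not the VCFtools implementation and not the authors' pipeline. The data layout (tuples of two alleles, depth, and genotype quality) and the function names are hypothetical; the thresholds are those stated in the text, and `max_missing` follows the VCFtools convention in which 1.0 allows no missing data.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one genotype call at a biallelic site:
# (allele_a, allele_b, read_depth, genotype_quality); None marks a missing call.
Call = Optional[Tuple[int, int, int, int]]

MIN_DP = 2     # minimum per-genotype depth (value from the text)
MIN_GQ = 30    # minimum genotype quality (value from the text)
MIN_MAF = 0.1  # minimum minor allele frequency (value from the text)


def mask_low_quality(calls: List[Call]) -> List[Call]:
    """Set calls that fail the depth or quality threshold to missing."""
    return [
        c if c is not None and c[2] >= MIN_DP and c[3] >= MIN_GQ else None
        for c in calls
    ]


def minor_allele_frequency(calls: List[Call]) -> float:
    """MAF at a biallelic site, computed over non-missing calls only."""
    alleles = [a for c in calls if c is not None for a in c[:2]]
    if not alleles:
        return 0.0
    alt_freq = sum(1 for a in alleles if a != 0) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(calls: List[Call], max_missing: float = 1.0) -> bool:
    """Return True if the site passes the missingness and MAF filters.

    max_missing is the minimum fraction of individuals that must be called
    (1.0 = no missing data allowed, 0.0 = any amount allowed).
    """
    if not calls:
        return False
    masked = mask_low_quality(calls)
    called_fraction = sum(c is not None for c in masked) / len(masked)
    return called_fraction >= max_missing and minor_allele_frequency(masked) >= MIN_MAF


# With 11 diploid individuals, a singleton has MAF 1/22 (about 0.045) and is dropped.
singleton = [(0, 0, 10, 40)] * 10 + [(0, 1, 10, 40)]
shared = [(0, 0, 10, 40)] * 8 + [(0, 1, 10, 40)] * 3
print(keep_site(singleton), keep_site(shared))
```

In this sketch a single heterozygote among 11 diploid individuals falls below the 0.1 threshold, which shows how the MAF filter removes singletons.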
Variants were spaced (1 per 100-bp window) to account for linkage
disequilibrium and sorted into 46 ‘pseudo-chromosomes’ using custom
Python scripts (available at https://github.com/jpcolella/), enabling
the application of human-specific analyses to a non-model system with
only 38 chromosomes.
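As a rough illustration of the spacing and binning steps, the sketch below keeps the first variant in each non-overlapping 100-bp window per scaffold and then distributes scaffolds across a fixed number of pseudo-chromosomes. It is not the authors' script: the function names are hypothetical, and the greedy balancing rule (largest scaffold first, into the currently lightest bin) is an assumption made for illustration, since the text does not say how scaffolds were grouped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int]  # (scaffold name, 1-based position)


def thin_variants(variants: Iterable[Variant], window: int = 100) -> List[Variant]:
    """Keep the first variant in each non-overlapping window on each scaffold."""
    seen = set()
    kept = []
    for scaffold, pos in sorted(variants):
        key = (scaffold, (pos - 1) // window)  # window index along the scaffold
        if key not in seen:
            seen.add(key)
            kept.append((scaffold, pos))
    return kept


def assign_pseudo_chromosomes(variants: Iterable[Variant], n_bins: int = 46) -> Dict[str, int]:
    """Map each scaffold to a pseudo-chromosome number in 1..n_bins.

    Assumed rule: scaffolds are taken in order of decreasing variant count
    and each is placed in the bin that currently holds the fewest variants.
    """
    counts = defaultdict(int)
    for scaffold, _ in variants:
        counts[scaffold] += 1
    loads = [0] * n_bins
    assignment = {}
    for scaffold, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))  # lightest bin so far
        assignment[scaffold] = target + 1
        loads[target] += n
    return assignment


example = [("s1", 1), ("s1", 50), ("s1", 100), ("s1", 101), ("s1", 250), ("s2", 5)]
thinned = thin_variants(example)
print(thinned)
print(assign_pseudo_chromosomes(thinned, n_bins=2))
```

Fixed windows do not guarantee a minimum distance between retained variants (positions 100 and 101 fall in adjacent windows), so a distance-based thinning rule would behave differently from this per-window rule.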