Genome-wide heterozygosity
We estimated genome-wide heterozygosities using information of a single
individual from all eight Procellariiformes species studied. We applied
the Robinson et al., 2019 method, with minor modifications to take
genome fragmentation into consideration, since we included genome
assemblies with varying amounts of contiguity. The DNA sequence data
(genome assemblies and whole-genome sequencing data) were downloaded
from NCBI (PRJNA261828, PRJNA545868; Feng et al., 2020; Jarvis et al.,
2014 ). For each
species, adapter-trimmed reads were aligned to its genome assembly using
bwa mem (Heng Li, 2013), bam files were merged using Picard-Tools
(http://broadinstitute.github.io/picard/)
and variants were called using the GATK 4.1.9 HaplotypeCaller and
GenotypeGVCFs (McKenna et al., 2010). Sites with a coverage
<1/3X or >2X of the average coverage depth (of
the particular genome) were filtered out using VCFtools 0.1.15 (Danecek
et al., 2011). We computed per-site heterozygosity as the proportion of
heterozygous sites per total number of called genotypes within a single
individual in nonoverlapping 25Kb windows across each scaffold. Windows
with less than 50% of net sites (those excluding missing or filtered
sites), were excluded from the analysis.