Reference genome assembly
We assembled a low-cost draft genome for Pogoniulus p. pusillus.
A single male individual (sample no. AR93139) was selected for assembly,
and a short-insert library was prepared by Novogene, Inc., using the
NEBNext Ultra II DNA kit (New England Biolabs). The library was
sequenced to a depth of approximately 50x on the Illumina HiSeq X
platform at Novogene, Inc., with 150 bp paired-end reads. Prior to
assembly, we used PEAR v0.9.10 (Zhang et al., 2014) to collapse
overlapping read pairs and remove adapter and low-quality sequence, with
a minimum overlap size of 20, minimum read length of 30, quality score
threshold of 20, and maximum proportion of uncalled bases of 0.02. We
assembled the resulting
reads with SOAPdenovo v2.04 (Luo et al., 2012) for each odd-numbered
value of k between 41 and 111, with default values for all other
parameters. The assembly for k=93 was chosen because it had a higher
scaffold N50 and an assembly length closer to the expected genome size
for birds than the assemblies for other values of k.
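The scaffold N50 used to compare assemblies can be illustrated with a minimal sketch. It assumes the scaffold lengths of each assembly have already been read into a list; the function name `n50` and the toy lengths are hypothetical and are not taken from the study's own scripts.

```python
def n50(lengths):
    """Return the N50: the length of the shortest scaffold in the smallest
    set of longest scaffolds that together cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Odd values of k between 41 and 111, as described in the text.
K_VALUES = list(range(41, 112, 2))

# Toy scaffold lengths (illustrative only, not real data).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)
```

In practice one would compute this statistic, together with total assembly length, for the assembly produced at each value in `K_VALUES` and compare them side by side.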
We aligned these scaffolds to the zebra finch (Taeniopygia
guttata) genome using the Nucmer command in MUMmer 4.0 (Kurtz et al.,
2004). Tinkerbird scaffolds aligning to zebra finch chromosomes were
ordered and oriented according to these alignments; scaffolds that did
not align to the zebra finch genome were ordered by scaffold size.
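The ordering step can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: each scaffold is assigned to the reference position of its longest alignment, chromosomes are sorted by name, and unaligned scaffolds are listed from largest to smallest. The `Hit` record and `order_scaffolds` function are hypothetical names, and parsing of the alignment output is omitted.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    scaffold: str    # draft scaffold name
    chrom: str       # reference chromosome
    ref_start: int   # start coordinate on the reference
    strand: str      # "+" or "-" relative to the reference
    aligned_bp: int  # alignment length in base pairs


def order_scaffolds(hits, scaffold_lengths):
    """Return (scaffold, chromosome, strand) tuples in reference order,
    followed by unaligned scaffolds sorted by decreasing length."""
    # Assumption: the longest alignment decides placement and orientation.
    best = {}
    for hit in hits:
        current = best.get(hit.scaffold)
        if current is None or hit.aligned_bp > current.aligned_bp:
            best[hit.scaffold] = hit
    # Chromosome names are sorted as plain strings here.
    placed = sorted(best.values(), key=lambda h: (h.chrom, h.ref_start))
    ordered = [(h.scaffold, h.chrom, h.strand) for h in placed]
    # Assumption: "ordered by scaffold size" means largest first.
    unplaced = sorted(
        (name for name in scaffold_lengths if name not in best),
        key=lambda name: scaffold_lengths[name],
        reverse=True,
    )
    ordered.extend((name, None, "+") for name in unplaced)
    return ordered
```

A scaffold with strand "-" would be reverse-complemented when the ordered sequence is written out.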
Single nucleotide polymorphism (SNP) calling
Adapter sequences were removed and overlapping paired-end reads were
merged in PEAR v0.9.10 (Zhang et al., 2014), and the reads were then
aligned to the P. p. pusillus draft genome assembly in BWA MEM v0.7.17
(Li, 2013) using default parameters. Variants were called using bcftools
mpileup v1.8 (Li et al., 2009) with mapping quality > 20 and default
values for other parameters. We filtered the resulting variants using
VCFtools v0.1.13 (Danecek et al., 2011), retaining genotypes with a depth
of 4 or greater, and loci with minor allele frequency > 0.05
that were genotyped in at least 80% of individuals.
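The logic of these filters can be illustrated for a single biallelic locus. The sketch below is not the VCFtools implementation. It assumes that low-depth genotypes are set to missing before the locus-level thresholds are evaluated, and the function name `keep_locus` and its input layout are hypothetical.

```python
def keep_locus(genotypes, min_depth=4, min_maf=0.05, min_call_rate=0.8):
    """Decide whether a biallelic locus passes the filters.

    genotypes: one (allele_a, allele_b, depth) tuple per individual, with
    alleles coded 0 (reference) or 1 (alternate), or None if missing.
    """
    if not genotypes:
        return False
    # Genotype-level filter: treat calls below the depth threshold as missing.
    called = [
        (a, b)
        for a, b, depth in genotypes
        if a is not None and b is not None and depth >= min_depth
    ]
    if not called:
        return False
    # Locus-level filter: proportion of individuals with a retained genotype.
    if len(called) / len(genotypes) < min_call_rate:
        return False
    # Locus-level filter: minor allele frequency among retained genotypes.
    alleles = [allele for pair in called for allele in pair]
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1 - alt_freq) > min_maf
```

For example, a locus with five individuals of which one has depth 3 retains four genotypes (80%) and therefore still meets the genotyping threshold, provided the minor allele frequency among those four exceeds 0.05.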
The SNPs of interest in this study were those most likely to explain
differences between red and yellow forecrown colour. Samtools has been
shown to perform relatively poorly in calling indels (Hwang et al.,
2015), so we manually inspected the genotypes at the SNPs most strongly
associated with forecrown colour for any indels that may have been
incorrectly called.