Assembling the reference genome de novo
We used ALLPATHS-LG version 44099 (Gnerre et al., 2011)
(parameters PLOIDY=2 and PHRED_64=1) to assemble the draft genome of
the black-faced spoonbill. We ran ALLPATHS-LG on a workstation
with 32 CPUs (2,199.882 MHz) and 377.8 GB of RAM (peak memory usage was
342.1 GB).
We used the ALLPATHS-LG correction steps CleanCorrectedReads and ErrorCorrectJump to remove 1.7% of
paired-end reads and to correct 69.3% of mate-paired reads, using
low-frequency k-mers as the criterion (K = 25 and 96 for paired-end and
mate-paired reads, respectively). The raw reads used for assembly amounted
to 50.5 Gb (67.8%) and 28.8 Gb (57.5%) for paired-end and mate-paired reads,
respectively (table S1). In total, 34,176 contigs with an N50 of 71.0
kb and a total length of 1.18 Gb were assembled. Finally, these contigs
were joined into 2,243 scaffolds (N50 = 4.2 Mb) (table S1).
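
For readers unfamiliar with the N50 statistic reported for the contigs and scaffolds, the following minimal Python sketch shows one common way to compute it: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly or from ALLPATHS-LG itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in kb), invented purely for illustration.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

In practice the lengths would be read from the contig or scaffold FASTA file rather than typed in by hand.
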
The draft assembly generated by ALLPATHS-LG is diploid.
For subsequent analyses, we randomly dropped one of the two nucleotides at
each heterozygous SNP to generate a pseudo-haploid reference genome.
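
The pseudo-haploid step can be sketched as follows. This is only an illustration under a stated assumption: that heterozygous SNPs are encoded as two-base IUPAC ambiguity codes (for example, `R` for A/G). The text does not specify how the diploid assembly represents heterozygous sites, so the encoding, the function name `pseudo_haploid`, and the toy sequence are all hypothetical.

```python
import random

# Two-base IUPAC ambiguity codes and the pair of nucleotides each represents.
# ASSUMPTION: heterozygous SNPs are stored with these codes; the actual
# representation in the diploid assembly is not described in the text.
IUPAC_HET = {
    "R": "AG",
    "Y": "CT",
    "S": "CG",
    "W": "AT",
    "K": "GT",
    "M": "AC",
}


def pseudo_haploid(sequence, rng):
    """Resolve each heterozygous site by keeping one allele at random.

    Characters that are not two-base ambiguity codes (A, C, G, T, N, ...)
    are copied unchanged. Lowercase ambiguity codes are also resolved,
    and the chosen allele is written in uppercase.
    """
    resolved = []
    for base in sequence:
        alleles = IUPAC_HET.get(base.upper())
        if alleles is None:
            resolved.append(base)
        else:
            # Keep one of the two alleles, i.e. drop the other at random.
            resolved.append(rng.choice(alleles))
    return "".join(resolved)


# Hypothetical diploid fragment with two heterozygous sites (R and Y).
rng = random.Random(0)  # fixed seed so the sketch is reproducible
diploid = "ACGRTTYAN"
haploid = pseudo_haploid(diploid, rng)
```

Passing in a seeded `random.Random` instance keeps the random choice reproducible, which matters if the same pseudo-haploid reference has to be regenerated later.
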