Generating phased pseudo-haplotype sequences
To include multiple SNPs per UCE locus as well as invariant sites, we generated multiple sequence alignments of pseudo-haplotypes using the EMIT_ALL_SITES output mode of the GATK HaplotypeCaller tool. We filtered the resulting VCF file to include only UCE loci with at least one SNP with no more than 10% missing data. We then generated alignments from the VCF file with a custom Ruby script, vcf2aln v0.4.2 (https://github.com/campanam/vcf2aln, Supplemental Information). This script uses phasing information where present and randomly selects an allele where phase is unresolved.
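The allele-assignment rule described above can be illustrated with a minimal Python sketch. It is not the vcf2aln code (which is written in Ruby) and is given only under stated assumptions: the function name pseudo_haplotype_alleles is hypothetical, alleles are single bases, missing calls are written as N, and an unphased heterozygous call is resolved by randomly ordering its two alleles across the two pseudo-haplotypes.

```python
import random


def pseudo_haplotype_alleles(ref, alts, gt, rng=random):
    """Return the bases assigned to pseudo-haplotypes 1 and 2 at one site.

    ref  -- reference base, e.g. "A"
    alts -- list of alternate bases, e.g. ["G"]
    gt   -- VCF GT string: "0|1" is phased, "0/1" is unphased
    rng  -- source of randomness (the random module or a random.Random)
    """
    alleles = [ref] + list(alts)
    phased = "|" in gt
    indices = gt.replace("|", "/").split("/")
    # VCF encodes a missing allele as "."; emit N for it (an assumption).
    bases = [alleles[int(i)] if i != "." else "N" for i in indices]
    if len(bases) == 1:
        # Haploid call: use the same base for both pseudo-haplotypes.
        bases = bases * 2
    if not phased and bases[0] != bases[1]:
        # Phase unresolved: assign the two alleles in random order.
        rng.shuffle(bases)
    return bases[0], bases[1]
```

For example, pseudo_haplotype_alleles("A", ["G"], "1|0") always returns ("G", "A"), whereas the unphased call "0/1" returns ("A", "G") or ("G", "A") at random. Passing a seeded random.Random instance as rng makes the random choices reproducible.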
We trimmed the aligned, phased UCE sequences with Gblocks Version 0.91b (Castresana, 2000) using default parameters and quantified informative sites with the phyluce_align_get_informative_sites.py PHYLUCE script. For the final dataset, we retained loci with more than one and fewer than 10 parsimony-informative sites (PIS). We excluded loci above this upper bound because such high PIS counts are likely the result of assembly or alignment errors.
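The PIS-based retention rule can be sketched in Python as follows. This is an illustration only, not the PHYLUCE implementation: the function names count_pis and retain_locus are hypothetical, a site is counted as parsimony-informative when at least two character states each occur in at least two sequences, and gaps, N, and ? are assumed to be ignored when counting states.

```python
from collections import Counter


def count_pis(seqs):
    """Count parsimony-informative sites in a list of aligned sequences."""
    pis = 0
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        # Ignore gaps and ambiguous or missing characters (an assumption).
        for ignored in "-N?":
            counts.pop(ignored, None)
        # Informative: at least two states, each seen in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            pis += 1
    return pis


def retain_locus(seqs, min_pis=2, max_pis=9):
    """Keep loci with more than one and fewer than 10 PIS (2 to 9 inclusive)."""
    return min_pis <= count_pis(seqs) <= max_pis
```

For the four-sequence alignment ["ACGT", "ACGA", "AGGA", "AGGT"], the second and fourth columns are informative, so count_pis returns 2 and retain_locus returns True.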