Abstract
Identifying sex-linked markers in genomic datasets is important because
analysing them can reveal sex-specific biology, and their presence in
supposedly neutral autosomal datasets can result in incorrect estimates
of genetic diversity, population structure and parentage. However, detecting
sex-linked loci can be challenging, and available scripts neglect some
categories of sex-linked variation. Here, we present new R functions to
(1) identify and separate sex-linked loci in ZW and XY sex determination
systems and (2) infer the genetic sex of individuals based on these
loci. We also present two additional functions to (3) remove loci with
artefactually high heterozygosity and (4) produce input files for
parentage analysis. We tested these functions on genomic data from two
sexually monomorphic bird species, one of which has a neo-sex
chromosome system, by comparing biological inferences made before and
after removing sex-linked loci with our functions. We found that
standard filters, such as those for low read depth and call rate, failed to
remove up to 28.7% of sex-linked loci. Retaining these loci led to
(i) overestimation of population FIS by up to 9% and of the number of
private alleles by up to 8%; (ii) false inference of significant sex
differences in heterozygosity; (iii) obscuring of genetic population
structure; and (iv) inference of approximately 11% fewer correct
parentages. We discuss how failure
to remove sex-linked markers can lead to incorrect biological inferences
(e.g., sex-biased dispersal and cryptic population structure) and
misleading management recommendations. For reduced-representation
datasets with at least 15 known-sex individuals of each sex, our
functions offer easy-to-use tools to avoid these errors and to
sex the remaining individuals.