
Easy-to-use R functions to separate reduced-representation genomic datasets into sex-linked and autosomal loci, and conduct sex-assignment
  • Diana Robledo-Ruiz,
  • Lana Austin,
  • J Amos,
  • Jesús Castrejón-Figueroa,
  • Michael Magrath,
  • Paul Sunnucks,
  • Alexandra Pavlova
Diana Robledo-Ruiz
Monash University

Corresponding Author: [email protected]

Lana Austin
Monash University School of Biological Sciences
J Amos
Monash University School of Biological Sciences
Jesús Castrejón-Figueroa
Monash University School of Biological Sciences
Michael Magrath
Zoos Victoria
Paul Sunnucks
Monash University
Alexandra Pavlova
Monash University

Abstract

Identifying sex-linked markers in genomic datasets is important because their analysis can reveal sex-specific biology, and because their presence in supposedly neutral autosomal datasets can result in incorrect estimates of genetic diversity, population structure and parentage. However, detecting sex-linked loci can be challenging, and available scripts neglect some categories of sex-linked variation. Here, we present new R functions to (1) identify and separate sex-linked loci in ZW and XY sex-determination systems and (2) infer the genetic sex of individuals based on these loci. Two additional functions (3) remove loci with artefactually high heterozygosity and (4) produce input files for parentage analysis. We test these functions on genomic data for two sexually monomorphic bird species, including one with a neo-sex chromosome system, by comparing biological inferences made before and after removing sex-linked loci using our function. We found that standard filters, such as those for low read depth and call rate, failed to remove up to 28.7% of sex-linked loci. This led to (i) overestimation of population FIS by ≤ 9% and of the number of private alleles by ≤ 8%; (ii) incorrect inference of significant sex differences in heterozygosity; (iii) obscured genetic population structure; and (iv) ~11% fewer correctly inferred parentages. We discuss how failure to remove sex-linked markers can lead to incorrect biological inferences (e.g., sex-biased dispersal and cryptic population structure) and misleading management recommendations. For reduced-representation datasets with at least 15 known-sex individuals of each sex, our functions offer convenient, easy-to-use resources to avoid this and to sex the remaining individuals.
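
The abstract does not describe how the R functions detect sex-linked loci, so the following Python sketch is only a generic illustration of the underlying idea, not the authors' method. It assumes a ZW system (females ZW, males ZZ) and known-sex individuals: a W-linked locus should be genotyped in females but missing in males, while a Z-linked locus cannot be heterozygous in hemizygous females. The function names, the 0/1/2 genotype coding and both thresholds are hypothetical placeholders chosen for this example.

```python
# Illustrative sketch only; not the functions presented in the paper.
# Genotypes are coded 0/1/2 (copies of the alternate allele), None = missing.

def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(g == 1 for g in called) / len(called)


def classify_locus_zw(female_calls, male_calls, rate_gap=0.8, het_gap=0.3):
    """Label one locus in a ZW system using known-sex individuals.

    rate_gap and het_gap are arbitrary example thresholds (assumptions).
    """
    # W-linked candidates: genotyped in females, mostly missing in males.
    if call_rate(female_calls) - call_rate(male_calls) >= rate_gap:
        return "W-linked candidate"
    # Z-linked candidates: females carry one Z copy, so they show far
    # fewer heterozygous calls than males at the same locus.
    if heterozygosity(male_calls) - heterozygosity(female_calls) >= het_gap:
        return "Z-linked candidate"
    return "autosomal candidate"


# Toy genotypes for four females and four males at three loci.
w_like = classify_locus_zw([0, 0, 2, 0], [None, None, None, None])
z_like = classify_locus_zw([0, 2, 0, 2], [1, 0, 1, 2])
auto_like = classify_locus_zw([0, 1, 2, 1], [1, 0, 1, 2])
```

For an XY system the roles of the sexes swap, because males are then the heterogametic sex. Real data contain genotyping error, so hard thresholds like these would need tuning and a reasonable number of known-sex individuals per sex.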
26 Nov 2022: Submitted to Molecular Ecology Resources
02 Dec 2022: Submission Checks Completed
02 Dec 2022: Assigned to Editor
02 Dec 2022: Review(s) Completed, Editorial Evaluation Pending
12 Dec 2022: Reviewer(s) Assigned
31 Jan 2023: Editorial Decision: Revise Minor
17 Mar 2023: 1st Revision Received
29 Mar 2023: Submission Checks Completed
29 Mar 2023: Assigned to Editor
29 Mar 2023: Review(s) Completed, Editorial Evaluation Pending
17 May 2023: Reviewer(s) Assigned
04 Jul 2023: Editorial Decision: Accept