
Minimizing method bias for merged Reduced Representation Library and Whole Genome sequencing dataset
Linett Rasmussen
University of Copenhagen Globe Institute

Corresponding Author: [email protected]

Filipe Vieira
University of Copenhagen Globe Institute

Abstract

In the genomics era, many sequencing methods have been, and continue to be, developed. One family of methods that has recently gained considerable popularity is reduced representation library (RRL) sequencing. Yet little research has addressed how to combine RRL data with other data types, such as whole genome sequencing (WGS) data. Merging different types of data can be difficult and can introduce biases into downstream analyses. This study attempts to identify the origins of some of these biases and to propose strategies to minimize them, using a dataset of six wolves sequenced with both MobiSeq and standard WGS. Using the WGS data as a reference, we took a step-by-step approach to identify parameters that minimize the bias introduced by MobiSeq. Our results show that missing data have a large effect on the analyses. We therefore recommend restricting variant calling to the regions targeted by the RRL method; for some analyses, additionally requiring a minimum number of individuals per site can reduce the bias further.
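The two filters recommended in the abstract (restricting sites to RRL target regions, then requiring a minimum number of individuals with data) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: sites are represented as hypothetical `(chromosome, 0-based position, per-individual genotypes)` tuples with `None` marking missing data, target regions are BED-style half-open intervals, and `build_index`, `in_targets`, `filter_sites` and `min_ind` are illustrative names. A real workflow would apply equivalent filters inside a variant caller or genotype-likelihood tool.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical types for illustration only (not the authors' data model).
Interval = Tuple[int, int]  # BED-style, 0-based, half-open [start, end)
Site = Tuple[str, int, Sequence[Optional[int]]]  # (chrom, pos, genotype per individual; None = missing)
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: Dict[str, List[Interval]]) -> Index:
    """Sort and merge overlapping or adjacent target intervals per chromosome."""
    index: Index = {}
    for chrom, intervals in regions.items():
        merged: List[List[int]] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_targets(index: Index, chrom: str, pos: int) -> bool:
    """Return True if the 0-based position lies inside a targeted interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def filter_sites(sites: Sequence[Site], index: Index, min_ind: int) -> List[Site]:
    """Keep sites inside targeted regions that have data for at least min_ind individuals."""
    kept: List[Site] = []
    for chrom, pos, genotypes in sites:
        if not in_targets(index, chrom, pos):
            continue  # outside the RRL target regions: skip
        n_called = sum(g is not None for g in genotypes)  # individuals with data
        if n_called >= min_ind:
            kept.append((chrom, pos, genotypes))
    return kept


# Usage with made-up toy data (six individuals per site).
regions = {"chr1": [(100, 200), (150, 300), (1000, 1100)]}
index = build_index(regions)
sites: List[Site] = [
    ("chr1", 120, [0, 1, None, 2, 0, 1]),           # in target, 5 individuals called
    ("chr1", 250, [None, None, None, 1, 0, None]),  # in target, only 2 called
    ("chr1", 500, [0, 0, 0, 0, 0, 0]),              # outside targets
    ("chr2", 120, [1, 1, 1, 1, 1, 1]),              # no targets on chr2
]
kept = filter_sites(sites, index, min_ind=4)
print([(c, p) for c, p, _ in kept])  # [('chr1', 120)]
```

Merging intervals first keeps each lookup to a single binary search. The `min_ind` threshold trades the number of retained sites against missingness, so it would likely need tuning per analysis rather than a single fixed value.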