Validation of machine learning approach for direct mutation rate estimation
  • Katarzyna Burda,
  • Mateusz Konczal
Katarzyna Burda
Adam Mickiewicz University
Mateusz Konczal

Corresponding Author: [email protected]


Abstract

Mutations are the primary source of all genetic variation. Knowledge of their rates is critical for any evolutionary genetic analysis, but for a long time that knowledge remained elusive and was inferred only indirectly. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. These analyses are, however, challenging because of a high rate of false positives and the lack of consensus on standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning (ML) approach to this task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and screened their genomes for de novo mutations. The initially large number of candidate de novo mutations was hard-filtered to remove false positives. The resulting estimate was compared with the mutation rate estimated using a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified 3 mutations, but overall it required more work and had higher rates of false positives and false negatives. We therefore recommend its application when most of the mutations are expected to be identified or in cases of experiment-specific biases. Both methods concordantly showed that the guppy mutation rate is among the lowest directly estimated mutation rates in vertebrates. Similarly low estimates were obtained for two other teleost fishes. We discuss potential explanations for this pattern, as well as the future utility and limitations of machine learning approaches.
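For readers unfamiliar with direct estimation, the arithmetic behind a parent-offspring estimate can be sketched as follows. This is a minimal illustration of the commonly used estimator (validated de novo mutations divided by twice the number of callable sites summed over offspring, optionally corrected for false negatives), not necessarily the exact procedure used in this study. The function name `mutation_rate` and all input values in the usage example other than the 20 offspring are hypothetical placeholders, not results from the paper.

```python
def mutation_rate(n_mutations, n_offspring, callable_sites, false_negative_rate=0.0):
    """Per-site, per-generation mutation rate from parent-offspring data.

    Assumes every offspring has the same number of callable sites; all
    argument names are illustrative and do not come from the paper.
    """
    if n_offspring <= 0 or callable_sites <= 0:
        raise ValueError("n_offspring and callable_sites must be positive")
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")

    # Each diploid offspring inherits two haploid genomes (one per parent),
    # so the number of sites at which a new mutation could be observed
    # is doubled.
    haploid_sites = 2 * n_offspring * callable_sites

    # Dividing by (1 - FNR) scales the count up for mutations that the
    # filtering pipeline is expected to have missed.
    return n_mutations / (haploid_sites * (1.0 - false_negative_rate))


if __name__ == "__main__":
    # Hypothetical inputs: 10 validated mutations and 500 Mb of callable
    # genome are placeholders; only the 20 offspring match the abstract.
    rate = mutation_rate(n_mutations=10, n_offspring=20, callable_sites=500_000_000)
    print(f"{rate:.2e} mutations per site per generation")
```

The false-negative correction matters when comparing filtering strategies: a pipeline that misses more true mutations will underestimate the rate unless its false-negative rate is measured and accounted for.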
25 Feb 2023 Submitted to Molecular Ecology Resources
27 Feb 2023 Submission Checks Completed
27 Feb 2023 Assigned to Editor
27 Feb 2023 Review(s) Completed, Editorial Evaluation Pending
03 Mar 2023 Reviewer(s) Assigned
05 Apr 2023 Editorial Decision: Revise Minor
03 May 2023 1st Revision Received
05 May 2023 Submission Checks Completed
05 May 2023 Assigned to Editor
05 May 2023 Review(s) Completed, Editorial Evaluation Pending
24 May 2023 Editorial Decision: Revise Minor
16 Jun 2023 2nd Revision Received
19 Jun 2023 Submission Checks Completed
19 Jun 2023 Assigned to Editor
19 Jun 2023 Review(s) Completed, Editorial Evaluation Pending
05 Jul 2023 Editorial Decision: Accept