Machine Learning models identify gene predictors of waggle dance
behaviour in honeybees
Abstract
The molecular characterisation of complex behaviours is challenging
because a range of different factors often combine to produce the
observed phenotype. An established approach is to look at the overall
levels of expression of brain genes – known as ‘neurogenomics’ – to
select the best candidates that associate with patterns of interest.
This approach has so far relied on a set of powerful statistical tools
capable of providing a snapshot of the expression of the many thousands
of genes present in an organism’s genome. However, traditional
neurogenomic analyses have some well-known limitations; above all, the
limited number of biological replicates compared to the number of genes
tested – often referred to as the “curse of dimensionality”. Here we
implemented a new Machine Learning (ML) approach that can be used as a
complement to established methods of transcriptomic analysis. We tested
three types of ML models for their performance in the identification of
genes associated with the honeybee waggle dance. We then intersected the
results of these analyses with traditional outputs of differential gene
expression analyses and identified two promising candidates for the
neural regulation of the waggle dance: the G-protein-coupled receptor
boss and hnRNP A1, a gene involved in alternative splicing. Overall, our
study demonstrates the application of Machine
Learning to analyse transcriptomics data and identify genes underlying
social behaviour. This approach has great potential for application to a
wide range of scenarios in evolutionary ecology where the genomic basis
of complex phenotypic traits is under investigation.