Statistical analyses
All the statistical analyses presented here were performed in RStudio (RStudio Team, 2020) for R (R Core Team, 2021). Our raw data were first transformed using the Hellinger transformation (square root transformation of the relative concentration in each sample) to limit the influence of higher values and compute Euclidean distance matrix between each sample. We performed a Principal Component Analysis (PCA) on this distance matrix to identify variables significantly explaining the variation in physico-chemical characteristics in our dataset using the vegan package (version 2.6-4) (Oksanen et al., 2022) to obtain a set of 13 variables in our final dataset : fructose, glucose, turanose, maltose, mannose, glutamine, leucine, proline, valine, acetic acid, lactic acid, formic acid, pyruvic acid.
The raw data of the selected compounds were standardized using the double Wisconsin transformation. Using this approach, each value in the “species x compounds” matrix was divided by its column maximum, and then divided by the row total, producing values between 0 and 1 that equalize emphasis among sample units and among species. Bray–Curtis dissimilarities were computed on the transformed data to conduct multivariate analyses using the vegan package. We then performed Permutational Multivariate Analysis of Variance (PERMANOVA) to evaluate significant differences among honey samples from stingless bees and honey bees. These differences between groups of honey samples were illustrated in Non-Metric Multidimensional Scaling (NMDS) plots using the ggplot2 package (version 3.3.3) (Wickham, 2016) and a dendrogram (hierarchical clustering) using the ggtree package (version 3.4.1) (Yu, 2020). Indicator compounds analysis was performed to identify the compounds associated with each taxa using theindicspecies package version (1.7.14) (Cáceres & Legendre, 2009). We also validated the classification of our honey samples using model-based approach. Our dataset was split in two equal parts to (1) train and (2) test a conditional random forest model that assigns honey bee or stingless bee label to each sample using theextendedForest package (version 1.6.1) (Liaw & Wiener, 2002). We also compared honeys from stingless bees and honeys bees in terms of chemical diversity and dissimilarity by comparing diverse types of indexes : Hill diversity, Functional Hill diversity, Shannon’s diversity, Simpson diversity, Rao’s Q, Pielou’s evenness and Hill evenness. Diversity indexes assessments were based on the chemical structure of the 36 compounds quantified using the chemodiv package version (0.3.0) (Petrén et al., 2023).
We then investigated patterns of variation in SBH composition by testing the differences in the composition according to the tropical regions. Further analyses performed to estimate the importance of environmental and phylogenetic constraints on honey composition, using the full dataset at the global scale but also datasets at the continental scale including honey samples from each of the tropical region. We computed phylogenetic dissimilarities based on the patristic phylogenetic distances between genera using the phylogeny established by Rasmussen & Cameron, 2010 (Supplementary Fig. 1) using the adephylo package (version 1.1-16) (Jombart & Dray, 2008). Environmental dissimilarities between locations were computed based on 8 variables impacting nectar production (soil moisture, precipitation, solar radiation) and composition (Tree cover, moisture, temperature, forest density, plant biodiversity). Solar irradiance (The World Bank Group, 2023) and water availability (Topographic wetness; Title & Bemmels, 2018 & Annual Mean precipitation; Fick & Hijmans, 2017) are the main limiting factors in nectar production, influencing its concentration in sugar. Humidity (Climatic Moisture Index; Title & Bemmels, 2018) and temperature (Annual Temperature; Fick & Hijmans, 2017) impact evaporation rate leading to differences nectar viscosity and concentration (Nicolson & Thornburg, 2007). Honey produced in forest differed in its composition (Lowore et al., 2018; Cannizzaro et al., 2022) due to the influence of tree cover and density (Forest density; Riitters et al., 2000) on microclimatic factors (Parachnowitsch et al., 2019). Finally, we included plant biodiversity (native plant diversity; Ellis et al., 2012) which impact resource availability and pollen profile of honeys (mono-/polyfloral; de Sousa et al., 2016; Ávila et al., 2019). Data were extracted from Raster files using annual or monthly mean value when available.
We estimated the part of variation of chemical dissimilarity explained by the phylogenetic and environmental dissimilarity matrix using thevarpart function in vegan . Finally, we tested correlations and linear relationships between the Bray-Curtis dissimilarity matrix of chemical composition and the phylogenetic/environmental dissimilarities using partial mantel tests and distance-based RDA. Variation partitioning and correlation analysis were performed at the global but also at continental scale to disentangle ecological and evolutionary drivers since phylogeny follows a geographical split.