Figure 3. Variable importance and dependence plots for the top selected variables linking microhabitat to nestbox occupancy by hazel dormice M. avellanarius in a UK woodland. The large top panel shows the 27 variables with binomial test p-values < 0.05, with values for the three importance metrics whose rankings were less correlated with one another (chosen to showcase differences in variable importance among metrics; Fig S1). Only these three metrics are displayed, but all seven metrics were considered when identifying the most important variables, which are shown in red and labelled with letters that correspond to the dependence plot panels below. Bottom panels (labelled a to j) show changes in the predicted probability of nestbox occupancy by hazel dormice for the ten most relevant predictors (red symbols in the top panel) in descending order of variable importance (from left to right, top to bottom).
The random forest model based on these ten variables had an OOB error rate of 22.22% (model accuracy 77.78%), with 20.8% false positives (specificity = 79.2%) and 23.8% false negatives (sensitivity = 76.2%). The model predicted a higher probability of nestbox occupancy with more trees within ten metres, particularly hazel C. avellana and hawthorn C. monogyna, and at intermediate to high levels of tree canopy and/or understorey closure (values above 90% cover resulted in a lower probability of occupancy). Occupancy was also more likely in areas with higher percentages of understorey cover by hazel and honeysuckle L. periclymenum but lower ground cover of dog’s mercury Mercurialis perennis. It was more likely for nestboxes located either near other boxes (within 10-15 m) or isolated from them (with lower probability at intermediate distances), and for nestboxes located further from footpaths and slightly away from woodland margins, which may be sources of disturbance (Fig 3).
Occupancy data from 2021 available for model validation were limited: only 11 boxes across the whole site were occupied from June to October (ten in the woodland site and only one in the hedgeline), and five of the occupied woodland boxes were not included in our dataset (thus, we lacked habitat data and could not predict occupancy). The random forest model based on the top ten variables correctly predicted occupancy for five of the six remaining nestboxes occupied in summer 2021, resulting in a 16.7% false negative rate. The single false negative (predicted to be empty but found to be occupied) was a nestbox that had not been occupied in any previous year and was found in October with an unwoven nest with green leaves, but no dormice present. Because hazel dormouse numbers were low in 2021 (nestbox occupancy was very low), our predictions had a higher false positive rate (41.0%), with 16 boxes predicted to be occupied by the model but found empty during the surveys (the remaining 23 were predicted to be empty and found empty). Predictions based on the complete model with all variables were identical.
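
The validation rates reported above can be reproduced from the four confusion-matrix counts given in the text (5 occupied boxes predicted occupied, 1 occupied box predicted empty, 16 empty boxes predicted occupied, 23 empty boxes predicted empty). The short Python sketch below is only an illustrative check of that arithmetic; the function name `classification_rates` and its argument names are hypothetical and are not part of the original analysis, which may have used different software.

```python
def classification_rates(tp, fn, fp, tn):
    """Compute sensitivity, specificity and their complements from counts.

    tp: occupied boxes predicted occupied
    fn: occupied boxes predicted empty
    fp: empty boxes predicted occupied
    tn: empty boxes predicted empty
    """
    sensitivity = tp / (tp + fn)   # share of occupied boxes correctly predicted
    specificity = tn / (tn + fp)   # share of empty boxes correctly predicted
    return {
        "sensitivity": sensitivity,
        "false_negative_rate": 1 - sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
    }


# 2021 validation counts as stated in the text (hypothetical variable name)
validation_2021 = classification_rates(tp=5, fn=1, fp=16, tn=23)

for name, value in validation_2021.items():
    print(f"{name}: {value:.1%}")
```

With these counts the false negative rate is 1/6 and the false positive rate is 16/39, matching the 16.7% and 41.0% quoted above. The same complements link the OOB figures for the fitted model (for example, specificity = 100% minus the false positive percentage).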