Automated selection of scenes of interest
Each video segment in which both hornet(s) and honey bee(s) were detected by the process just described was automatically extracted using purpose-built software. This software followed a step-by-step procedure composed of the following processes: (i) stereovision acquisition, (ii) target detection on RGB-D data, in each image independently, (iii) temporal aggregation for multi-target tracking in 3D (Chiron et al., 2013), (iv) signature extraction from the individual trajectories, (v) hierarchical segmentation of the trajectory data into temporal entities, and (vi) behavioural modelling by multi-level clustering (Chiron et al., 2014). The video segments were then visually reviewed by an observer to detect potential successful predation of a honey bee by a hornet (Supporting Information, Figure S1). We considered a predation to be successful when a hornet caught a honey bee and flew out of the video view with its prey, taking into account the limited field of view (about 1.5 m² around the beehive entrance). Each video was reviewed twice by the observer to confirm the successful predation events. A predation was considered a failure when both hornet(s) and honey bee(s) were observed in the same scene but no predation succeeded (e.g. no catch).
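The selection and scoring rules described above can be illustrated with a minimal sketch. It is not the software used in the study: the per-frame label sets, the label names "hornet" and "bee", the `min_frames` parameter, and both function names are assumptions introduced for illustration only. The first function keeps contiguous runs of frames in which both species were detected; the second encodes the observer's success/failure criterion.

```python
from typing import List, Set, Tuple


def select_interest_segments(
    frame_labels: List[Set[str]], min_frames: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, inclusive, of runs where
    both a hornet and a honey bee are detected in every frame.

    frame_labels is a hypothetical input: one set of detected
    species labels per video frame.
    """
    segments = []
    start = None  # index of the first frame of the current run, if any
    for i, labels in enumerate(frame_labels):
        both = "hornet" in labels and "bee" in labels
        if both and start is None:
            start = i
        elif not both and start is not None:
            if i - start >= min_frames:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frame_labels) - start >= min_frames:
        segments.append((start, len(frame_labels) - 1))
    return segments


def classify_predation(caught: bool, left_view_with_prey: bool) -> str:
    """Score an extracted segment from the observer's annotations.

    Success requires that the hornet caught a honey bee AND flew out
    of the field of view with it; anything else is a failure.
    """
    return "success" if caught and left_view_with_prey else "failure"


if __name__ == "__main__":
    frames = [
        {"bee"},
        {"bee", "hornet"},
        {"bee", "hornet"},
        {"hornet"},
        set(),
        {"bee", "hornet"},
    ]
    print(select_interest_segments(frames))                # [(1, 2), (5, 5)]
    print(select_interest_segments(frames, min_frames=2))  # [(1, 2)]
    print(classify_predation(caught=True, left_view_with_prey=True))
```

In this sketch the `min_frames` threshold is a hypothetical safeguard against single-frame detections; the source does not state whether a minimum segment duration was applied.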