Autonomous vehicles must be safe. A safe vehicle is one that can accurately and robustly understand its environment and sensibly decide what to do next. Driverless systems rely on several sensors to perceive their surroundings, combining their characteristics and differences in a mutually beneficial way. Audio has long been overlooked in this context, even though it can provide information that is crucial for safety (e.g., the siren of an emergency vehicle that is approaching from afar and cannot yet be seen). As a consequence, only a few sound-based datasets are available, which, in turn, further limits investigations in the area. This paper aims to demonstrate that, even when dealing with imbalanced and/or relatively small datasets, there may be hidden information that can greatly enhance the power of those datasets. In particular, we aim to prove that contextual information, i.e., the type of place and the time of day at which a sound is heard, can significantly ease its identification. By relying on a simple Convolutional Neural Network, we show that contextual information, even when coarse, can increase the accuracy of an acoustic object detector up to 85%.
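One common way to feed such contextual information into a CNN classifier is late fusion: the network extracts an embedding from the audio spectrogram, and a coarse context vector (place type, time of day) is concatenated to that embedding before the final classification layer. The following is a minimal PyTorch sketch of this idea; the class name, layer sizes, and the one-hot context encoding are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class ContextAwareAudioCNN(nn.Module):
    """Illustrative sketch: a small CNN over log-mel spectrograms whose
    classifier head also receives a coarse context vector (e.g., place
    type and time of day). Sizes and names are hypothetical."""

    def __init__(self, n_classes=10, n_context=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),   # -> (B, 32, 1, 1)
        )
        # Audio embedding concatenated with the context vector.
        self.classifier = nn.Linear(32 + n_context, n_classes)

    def forward(self, spectrogram, context):
        x = self.features(spectrogram).flatten(1)   # (B, 32)
        x = torch.cat([x, context], dim=1)          # (B, 32 + n_context)
        return self.classifier(x)

model = ContextAwareAudioCNN()
spec = torch.randn(4, 1, 64, 128)   # batch of log-mel spectrograms
ctx = torch.zeros(4, 8)             # e.g., one-hot [place | time-of-day]
ctx[:, 0] = 1.0
logits = model(spec, ctx)
print(tuple(logits.shape))          # (4, 10)
```

Because the context vector is coarse and low-dimensional, this kind of fusion adds almost no parameters, which matches the abstract's point that even rough contextual cues can help a simple network.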