Diatom communities preserved in sediment samples are valuable indicators of the past and present dynamics of phytoplankton communities and of their response to environmental change. Such studies traditionally rely on counting specimens under an optical microscope, a time-consuming process that requires taxonomic expertise. Automated image acquisition workflows now make it possible to acquire large image datasets, but these datasets require efficient preprocessing methods. Detecting diatom frustules in microscope images is challenging because of their low relief, diverse shapes, and tendency to aggregate, which prevent the use of traditional thresholding techniques. Deep learning algorithms have the potential to resolve these challenges, particularly for the task of object detection. Here we explore the use of a Faster R-CNN (Region-based Convolutional Neural Network) model to detect siliceous biominerals, including diatoms, in microscope images of a sediment trap series from the Mediterranean Sea. Our workflow yields promising results, achieving a precision of 0.72 and a recall of 0.74 on a test set of Mediterranean diatom images. Model performance decreases when detecting fragments of these microfossils, when particles are aggregated, or when images are out of focus. Microfossil detection performance remains high when the model is applied to microscope images of sediments from a different oceanic basin, demonstrating its potential for application in a wide range of contemporary and paleoenvironmental studies. This automated method provides a valuable tool for analysing complex samples, particularly for rare species under-represented in training datasets.
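For readers unfamiliar with how precision and recall are obtained for an object detector, the following minimal sketch shows one common way to compute them from bounding boxes. It is an illustration only and not the evaluation code behind the scores reported above. The intersection-over-union (IoU) threshold of 0.5, the greedy one-to-one matching, and the names `iou` and `precision_recall` are assumptions introduced here, and predictions are assumed to be sorted by decreasing confidence.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    predicted: List[Box],
    truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the study
) -> Tuple[float, float]:
    """Greedily match each prediction to its best unmatched ground-truth box."""
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical boxes: two annotated particles, three detections (one spurious).
truth_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted_boxes = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
p, r = precision_recall(predicted_boxes, truth_boxes)
```

In this toy case two of the three detections match an annotated particle, so precision is 2/3 and recall is 1. Under this definition, a precision of 0.72 means that about 72% of detections correspond to an annotated particle, and a recall of 0.74 means that about 74% of annotated particles are found.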