Camera traps have revolutionized research on many animal species that were previously nearly impossible to observe because of their habitat or behavior. Deep learning has the potential to reduce the workload by automatically classifying these images by taxon or flagging them as empty. However, a standard deep neural network classifier fails because animals often occupy only a small portion of the high-definition images. We therefore propose a workflow named Weakly Object Detection Faster-RCNN+FPN, which suits this challenge. The model is weakly supervised because it requires only the animal taxon label per image and no manual bounding box annotations. First, it automatically generates weakly supervised bounding box annotations using the motion across multiple frames. Then, it trains a Faster-RCNN+FPN model using this weak supervision. Experimental results have been obtained on two datasets and an easily reproducible testbed.
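
To make the motion-based annotation step concrete, the following is a minimal sketch and not the workflow's actual implementation. It assumes that the frames of one camera-trap sequence are aligned grayscale images, that the per-pixel median over the sequence approximates the static background, and that a fixed intensity threshold separates the moving animal from noise. The function name `motion_box` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from statistics import median
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def motion_box(frames: List[Frame], index: int,
               threshold: float = 25.0) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) enclosing the pixels of
    frames[index] that differ from the per-pixel median of the sequence
    by more than `threshold`, or None if no pixel does."""
    height, width = len(frames[0]), len(frames[0][0])
    xs, ys = [], []
    for y in range(height):
        for x in range(width):
            # Assumption: the median over the sequence is the background.
            background = median(f[y][x] for f in frames)
            if abs(frames[index][y][x] - background) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no motion detected: candidate "empty" frame
    return min(xs), min(ys), max(xs), max(ys)


# Toy sequence: three 4x6 black frames, with a bright 2x2 blob in frame 1.
frames = [[[0.0] * 6 for _ in range(4)] for _ in range(3)]
for y in (1, 2):
    for x in (2, 3):
        frames[1][y][x] = 200.0

box = motion_box(frames, index=1)    # (2, 1, 3, 2)
empty = motion_box(frames, index=0)  # None
```

In a weakly supervised setup of this kind, a box such as `box` would be paired with the image-level taxon label to form a training annotation for the detector, without any manual box drawing.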