To automate the detection of POST displays, we use a deep learning object detector, Faster R-CNN \cite{Ren_2017}. Our implementation is adapted from an existing GitHub repository by endernewton \cite{gupta2017}. After preparing the training and validation data, we ran three training attempts of 70,000 iterations each, using anchor scale sets of [4, 8, 16], [8, 16, 32], and [4, 8, 16, 32]. Because the number of tobacco advertisements detected in Google Street View images is limited compared to standard object detection benchmarks, the [4, 8, 16] anchor scale set yielded the best mean average precision (mAP) on our training set.
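To illustrate how an anchor scale set translates into candidate box sizes, the sketch below generates Faster R-CNN-style anchors with NumPy. The base size of 16 pixels and the aspect ratios (0.5, 1.0, 2.0) are assumptions following the defaults common in public Faster R-CNN implementations, not values taken from our configuration; the code is illustrative rather than the repository's own anchor generator.

\begin{verbatim}
import numpy as np

def generate_anchors(base_size=16, scales=(4, 8, 16),
                     ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x1, y1, x2, y2) centered at the origin.

    Each anchor has area (base_size * scale)^2, reshaped to the given
    height/width ratios, so scales [4, 8, 16] yield boxes of roughly
    64, 128, and 256 pixels on a side at ratio 1.0.
    """
    anchors = []
    for scale in scales:
        side = base_size * scale                 # square anchor side length
        for ratio in ratios:                     # ratio = height / width
            w = side * np.sqrt(1.0 / ratio)      # stretch width ...
            h = side * np.sqrt(ratio)            # ... and height, area fixed
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

if __name__ == "__main__":
    for box in generate_anchors(scales=(4, 8, 16)):
        w, h = box[2] - box[0], box[3] - box[1]
        print(f"anchor {w:.0f} x {h:.0f} px")
\end{verbatim}

Under these assumptions, the [4, 8, 16] set produces anchors down to roughly 64 pixels on a side, which fit the small storefront signs visible in Street View imagery more tightly than the larger boxes generated by [8, 16, 32].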
Our main metric for evaluating performance is mAP: for each class we detect, we compute the average precision (a summary of the precision-recall curve), and mAP is the mean of these values across classes. The mAP value depends on parameter choices such as the number of iterations, the anchor scales, and the classification threshold. Anchors are candidate region boxes of predetermined sizes placed over an image; the network generates multiple anchors per image and ranks them by how likely each is to contain a single object. The number of iterations is the number of training updates the model performs on the dataset; the higher the number of iterations, the more opportunities the model has to reduce the error between its output and the correct classification of a sign.
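As a concrete illustration of the metric, the sketch below computes a PASCAL VOC-style average precision for each class from ranked detections and averages the results. The detection scores, match flags, and ground-truth counts are placeholder data invented for the example, and the VOC-style interpolation is an assumption about the evaluation protocol rather than a description of our exact evaluation code.

\begin{verbatim}
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """VOC-style AP: area under the precision-recall curve for one class.

    scores:            confidence score for each detection
    is_true_positive:  1 if the detection matched a ground-truth box, else 0
    num_ground_truth:  total ground-truth boxes for this class
    """
    order = np.argsort(-np.asarray(scores))      # rank by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Precision envelope: at each recall level, use the maximum precision
    # to the right (the interpolation step used in the VOC evaluation).
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]     # points where recall changes
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])

# Hypothetical detections for two classes; mAP is the mean of per-class APs.
ap_sign = average_precision([0.9, 0.8, 0.6, 0.4], [1, 1, 0, 1],
                            num_ground_truth=4)
ap_ad = average_precision([0.7, 0.5, 0.3], [1, 0, 1],
                          num_ground_truth=3)
print(f"mAP = {np.mean([ap_sign, ap_ad]):.3f}")
\end{verbatim}

This makes explicit why mAP shifts with the classification threshold and anchor scales: both change which detections enter the ranked list, and therefore the shape of each class's precision-recall curve.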