False Positives/Negatives, True Positives/Negatives! How to annotate background images? The power of background images.
I spent days searching for material about background images — how to annotate them and what effect they have on training — and decided to sum up all my findings in one article, so that other practitioners can understand the full power of background images.
Recently I did object detection with several algorithms (YOLOv3, YOLOv4, MobileNet SSD) for a freelance project. I collected the image dataset myself, taking photos with my phone, and annotated them with the LabelImg annotation tool. Before starting this project I assumed it was going to be an easy job, but once I began annotating 6000+ images manually, I realized that object detection with a custom dataset is not easy at all. Plenty of automated annotation tools exist online, but I wanted to go through the process myself, to learn how hard it is and what hidden pitfalls come up in image annotation.

As this was my first serious, big machine learning project, from time to time I looked at books and tutorials covering similar projects. What I noticed is that almost none of them teach, or even mention, False Positives/Negatives and True Positives/Negatives. The issue hit me when I started testing the trained weights of my object detection model on my PC: right away I saw various False Positive detections in the video frames. I was surprised, because I had not seen this kind of problem in any of those tutorials or books. So I started researching the issue and found some useful tips for solving it.
Yes! You will need to collect more background images. When I first found this solution, I did not understand what background images are and why we need them. From further research, I understood that they are needed to “teach” the model what IS the real target object (the object you want to detect/classify) and what IS NOT — as I understand it, this works like a generalization technique. Background images must not contain your target object in any form. You can collect random images that contain none of your target objects, and when you annotate them you simply skip them without drawing any boxes, so the annotation file (e.g. XML) contains no X, Y coordinates. After my research on this topic, I applied all the helpful steps I found online to my object detection model. The result was as predicted — the False Positives were gone for good! I hoped it would work, and it really did. I will share some helpful links about background images and decreasing the FP rate at the bottom of this article. For now, let me give you some background image annotation hints I learned during my project.
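To make the “no X, Y coordinates” point concrete, here is a minimal sketch (not the author’s code — the function name and sample XML strings are my own) that checks whether a Pascal VOC-style XML file is a background annotation, i.e. contains no `<object>` entries:

```python
import tempfile
import xml.etree.ElementTree as ET

def is_background_annotation(xml_path):
    """Return True if the VOC annotation file contains no <object> entries."""
    root = ET.parse(xml_path).getroot()
    return root.find("object") is None

# Quick demo: one background annotation, one with a labeled box.
BACKGROUND_XML = "<annotation><filename>bg.jpg</filename></annotation>"
OBJECT_XML = (
    "<annotation><filename>cat.jpg</filename>"
    "<object><name>cat</name>"
    "<bndbox><xmin>1</xmin><ymin>2</ymin><xmax>30</xmax><ymax>40</ymax></bndbox>"
    "</object></annotation>"
)

def _write(text):
    f = tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False)
    f.write(text)
    f.close()
    return f.name

bg_file, obj_file = _write(BACKGROUND_XML), _write(OBJECT_XML)
print(is_background_annotation(bg_file))   # True
print(is_background_annotation(obj_file))  # False
```

A check like this is handy for auditing a dataset before training, to see how many negative examples it actually contains.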
Make sure you have collected enough images for each object class; the number varies depending on your object detection algorithm. In my case, the MobileNet SSD v2 algorithm requires a minimum of roughly 300–400 images per class, but for my project I collected 6000+ images and annotated them all manually with LabelImg. When I found the hint that collecting background images can help decrease FPs, I was not sure how to annotate them. After some additional research, I found that it can be done easily inside LabelImg: just press the “Space” key when a background image is open.
Note: you do not have to save these images manually — they are saved automatically when you press the “Space” key.
Your annotation file (XML) for a background image is going to look like this:
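Here is a hand-written illustration of the kind of XML LabelImg produces for a background image (the folder, filename, path, and size values are invented). The key point is that there is no `<object>` element and therefore no bounding-box coordinates:

```xml
<annotation>
    <folder>images</folder>
    <filename>background_001.jpg</filename>
    <path>/home/user/images/background_001.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
</annotation>
```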
After you are done collecting and labeling your image dataset, you can move on to the next step: converting it to TFRecord files (if you train using TensorFlow). For that, you can have a look at these examples:
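Whichever TFRecord tutorial you follow, the step that precedes it is always the same: pulling out of each VOC XML the fields a `tf.train.Example` needs (filename, image size, and one class/box tuple per object). Here is a standard-library-only sketch of that parsing step (function name and sample XML are my own; the actual TFRecord writing with TensorFlow is covered by the linked examples):

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Extract the per-image fields typically needed to build a TFRecord example."""
    root = ET.fromstring(xml_text)
    example = {
        "filename": root.findtext("filename"),
        "width": int(root.findtext("size/width")),
        "height": int(root.findtext("size/height")),
        "boxes": [],
    }
    for obj in root.findall("object"):
        example["boxes"].append({
            "class": obj.findtext("name"),
            "xmin": int(obj.findtext("bndbox/xmin")),
            "ymin": int(obj.findtext("bndbox/ymin")),
            "xmax": int(obj.findtext("bndbox/xmax")),
            "ymax": int(obj.findtext("bndbox/ymax")),
        })
    return example

SAMPLE = """
<annotation>
  <filename>cup_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cup</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

parsed = parse_voc_annotation(SAMPLE)
print(parsed["filename"], len(parsed["boxes"]))  # cup_001.jpg 1
```

Note that a background annotation simply yields an empty `boxes` list — it still becomes a valid (negative) training example, which is exactly how the model learns what is NOT the target object.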