If you are getting started with machine learning or computer vision, chances are you have heard the term image segmentation. If you are not familiar with it, image segmentation divides an image into meaningful parts by assigning a label to every pixel. It is a very powerful technique in computer vision because it helps us understand a scene with pixel-level accuracy. However, many tutorials overlook the fact that there are 3 types of image segmentation, and more often than not you will see the term image segmentation used almost interchangeably with semantic segmentation. Each of the 3 types has its own uses that you should know about. Without further ado, let’s start with the first one.
Semantic segmentation is the simplest of the three. It classifies every pixel in the image, as you can see above. Given a picture of a road, you will know where the road is, where the cars are, and where the pedestrians are. This sounds pretty good at first. Depending on your needs, however, there may be a missing piece. Imagine you are using semantic segmentation in a self-driving car. At first things go well: the road looks good, the curbs are detected, there are some pedestrians, and there is a cluster of cars. Yes, you don’t want to get too close to cars, but you also don’t want to perceive individual objects as a single cluster, with no way of knowing whether you are dealing with one large truck or 5 small cars parked next to each other. That’s where instance segmentation and panoptic segmentation come in.
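To make the missing piece concrete, here is a minimal sketch of what a semantic segmentation output looks like as data. The class ids, the names and the tiny 4x6 grid are made up for illustration; a real model would produce a much larger map.

```python
import numpy as np

# Hypothetical class ids, chosen only for this illustration.
ROAD, CAR, PEDESTRIAN = 0, 1, 2
CLASS_NAMES = {ROAD: "road", CAR: "car", PEDESTRIAN: "pedestrian"}

# A tiny 4x6 "image" where every pixel carries a class id,
# which is the kind of output semantic segmentation gives you.
semantic_map = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 2],
    [0, 0, 0, 0, 0, 2],
])

def pixels_per_class(label_map):
    """Count how many pixels belong to each class id."""
    ids, counts = np.unique(label_map, return_counts=True)
    return {CLASS_NAMES[int(i)]: int(c) for i, c in zip(ids, counts)}

pixel_counts = pixels_per_class(semantic_map)
print(pixel_counts)
```

The map tells you that 8 pixels are "car", but nothing in it says whether those pixels form one vehicle or two parked side by side. That is exactly the cluster problem described above.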
Instance segmentation is essentially the natural next step from bounding box prediction. Rather than just predicting boxes around the objects, instance segmentation also estimates, at the pixel level, which part of each box actually belongs to the target object. This way you can separate overlapping objects in the image much more clearly. In that sense it is a step up from both bounding box prediction and semantic segmentation.
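The difference between a box and a mask can be sketched in a few lines. In this illustration, the two car masks, the `instances` list and the helper functions are all hypothetical; they only mimic the shape of an instance segmentation output, with one boolean mask per detected object.

```python
import numpy as np

def mask_to_box(mask):
    """Return the tight box (x_min, y_min, x_max, y_max) around a boolean mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def boxes_overlap(a, b):
    """True if two inclusive (x_min, y_min, x_max, y_max) boxes share any pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Two hypothetical car instances on a 4x6 grid, one boolean mask each.
car_a = np.zeros((4, 6), dtype=bool)
car_a[1, 1:4] = True
car_a[2, 1] = True

car_b = np.zeros((4, 6), dtype=bool)
car_b[2, 2:5] = True
car_b[1, 4] = True

instances = [
    {"label": "car", "mask": car_a},
    {"label": "car", "mask": car_b},
]

boxes = [mask_to_box(inst["mask"]) for inst in instances]
num_cars = sum(inst["label"] == "car" for inst in instances)

boxes_collide = boxes_overlap(boxes[0], boxes[1])
masks_collide = bool((car_a & car_b).any())
print(num_cars, boxes, boxes_collide, masks_collide)
```

Here the two boxes overlap, yet the masks do not share a single pixel, so the two cars stay cleanly separated and can be counted individually.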
Instance segmentation, however, does not give you the full picture. Yes, you are much better at understanding the countable objects in the image, like cars, pedestrians and planes, but to get the full picture you also need to understand what those objects are standing on. That is the reason we were using semantic segmentation in the first place. Instance segmentation is good at detecting objects, or “things”, in the image. Semantic segmentation, on the other hand, is better at detecting repeating patterns and background “stuff” such as roads, trees, sky and curbs. If you want the best of both worlds, you will want to use panoptic segmentation.
Panoptic segmentation combines instance segmentation and semantic segmentation in a single output.
All the objects in the image are masked as individual objects, and every pixel in the background is also classified and masked. You can therefore see where the road is, where the trees are, and where the cars and pedestrians are, while also knowing how many cars there are and where one car ends and the next begins. This way, you can track individual cars and pedestrians and calculate their trajectory or speed if you need to.
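One way to picture a panoptic output is as a single map in which every pixel stores both a class and an instance number. The sketch below merges a semantic map with per-instance masks using the encoding `class_id * 1000 + instance_id`. That encoding, the class ids and the `merge_to_panoptic` helper are assumptions made for this illustration, not the method of any particular model.

```python
import numpy as np

LABEL_DIVISOR = 1000  # assumed encoding: panoptic id = class_id * 1000 + instance_id
ROAD, CAR = 1, 2      # hypothetical class ids

def merge_to_panoptic(semantic_map, instance_masks):
    """Combine a semantic map with a list of (class_id, boolean mask) pairs.

    Background "stuff" pixels keep instance id 0. Each "thing" gets
    instance id 1, 2, ... counted separately per class.
    """
    panoptic = semantic_map.astype(np.int64) * LABEL_DIVISOR
    next_id = {}
    for class_id, mask in instance_masks:
        next_id[class_id] = next_id.get(class_id, 0) + 1
        panoptic[mask] = class_id * LABEL_DIVISOR + next_id[class_id]
    return panoptic

# Semantic view: a road with one undivided block of "car" pixels.
semantic_map = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 2, 2, 2, 2, 1],
    [1, 2, 2, 2, 2, 1],
    [1, 1, 1, 1, 1, 1],
])

# Instance view: the same block is actually two cars.
car_a = np.zeros((4, 6), dtype=bool)
car_a[1:3, 1:3] = True
car_b = np.zeros((4, 6), dtype=bool)
car_b[1:3, 3:5] = True

panoptic = merge_to_panoptic(semantic_map, [(CAR, car_a), (CAR, car_b)])
segment_ids = sorted(int(s) for s in np.unique(panoptic))

# Decode: things of class CAR are the ids with a non-zero instance part.
car_instances = [s for s in segment_ids
                 if s // LABEL_DIVISOR == CAR and s % LABEL_DIVISOR > 0]
print(segment_ids, len(car_instances))
```

The resulting map keeps the road as one background segment while the two cars receive distinct ids, so you can both locate the road and count or track each car.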
If you are just getting started with image segmentation, a good place to start is probably Detectron2 from Facebook AI Research, especially if you are using, or want to use, PyTorch. Detectron2 gives you pre-trained neural networks that perform multiple computer vision tasks with state-of-the-art models, including but not limited to instance segmentation, panoptic segmentation and human keypoint detection. They also have a beginner-friendly Google Colab tutorial which allows you to get up and running with Detectron2 in your browser within minutes. You can find the link below:
If you are just getting started with machine learning and artificial intelligence and you feel like you have been hit with many new concepts, that’s completely okay too. For now, you should know that there are 3 ways to segment an image, and that panoptic segmentation gives you the most comprehensive results.
I hope you got some value out of this article. If you have any questions, leave them as a comment and I will get back to you as fast as I can. If your question requires a more extensive explanation, I can explain that in another article addressing your question.