Let’s dive into Parallel Reverse Attention Networks and explore their potential
As someone who is passionate about ML in healthcare, I was quite fascinated by PraNet [1]. I came across PraNet while working on Kaggle’s Kidney competition, which was about identifying glomeruli (clusters of blood vessels) in human kidney tissue images. The reason I mention this is that PraNet is used for a similar task.
To understand the importance of PraNet, we must first understand the impact of colorectal cancer.
Colorectal cancer is the third most common cancer diagnosed in both men and women each year in the United States, excluding skin cancer. This year, an estimated 147,950 adults in the United States will be diagnosed with colorectal cancer.
Source: Cancer.net
As someone who suffers from irritable bowel syndrome, I can’t even begin to imagine what people with colorectal cancer must be going through. PraNet essentially tries to help prevent colorectal cancer by automating screening tests with neural networks. In more detail, this translates to segmenting polyps (projecting growths of tissue from a surface in the body, usually a mucous membrane) in colonoscopy images. This is quite a challenging task, since polyps differ in size, color, texture, and border sharpness.
How PraNet [1] Works:
(1) We aggregate the features in high-level layers using a parallel partial decoder (PPD); the combined feature captures contextual information and generates a global map that serves as the initial guidance area for the subsequent steps.
To further mine the boundary cues, we leverage a set of recurrent reverse attention (RA) modules to establish the relationship between areas and boundary cues.
(2) We introduce several novel evaluation metrics for polyp segmentation and present a comprehensive benchmark of existing state-of-the-art (SOTA) models that are publicly available.
(3) Extensive experiments demonstrate that the proposed PraNet outperforms most cutting-edge models and advances the SOTA by a large margin on five challenging datasets, with real-time inference and a shorter training time.
By Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao, 2020, on arXiv [1]
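To make that two-step pipeline concrete, here is a minimal sketch of the forward pass as I understand it from the paper. The function name and the stand-in backbone, ppd, and ra_modules callables are my own illustration, not the authors’ code (the PPD and RA pieces themselves are sketched in the sections below):

```python
import torch.nn.functional as F

def pranet_forward(x, backbone, ppd, ra_modules):
    """Illustrative flow of PraNet; my own sketch, not the official code.

    backbone   -- returns five feature maps [f1..f5], shallow to deep
    ppd        -- fuses the three high-level maps into a coarse global map
    ra_modules -- one reverse attention refinement per high-level stage,
                  ordered deepest first
    """
    f1, f2, f3, f4, f5 = backbone(x)
    pred = ppd(f3, f4, f5)                  # step (1): coarse "where to look" map
    for ra, feat in zip(ra_modules, (f5, f4, f3)):   # step (2): boundary mining
        side = F.interpolate(pred, size=feat.shape[2:],
                             mode='bilinear', align_corners=False)
        pred = ra(feat, side)               # each RA adds a residual correction
    return F.interpolate(pred, size=x.shape[2:],
                         mode='bilinear', align_corners=False)
```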
What is a parallel partial decoder [3]?
This isn’t an easy question, because the partial decoder is a full separate paper on its own, but I will give you an idea of what it is. Encoder-decoder frameworks are the current SOTA for image segmentation. A partial decoder discards the features of the shallower layers, which speeds up feature extraction; the original version was built on the VGG-16 backbone.
The decoder essentially tries to integrate the remaining features into one saliency map. A saliency map is an image that shows the unique quality of each pixel, giving a simplified representation of the image that speeds up analysis.
Saliency maps are a fundamental building block of computer vision and are used heavily in CNNs during feature extraction.
The network initially separates the backbone features into two low-level and three high-level feature maps; the parallel partial decoder then aggregates the three high-level features in parallel (hence its name), skipping the low-level ones, into a global saliency map. This map is then passed on to the next component, the reverse attention module.
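Here is a minimal, self-contained sketch of that aggregation step, loosely following the multiplicative fusion used in the partial decoder paper [3]; the channel sizes and layer choices are mine, for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialDecoderSketch(nn.Module):
    """Minimal sketch of a parallel partial decoder (illustrative channel
    sizes, not the paper's exact architecture): fuse only the three
    high-level feature maps; the low-level ones are never touched."""
    def __init__(self, ch=32):
        super().__init__()
        # reduce each high-level feature map to a common channel count
        self.reduce = nn.ModuleList(nn.Conv2d(c, ch, 1) for c in (64, 128, 256))
        self.fuse = nn.Conv2d(3 * ch, 1, 3, padding=1)  # -> 1-channel global map

    def forward(self, f3, f4, f5):
        r3, r4, r5 = (conv(f) for conv, f in zip(self.reduce, (f3, f4, f5)))
        size = r3.shape[2:]  # bring everything up to f3's resolution
        r4 = F.interpolate(r4, size=size, mode='bilinear', align_corners=False)
        r5 = F.interpolate(r5, size=size, mode='bilinear', align_corners=False)
        # multiplicative fusion: deeper maps gate the shallower ones
        return self.fuse(torch.cat([r3, r3 * r4, r3 * r4 * r5], dim=1))

# usage with dummy feature maps at decreasing resolutions
f3, f4, f5 = (torch.randn(1, c, s, s) for c, s in [(64, 44), (128, 22), (256, 11)])
print(PartialDecoderSketch()(f3, f4, f5).shape)  # torch.Size([1, 1, 44, 44])
```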
What are Reverse Attention Modules [2]?
Again, I will do my best to break this down briefly, but you have to consider that this, too, is a separate paper. Honestly, it just shows me how much hard work went into creating the full model!
First of all, I have to introduce the concept of reverse learning in machine learning. I have found that this quote summarizes it quite nicely:
Human inertial thinking schemes can be formed through learning, which are then applied to quickly solve similar problems later. However, when problems are significantly different, inertial thinking generally presents solutions that are definitely imperfect. In such cases, people will apply creative thinking, such as reverse thinking, to solve problems. Similarly, machine learning methods also form inertial thinking schemes through learning the knowledge from a large amount of data. However, when the testing data are vastly different, the formed inertial thinking schemes will inevitably generate errors.
By Huihui Li and Guihua Wen, in Modeling Reverse Thinking for Machine Learning
Reverse attention is a technique that uses reverse learning to perform semantic segmentation of images. Reverse attention networks learn features and predictions on two branches: on one branch the features are associated with a class, and on the other they are not. The reverse attention component combines both branches, focuses on the patterns where the response is weaker (on either branch), and provides a mechanism to amplify that response in a third branch. The predictions of the three branches are then ensembled into a final prediction.
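PraNet’s reverse attention module boils this idea down to something quite simple: invert the current prediction so that the regions the network is already confident about are erased, forcing the convolutions that follow to study what is left, which is mostly the boundary region. A minimal sketch, with illustrative channel counts of my own choosing:

```python
import torch
import torch.nn as nn

class ReverseAttentionSketch(nn.Module):
    """Minimal sketch of PraNet-style reverse attention (illustrative
    channel counts): erase the confidently predicted regions so the
    following convolutions focus on the remaining boundary cues."""
    def __init__(self, ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, feat, side_map):
        weight = 1 - torch.sigmoid(side_map)  # high where the prediction is LOW
        residual = self.convs(feat * weight)  # weight broadcasts over channels
        return residual + side_map            # residual correction of the map

feat, side = torch.randn(1, 256, 11, 11), torch.randn(1, 1, 11, 11)
print(ReverseAttentionSketch(256)(feat, side).shape)  # torch.Size([1, 1, 11, 11])
```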
This technique of learning is a bit confusing and quite different from the other machine learning models I have previously reviewed, but I think if it shows good results (which we will see now), then the theory it’s built on must be valid!
Okay, let’s get back to PraNet now and focus on the big picture:
The authors of PraNet realize that they are using a mixture of uncommon techniques (as explained above), and that’s why they measured their results across five of the most standard benchmark datasets in their domain. Two of those datasets are Kvasir and CVC-612.
This table [1] summarizes the results, and we can see that PraNet outperforms many of the best state-of-the-art networks on almost all of the metrics used, by a significant margin!
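Among the metrics in that table are the mean Dice coefficient and mean IoU, the standard overlap scores for segmentation. For reference, here is a quick sketch of how they are computed on a pair of binary masks (a toy example, not the paper’s evaluation code):

```python
import numpy as np

def dice_and_iou(pred, gt, eps=1e-8):
    """Dice coefficient and IoU for a pair of binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

# toy example: a prediction slightly offset from the ground-truth polyp
gt = np.zeros((64, 64));   gt[20:40, 20:40] = 1
pred = np.zeros((64, 64)); pred[22:42, 22:42] = 1
print(dice_and_iou(pred, gt))  # roughly (0.81, 0.68)
```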
Conclusion:
I think there is a lot of potential in PraNet, and I hope this article demonstrated that. I didn’t want to dive into too many details, since research papers can sometimes be a bit too academic; the main point here was to give you a brief overview. By all means, though, if you want to dig into the details, I have left the three main papers in the references. Automatic diagnosis of colorectal cancer is a big milestone for AI, and hopefully this network will be part of the solution.
References
[1] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao. PraNet: Parallel Reverse Attention Network for Polyp Segmentation. 2020. On arXiv.
[2] Qin Huang, Chunyang Xia, Chihao Wu, Siyang Li, Ye Wang, Yuhang Song, and C.-C. Jay Kuo. Semantic Segmentation with Reverse Attention. 2017. On arXiv.
[3] Zhe Wu, Li Su, and Qingming Huang. Cascaded Partial Decoder for Fast and Accurate Salient Object Detection. 2019. On arXiv.