The architecture combines a parallel partial decoder (PPD) with a set of recurrent reverse attention (RA) modules, as shown in Figure 1.
Parallel Partial Decoder (PPD)
As demonstrated by [Wu et al., 2019], low-level features demand more computational resources than high-level features because of their larger spatial resolution, yet they contribute little to performance. Based on this observation, PraNet aggregates only the high-level features using a parallel partial decoder component. For an input polyp image I of size h × w, five levels of features {f_i, i = 1, ..., 5} are extracted from the backbone network and partitioned into low-level features {f_i, i = 1, 2} and high-level features {f_i, i = 3, 4, 5}. The state-of-the-art partial decoder pd(·) [Wu et al., 2019] aggregates the high-level features through parallel connections, computing PD = pd(f_3, f_4, f_5) to obtain the global map S_g. Since S_g is derived from the deepest CNN layers, it captures only a relatively rough location of the polyp tissue, without structural details.
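The following is a minimal PyTorch sketch of this parallel high-level aggregation. The channel widths and the simple project-upsample-concatenate design are assumptions for illustration; the actual partial decoder of [Wu et al., 2019] uses a more elaborate structure with dense cross-level connections.

```python
# Minimal sketch of a parallel partial decoder: fuse only the high-level
# features f3, f4, f5 into a single-channel global map S_g.
# Channel counts (512/1024/2048) are assumed ResNet-style values, not the
# paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePartialDecoder(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # 1x1 convolutions project each level to a common channel width
        self.reduce3 = nn.Conv2d(512, channels, 1)
        self.reduce4 = nn.Conv2d(1024, channels, 1)
        self.reduce5 = nn.Conv2d(2048, channels, 1)
        self.fuse = nn.Conv2d(3 * channels, 1, 3, padding=1)

    def forward(self, f3, f4, f5):
        # Upsample the deeper features to f3's resolution, then fuse
        # all three levels in parallel into the global map S_g
        f4 = F.interpolate(self.reduce4(f4), size=f3.shape[2:],
                           mode='bilinear', align_corners=False)
        f5 = F.interpolate(self.reduce5(f5), size=f3.shape[2:],
                           mode='bilinear', align_corners=False)
        f3 = self.reduce3(f3)
        return self.fuse(torch.cat([f3, f4, f5], dim=1))  # S_g
```

Because the low-level features {f_i, i = 1, 2} never enter the decoder, the expensive high-resolution computation they would require is avoided, at little cost in accuracy.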
Recurrent Reverse Attention (RA) Module
By incrementally mining discriminative polyp regions with reverse attention, this imprecise, coarse estimate can be progressively refined into an accurate and complete prediction map.
As shown in Fig. 1 and formalized in Eqn. 1 and Eqn. 2, the RA module multiplies the high-level side-output features {f_i, i = 3, 4, 5} by a reverse attention weight A_i to obtain the output reverse attention feature R_i = f_i ⊙ A_i. The reverse attention weight, adapted from salient object detection [Chen et al., 2018; Zhang et al., 2020], is defined as A_i = ⊖(σ(P(S_{i+1}))), where P(·) is an upsampling operation, σ(·) is the sigmoid function, and ⊖ is a reverse operation that subtracts the input from a matrix E in which all elements are 1.
Note that, unlike the PPD, the RA modules do not aggregate features across levels: each of the three parallel high-level features {f_i, i = 3, 4, 5} is multiplied element-wise by its own reverse attention weight A_i to yield the reverse attention feature R_i, refining the prediction from the deepest level upward, as in the sketch below.
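Below is a minimal PyTorch sketch of the reverse attention computation described above, assuming the deeper prediction S_{i+1} is a single-channel map; the function name reverse_attention and the bilinear upsampling choice are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def reverse_attention(f_i, s_next):
    """Compute R_i = f_i * A_i with A_i = E - sigmoid(P(S_{i+1})).

    f_i:    high-level side-output features, shape [B, C, H, W]
    s_next: deeper prediction map S_{i+1}, shape [B, 1, h, w]
    """
    # P(.): upsample the deeper prediction to f_i's spatial size
    s_up = F.interpolate(s_next, size=f_i.shape[2:],
                         mode='bilinear', align_corners=False)
    # sigma(.) followed by the reverse operation (subtract from all-ones E)
    a_i = 1.0 - torch.sigmoid(s_up)
    # Element-wise product; the single-channel weight broadcasts over C
    return f_i * a_i
```

Because A_i is close to zero wherever S_{i+1} is already confident, the product suppresses the already-detected region and forces the module to mine the complementary, previously missed areas, consistent with the incremental refinement strategy described above.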