There are numerous articles online explaining the difference between the Cross-Correlation and Convolution operations. But is there really a difference between the two in the context of Deep Learning, more specifically Convolutional Neural Networks? Of course there is, that’s why there are two different terms… or is there?🤔🙄
I’ve tried not to use much math so that it’s easier to follow.
What’s the formal difference between Cross-Correlation and Convolution?
Let’s just recap what we really mean by Cross-Correlation and Convolution in the case of signal processing.
Let’s say we have three functions:
f(x) and g(x), which for simplicity I took to be step functions (but they can be anything), and h(x), which is a time-reversed (flipped) version of g(x).
Now,
Cross-Correlation of f(x) and g(x) is found by fixing one of the two functions, say f(x), and sliding the other, g(x), over it, multiplying the two functions and summing the products at each shift.
Cross-Correlation of two functions gives a measure of similarity between the functions.
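As a quick illustration of that similarity measure, here is a small NumPy sketch (the signal and template values are arbitrary examples I made up): `np.correlate` implements exactly this slide-multiply-sum, and the score peaks where the template best matches the signal.

```python
import numpy as np

signal = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
template = np.array([1.0, 2.0, 1.0])

# "valid"-mode cross-correlation: slide the template over the signal,
# multiplying and summing at each shift -- no flipping involved.
scores = np.correlate(signal, template, mode="valid")
print(scores)             # → [1. 4. 6. 4. 1.]
print(np.argmax(scores))  # → 2, i.e. the template matches best starting at index 2
```

The score is highest exactly where the signal contains the template’s shape, which is why cross-correlation is read as a similarity measure.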
On the other hand,
Convolution of f(x) and g(x) is found by fixing one of the two functions, say f(x), flipping the other, g(x), about the y-axis, and then sliding it over f(x), multiplying the two functions and summing the products at each shift.
Convolution of two functions gives the output function when one function is transformed by the other.
So essentially, Convolution of f(x) and g(x) = Cross-Correlation of f(x) and g(−x),
i.e.,
f(x) * g(x) = Cross-Correlation of f(x) and h(x), where h(x) = g(−x)
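This identity is easy to sanity-check in NumPy (the arrays below are arbitrary examples): convolving f with g gives exactly the cross-correlation of f with the flipped kernel h = g(−x).

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])

# Convolution flips g before sliding it over f.
conv = np.convolve(f, g, mode="full")

# Cross-correlating f with h(x) = g(-x), i.e. g reversed, gives the same result.
h = g[::-1]
corr_with_flipped = np.correlate(f, h, mode="full")

print(np.allclose(conv, corr_with_flipped))  # → True
```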
NOTE: “*” is the symbol for Convolution.
Q – What’s the difference between Cross-Correlation and Convolution in the context of Deep Learning?
Ans – None.
Allow me to explain.
In the case of Deep Learning, Convolutional Neural Networks specifically, the weights of the kernel (a small matrix) are learnt during training via backpropagation and are not defined/set explicitly.
If we expect a particular result from a convolution operation with a particular kernel, we can get the same result by doing cross-correlation with that kernel flipped horizontally and vertically (i.e. in both the X & Y directions).
For example, let’s say the model should learn a kernel “A” by performing actual convolution with the image during training.
Instead of kernel “A”, if the model learns the flipped version of kernel “A”, say kernel “B”, we can directly perform cross-correlation, i.e. without wasting any computation on flipping the kernel!
That means Cross-Correlation is equivalent to Convolution in the case of CNNs, provided the kernels learnt are mirror images of each other in both directions.
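To make this concrete, here is a minimal NumPy sketch of the 2D case (the image and kernel “A” are random stand-ins for real data and learnt weights, and the helper functions are my own illustrative implementations): a literal convolution with kernel “A” produces exactly the cross-correlation with kernel “B”, where “B” is “A” flipped in both directions.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image, no flip."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Valid-mode 2D convolution, written as the literal convolution sum:
    the kernel indices run in reverse in both dimensions."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    s += image[i + u, j + v] * kernel[kh - 1 - u, kw - 1 - v]
            out[i, j] = s
    return out

rng = np.random.default_rng(42)
image = rng.standard_normal((6, 6))
kernel_a = rng.standard_normal((3, 3))
kernel_b = kernel_a[::-1, ::-1]  # kernel "B": "A" flipped horizontally and vertically

# Convolving with A gives exactly the cross-correlation with B.
print(np.allclose(convolve2d(image, kernel_a),
                  cross_correlate2d(image, kernel_b)))  # → True
```

So whichever of the two mirrored kernels the network happens to learn, the sliding operation can skip the flip entirely.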
This demystifies why one of the most popular Deep Learning frameworks, TensorFlow, does not perform a proper/actual convolution: it doesn’t need to.
So, if anyone asks you in the future, “Is there any difference between Cross-Correlation and Convolution?”, reply with “Yes, but No.”