Introducing Distance Correlation, a Superior Correlation Metric.

Let (X_k, Y_k), k = 1, 2, …, n, be a statistical sample from a pair of random variables, X and Y.

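First, we compute the n-by-n distance matrices (a_{j,k}) and (b_{j,k}) containing all pairwise distances: a_{j,k} = ‖X_j − X_k‖ and b_{j,k} = ‖Y_j − Y_k‖ for j, k = 1, 2, …, n, where ‖·‖ is the Euclidean norm.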
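As a minimal sketch of this step, assuming one-dimensional samples so that the distance reduces to an absolute difference, the pairwise distance matrix can be built as follows. The function name `distance_matrix` and the sample values are made up for illustration.

```python
def distance_matrix(sample):
    """Return the n-by-n matrix of pairwise distances |s_j - s_k|.

    Assumes a one-dimensional sample (a list of numbers); for vector-valued
    observations the absolute value would be replaced by a Euclidean norm.
    """
    return [[abs(sj - sk) for sk in sample] for sj in sample]


# Illustrative sample, not taken from the article.
x = [1.0, 2.0, 4.0]
a = distance_matrix(x)

for row in a:
    print(row)
```

For this sample the matrix is `[[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]`: symmetric, with zeros on the diagonal, as any distance matrix must be.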

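Then we double center each distance matrix: from every entry we subtract its row mean and its column mean, then add back the grand mean of the whole matrix. This gives the double centered matrices A and B.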
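Written out, and assuming the standard definition of double centering, the centered entries are given below. The bar notation for the row, column, and grand means is introduced here for illustration.

```latex
A_{j,k} = a_{j,k} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot},
\qquad
B_{j,k} = b_{j,k} - \bar{b}_{j\cdot} - \bar{b}_{\cdot k} + \bar{b}_{\cdot\cdot},

\text{where}\quad
\bar{a}_{j\cdot} = \frac{1}{n}\sum_{k=1}^{n} a_{j,k},
\quad
\bar{a}_{\cdot k} = \frac{1}{n}\sum_{j=1}^{n} a_{j,k},
\quad
\bar{a}_{\cdot\cdot} = \frac{1}{n^{2}}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{j,k},
```

and the means of b are defined in the same way.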

Visually, double centering transforms the raw distance matrix (left) into the double centered matrix (right).

Image created by Author

Why do we do this?

Any covariance is a cross-product of moments. Distances are not moments, so we first have to convert them into moments. Doing that means computing deviations from the mean, which is exactly what double centering achieves.
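To make the "deviations from the mean" idea concrete, here is a small sketch of double centering in plain Python. It assumes the input is a square distance matrix given as a list of lists; the name `double_center` and the example matrix are illustrative only.

```python
def double_center(d):
    """Subtract row and column means from each entry, then add the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[j][k] for j in range(n)) / n for k in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [d[j][k] - row_means[j] - col_means[k] + grand_mean for k in range(n)]
        for j in range(n)
    ]


# Illustrative distance matrix for the sample 1, 2, 4.
a = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
A = double_center(a)

# After centering, every row and every column sums to zero (up to rounding).
row_sums = [sum(row) for row in A]
col_sums = [sum(A[j][k] for j in range(len(A))) for k in range(len(A))]
print(row_sums, col_sums)
```

The zero row and column sums are the matrix analogue of deviations from a mean summing to zero, which is why the centered entries can be treated like moments.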

Lastly, we compute the arithmetic average of the products A_{j,k} B_{j,k} over all n² pairs (j, k) to get the squared sample distance covariance:

Distance covariance formula
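In symbols, the usual definition is dCov²_n(X, Y) = (1/n²) Σ_j Σ_k A_{j,k} B_{j,k}. The sketch below follows that definition for one-dimensional samples, using absolute differences as distances; the function names and the sample data are assumptions made for illustration, not the author's code.

```python
def centered_distances(sample):
    """Double centered pairwise distance matrix of a one-dimensional sample."""
    n = len(sample)
    d = [[abs(sj - sk) for sk in sample] for sj in sample]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so the column means equal the row means.
    return [
        [d[j][k] - row_means[j] - row_means[k] + grand_mean for k in range(n)]
        for j in range(n)
    ]


def dcov_squared(x, y):
    """Squared sample distance covariance: the mean of A[j][k] * B[j][k]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    A = centered_distances(x)
    B = centered_distances(y)
    return sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / n**2


# Illustrative data: y is exactly twice x.
x = [1.0, 2.0, 4.0]
y = [2.0, 4.0, 8.0]
print(dcov_squared(x, y))
```

Because every distance in y is twice the corresponding distance in x, the result here is exactly twice `dcov_squared(x, x)`, which is a quick sanity check on the implementation.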

The distance variance is simply the distance covariance of a variable with itself. It is the square root of the following:

Distance variance formula
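Under the usual definition, dVar²_n(X) = dCov²_n(X, X) = (1/n²) Σ_j Σ_k A_{j,k}², and the distance variance is its square root. A minimal sketch for one-dimensional samples follows; `dvar` is a hypothetical helper name and the sample is illustrative.

```python
import math


def dvar(sample):
    """Sample distance variance: sqrt of the mean squared centered distance."""
    n = len(sample)
    d = [[abs(sj - sk) for sk in sample] for sj in sample]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # Double centering; d is symmetric, so column means equal row means.
    A = [
        [d[j][k] - row_means[j] - row_means[k] + grand_mean for k in range(n)]
        for j in range(n)
    ]
    dvar_squared = sum(v * v for row in A for v in row) / n**2
    return math.sqrt(dvar_squared)


x = [1.0, 2.0, 4.0]
print(dvar(x))
print(dvar([5.0, 5.0, 5.0]))  # a constant sample has zero distance variance
```

Like an ordinary standard deviation, this quantity scales linearly with the data: doubling every observation doubles the distance variance.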
