Proving Convexity of Mean Squared Error Loss Function

In this blog post, we shall prove the convexity of the Mean Squared Error (MSE) loss function used in a traditional Regression setting.
In case you haven’t checked out my previous blog, The Curious Case of Convex Functions, I would highly recommend you do. That blog covers the basic building blocks for proving/testing the convexity of a function.

With that in mind, let us start by reviewing –

1. MSE Loss Function –

The MSE loss function in a Regression setting is defined as –
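
For reference, the standard form of the MSE over n samples is sketched below. The linear model with two weights and a bias is an assumption here, chosen so that the parameter vector has three entries and matches the 3×3 Hessian whose minors are taken later in the post. The simplified problem appears to be the single-sample case (n = 1), which is what makes the higher-order minors vanish.

```latex
J(w) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\hat{y}_i = w_1 x_{i1} + w_2 x_{i2} + b
```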

To check the convexity of the Mean-Squared-Error function, we shall compute the Hessian of J(w), compute its principal minors, and then use their signs to comment on the convexity of J(w).

Let us get down to it right away-

From the previous blog post, we know that a twice-differentiable function is convex if all the principal minors of its Hessian are greater than or equal to zero i.e. Δₖ ≥ 0 ∀ k

Principal Minors of order 1 (Δ₁) can be obtained by deleting any 3–1 = 2 rows and corresponding columns.

Principal Minors of order 2 can be obtained by deleting any 3–2 = 1 row and corresponding column.

The Principal Minor of order 3 can be obtained by computing the determinant of the full Hessian of J(w).
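
The three steps above can be sketched in a few lines of plain Python. The setup is an assumption, not taken from the original equations: a single sample with J(w) = (y − (w₁x₁ + w₂x₂ + b))², whose Hessian with respect to (w₁, w₂, b) is 2·z·zᵀ with z = (x₁, x₂, 1). The helper names `det` and `principal_minors` and the feature values are hypothetical.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        # Drop row 0 and column j to get the sub-matrix for this cofactor.
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(sub)
    return total


def principal_minors(h):
    """Map each order k to the list of all k-th order principal minors of h."""
    n = len(h)
    out = {}
    for k in range(1, n + 1):
        # Keeping the same k rows and columns is the same as deleting the other n - k.
        out[k] = [det([[h[i][j] for j in keep] for i in keep])
                  for keep in combinations(range(n), k)]
    return out


# Assumed single-sample Hessian: 2 * z z^T with z = (x1, x2, 1).
x1, x2 = 3.0, -2.0  # hypothetical feature values
z = [x1, x2, 1.0]
hessian = [[2 * zi * zj for zj in z] for zi in z]

minors = principal_minors(hessian)
for k, values in minors.items():
    print(k, values)
```

Under this assumed setup, the order-1 minors are the diagonal entries 2x₁², 2x₂² and 2, while every minor of order 2 and 3 comes out as zero because the Hessian has rank one.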

The principal minors of order 1 have a squared form. We know that a squared quantity is always non-negative.
The principal minors of orders 2 and 3 are equal to zero.
It can be concluded that Δₖ ≥ 0 ∀ k
Hence the Hessian of J(w) is Positive Semidefinite (but not Positive Definite, since the minors of orders 2 and 3 are zero rather than strictly positive).
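
The same conclusion can be cross-checked through the quadratic form. This is a sketch under an assumed single-sample setup with Hessian 2·z·zᵀ and z = (x₁, x₂, 1), not a reproduction of the original derivation:

```latex
H = 2\, z z^{\top}, \quad z = (x_1, x_2, 1)^{\top}
\;\Longrightarrow\;
v^{\top} H v = 2\, (v^{\top} z)^2 \ge 0 \quad \forall\, v \in \mathbb{R}^3
```

Equality holds for any non-zero v orthogonal to z, for example v = (1, 0, −x₁), which is why the Hessian is only semidefinite.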

Before we comment on the convexity of J(w), let’s revise the conditions for convexity –

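If H is the Hessian Matrix of f(x) then –
If H is Positive Semidefinite for all x, f(x) is convex.
If H is Positive Definite for all x, f(x) is strictly convex.
If H is Negative Semidefinite for all x, f(x) is concave.
If H is Negative Definite for all x, f(x) is strictly concave.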

Since the Hessian of J(w) is Positive Semidefinite, it can be concluded that the function J(w) is convex.
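
As a numerical sanity check, the definition of convexity, J(λa + (1−λ)b) ≤ λJ(a) + (1−λ)J(b), can be tested at random points. This is a minimal sketch assuming a single-sample loss with two weights and a bias; the function names, sample values and tolerance are hypothetical, and a passing check is supporting evidence rather than a proof.

```python
import random


def loss(w, x, y):
    """Assumed single-sample squared error for a linear model with bias."""
    w1, w2, b = w
    return (y - (w1 * x[0] + w2 * x[1] + b)) ** 2


def convexity_gap(a, b, lam, x, y):
    """Chord value minus function value at the interpolated point; >= 0 if J is convex."""
    mix = tuple(lam * ai + (1 - lam) * bi for ai, bi in zip(a, b))
    return lam * loss(a, x, y) + (1 - lam) * loss(b, x, y) - loss(mix, x, y)


random.seed(0)
x, y = (3.0, -2.0), 1.5  # hypothetical sample
gaps = []
for _ in range(1000):
    a = tuple(random.uniform(-5, 5) for _ in range(3))
    b = tuple(random.uniform(-5, 5) for _ in range(3))
    gaps.append(convexity_gap(a, b, random.random(), x, y))

# A small negative tolerance absorbs floating-point rounding.
print(min(gaps) >= -1e-9)
```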

This blog post is aimed at proving the convexity of MSE loss function in a Regression setting by simplifying the problem.
There are different ways of proving the convexity but I found this easiest to comprehend.
It is important to note that MSE is convex in the model's predictions, and convex in the parameters when the model is linear in those parameters, but it may not be convex in the weights of a Neural Network, given the non-linearities introduced by the activation functions.
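
To illustrate the last point, here is a small sketch with a hypothetical one-weight "network", ŷ = tanh(w·x), trained on a single sample. The model, the sample values and the two weights compared are all assumptions picked for illustration. If the loss were convex in w, its value at the midpoint of two weights could never exceed the average of the losses at those weights.

```python
import math


def nn_loss(w, x=1.0, y=1.0):
    """Squared error for an assumed one-weight model with a tanh activation."""
    return (y - math.tanh(w * x)) ** 2


a, b = -4.0, 0.0
mid = 0.5 * (a + b)
chord = 0.5 * nn_loss(a) + 0.5 * nn_loss(b)

# Convexity would require nn_loss(mid) <= chord; here the inequality fails.
print(nn_loss(mid), chord)
```

The midpoint loss is larger than the chord value, so this loss is not convex in w even though the squared error itself is convex in the prediction.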
Feel free to try out the process for different loss functions that you may have encountered.