From this set of data, we plotted a best-fit line to demonstrate a linear relationship…great, but how did we know that this is the **best line** we could have drawn? Well, this sacred knowledge is the result of an important combination of symbols and numbers called the **Cost Function**, also known as the **Mean Squared Error Function**.

No, I did not fall asleep on my keyboard, these are actual equations, but don't worry because they are much simpler than they appear to be! Here we see that the equation takes in two parameters (θ₀, θ₁). The function then takes the summation, from the first data point (i = 1) to the last data point (m), of the squared difference between the predicted y-value (h_θ(x⁽ⁱ⁾)) and the actual y-coordinate of each data point (y⁽ⁱ⁾). Put simply: take the sum of the squared errors across all the data points, then multiply that sum by the inverse of double the number of data points:

J(θ₀, θ₁) = (1 / 2m) · Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Alright…but what is the **h_θ** function actually doing? Let's introduce exactly what it's all about:
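To make the summation concrete, here is a minimal Python sketch of the cost function. It assumes the usual simple-linear-regression hypothesis h_θ(x) = θ₀ + θ₁·x (the names `hypothesis` and `cost` are my own, not from the original):

```python
def hypothesis(theta0, theta1, x):
    # Predicted y-value for a single input x: h_θ(x) = θ0 + θ1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    # J(θ0, θ1) = (1 / 2m) * Σ (h_θ(x_i) - y_i)²
    m = len(xs)
    squared_errors = [
        (hypothesis(theta0, theta1, x) - y) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)

# Points lying exactly on y = 2x + 1: a perfect fit has zero cost,
# and any other (θ0, θ1) gives a strictly larger value.
xs = [1, 2, 3]
ys = [3, 5, 7]
print(cost(1, 2, xs, ys))  # → 0.0
print(cost(0, 0, xs, ys))  # larger, since the flat line misses every point
```

Notice that squaring the errors keeps them positive (so overshoots and undershoots don't cancel out), which is exactly why the best-fit line is the one that drives this value as low as possible.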