Mathematics for Machine Learning Part-1

Mathematics is universal. Machine Learning is built on top of mathematical prerequisites such as Linear Algebra, Probability and Statistics. We shall now see the implementation of the basic underlying mathematical concepts in each of these prerequisites using Python.

A branch of mathematics that is concerned with mathematical structures closed under the operations of addition and scalar multiplication and that includes the theory of systems of linear equations, matrices, determinants, vector spaces, and linear transformations — Source: Merriam Webster

If you haven’t gone through — Getting Familiar with Numpy, make sure to go through it! We’ll be needing Numpy in implementing most of the Math concepts!

Getting Started

Libraries you’ll be needing to import for this tutorial — Numpy, Matplotlib and Scipy

import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la
%matplotlib inline

Let’s compute 2I+3A−AB (I is the identity matrix of size 2) for —

A = np.array([[1,3],[-1,7]])
print(A)Out:
[[ 1  3]
[-1  7]]B = np.array([[5,2],[1,2]])
print(B)Out:
[[5 2]
[1 2]]I = np.eye(2)
print(I)Out:
[[1. 0.]
[0. 1.]]2*I + 3*A - A@BOut:
array([[-3.,  1.],
[-5., 11.]])

Matrix Powers

There’s no symbol for matrix powers and so we must import the function matrix_power from the subpackage numpy.linalg.

from numpy.linalg import matrix_power as mpowM = np.array([[3,4],[-1,5]])
print(M)Out:
[[ 3  4]
[-1  5]]mpow(M,2)Out:
array([[ 5, 32],
[-8, 21]])mpow(M,5)Out:
array([[-1525,  3236],
[ -809,    93]])

Compare with the matrix multiplication operator:

M @ M @ M @ M @ MOut:
array([[-1525,  3236],
[ -809,    93]])

Transpose

We can take the transpose with .T attribute:

print(M)Out:
[[ 3  4]
[-1  5]]print(M.T)Out:
[[ 3 -1]
[ 4  5]]

Notice that M.MT is a symmetric matrix:

M @ M.TOut:
array([[25, 17],
[17, 26]])

Inverse

We can find the inverse using the function scipy.linalg.inv:

A = np.array([[1,2],[3,4]])
print(A)Out:
[[1 2]
[3 4]]la.inv(A)Out:
array([[-2. ,  1. ],
[ 1.5, -0.5]])

Trace

We can find the trace of a matrix using the function numpy.trace:

np.trace(A)Out:
5

Determinant

We find the determinant using the function scipy.linalg.det:

A = np.array([[1,2],[3,4]])
print(A)Out:
[[1 2]
[3 4]]la.det(A)Out:
-2.0

Characteristic Polynomials and Cayley-Hamilton Theorem

The characteristic polynomial of a 2 by 2 square matrix A is

The Cayley-Hamilton Theorem states that any square matrix satisfies its characteristic polynomial.

For a matrix A of size 2, this means that

Let’s verify the Cayley-Hamilton Theorem for a few different matrices —

print(A)Out:
[[1 2]
[3 4]]trace_A = np.trace(A)
det_A = la.det(A)
I = np.eye(2)
A @ A - trace_A * A + det_A * IOut:
array([[0., 0.],
[0., 0.]])

Projections

The formula to project a vector v onto a vector w is —

Let’s a function called proj which computes the projection v onto w:

def proj(v,w):
'''Project vector v onto w.'''
v = np.array(v)
w = np.array(w)
return np.sum(v * w)/np.sum(w * w) * w  # or (v @ w)/(w @ w) * wproj([1,2,3],[1,1,1])Out:
array([2., 2., 2.])

Solving Linear Systems

Linear Systems

A linear system of equations is a collection of linear equations

In matrix notation, a linear system is Ax=b where

Gaussian elimination

The general procedure to solve a linear system of equation is called Gaussian elimination. The idea is to perform elementary row operations to reduce the system to its row echelon form and then solve.

Elementary Row Operations

Elementary row operations include:

Add k times row j to row i
Multiply row i by scalar k
Switch rows i and j

Each of the elementary row operations is the result of matrix multiplication by an elementary matrix. To add k times row i to row j in a matrix A, we multiply A by the matrix E where E is equal to the identity matrix except the i,j entry is Ei,j=k.

For example, if A is 3 by 3 and we want to add 3 times row 2 to row 0 (using 0 indexing) then —

Let’s verify the calculation:

A = np.array([[1,1,2],[-1,3,1],[0,5,2]])
print(A)Out:
[[ 1  1  2]
[-1  3  1]
[ 0  5  2]]E1 = np.array([[1,0,3],[0,1,0],[0,0,1]])
print(E1)Out:
[[1 0 3]
[0 1 0]
[0 0 1]]E1 @ AOut:
array([[ 1, 16,  8],
[-1,  3,  1],
[ 0,  5,  2]])

To multiply k times row i in a matrix A, we multiply A by the matrix E where E is equal to the identity matrix except the i,j entry is Ei,i=k.

For example, if A is 3 by 3 and we want to multiply row 1 by -2 then —

Let’s verify the calculation:

E2 = np.array([[1,0,0],[0,-2,0],[0,0,1]])
print(E2)Out:
[[ 1  0  0]
[ 0 -2  0]
[ 0  0  1]]E2 @ AOut:
array([[ 1,  1,  2],
[ 2, -6, -2],
[ 0,  5,  2]])

Finally, to switch row i and row j in a matrix A, we multiply A by the matrix E where E is equal to the identity matrix except Ei,i=0, Ej,j=0, Ei,j=1 and Ej,i=1.

For example, if A is 3 by 3 and we want to switch row 1 and row 2 then —

Let’s verify the calculation:

E3 = np.array([[1,0,0],[0,0,1],[0,1,0]])
print(E3)Out:
[[1 0 0]
[0 0 1]
[0 1 0]]E3 @ AOut:
array([[ 1,  1,  2],
[ 0,  5,  2],
[-1,  3,  1]])

Implementation

Let’s now write functions to implement the elementary row operations. First of all, let’s write a function called add_rows which takes input parameters A, k, i and j and returns the NumPy array resulting from adding k times row j to row i in the matrix A.

If i=j, then let’s say that the function scales row i by k+1 since this would be the result of k times row i added to row i.

def add_row(A,k,i,j):
"Add k times row j to row i in matrix A."
n = A.shape[0]
E = np.eye(n)
if i == j:
E[i,i] = k + 1
else:
E[i,j] = k
return E @ A

Let’s test our function:

M = np.array([[1,1],[3,2]])
print(M)Out:
[[1 1]
[3 2]]add_row(M,2,0,1)Out:
array([[7., 5.],
[3., 2.]])

Let’s write a function called scale_row which takes 3 input parameters A, k, and i and returns the matrix that results from k times row i in the matrix A.

def scale_row(A,k,i):
"Multiply row i by k in matrix A."
n = A.shape[0]
E = np.eye(n)
E[i,i] = k
return E @ A

Let’s test our function:

M = np.array([[3,1],[-2,7]])
print(M)Out:
[[ 3  1]
[-2  7]]scale_row(M,3,1)Out:
array([[ 3.,  1.],
[-6., 21.]])

Let’s write a function called switch_rows which takes 3 input parameters A, i and j and returns the matrix that results from switching rows i and j in the matrix A.

def switch_rows(A,i,j):
"Switch rows i and j in matrix A."
n = A.shape[0]
E = np.eye(n)
E[i,i] = 0
E[j,j] = 0
E[i,j] = 1
E[j,i] = 1
return E @ A

Let’s test our function:

A = np.array([[1,1,1],[1,-1,0]])
print(A)Out:
[[ 1  1  1]
[ 1 -1  0]]switch_rows(A,0,1)Out:
array([[ 1., -1.,  0.],
[ 1.,  1.,  1.]])

Examples

Find the Inverse

Let’s apply our functions to the augmented matrix [M | I] to find the inverse of the matrix M:

M = np.array([[5,4,2],[-1,2,1],[1,1,1]])
print(M)Out:
[[ 5  4  2]
[-1  2  1]
[ 1  1  1]]A = np.hstack([M,np.eye(3)])
print(A)Out:
[[ 5.  4.  2.  1.  0.  0.]
[-1.  2.  1.  0.  1.  0.]
[ 1.  1.  1.  0.  0.  1.]]A1 = switch_rows(A,0,2)
print(A1)Out:
[[ 1.  1.  1.  0.  0.  1.]
[-1.  2.  1.  0.  1.  0.]
[ 5.  4.  2.  1.  0.  0.]]A2 = add_row(A1,1,1,0)
print(A2)Out:
[[1. 1. 1. 0. 0. 1.]
[0. 3. 2. 0. 1. 1.]
[5. 4. 2. 1. 0. 0.]]A3 = add_row(A2,-5,2,0)
print(A3)Out:
[[ 1.  1.  1.  0.  0.  1.]
[ 0.  3.  2.  0.  1.  1.]
[ 0. -1. -3.  1.  0. -5.]]A4 = switch_rows(A3,1,2)
print(A4)Out:
[[ 1.  1.  1.  0.  0.  1.]
[ 0. -1. -3.  1.  0. -5.]
[ 0.  3.  2.  0.  1.  1.]]A5 = scale_row(A4,-1,1)
print(A5)Out:
[[ 1.  1.  1.  0.  0.  1.]
[ 0.  1.  3. -1.  0.  5.]
[ 0.  3.  2.  0.  1.  1.]]A6 = add_row(A5,-3,2,1)
print(A6)Out:
[[  1.   1.   1.   0.   0.   1.]
[  0.   1.   3.  -1.   0.   5.]
[  0.   0.  -7.   3.   1. -14.]]A7 = scale_row(A6,-1/7,2)
print(A7)Out:
[[ 1.          1.          1.          0.          0.          1.        ]
[ 0.          1.          3.         -1.          0.          5.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]A8 = add_row(A7,-3,1,2)
print(A8)Out:
[[ 1.          1.          1.          0.          0.          1.        ]
[ 0.          1.          0.          0.28571429  0.42857143 -1.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]A9 = add_row(A8,-1,0,2)
print(A9)Out:
[[ 1.          1.          0.          0.42857143  0.14285714 -1.        ]
[ 0.          1.          0.          0.28571429  0.42857143 -1.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]A10 = add_row(A9,-1,0,1)
print(A10)Out:
[[ 1.          0.          0.          0.14285714 -0.28571429  0.        ]
[ 0.          1.          0.          0.28571429  0.42857143 -1.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]

Let’s verify if we found M inverse correctly:

Minv = A10[:,3:]
print(Minv)Out:
[[ 0.14285714 -0.28571429  0.        ]
[ 0.28571429  0.42857143 -1.        ]
[-0.42857143 -0.14285714  2.        ]]result = Minv @ M
print(result)Out:
[[ 1.00000000e+00  4.44089210e-16  2.22044605e-16]
[-6.66133815e-16  1.00000000e+00 -2.22044605e-16]
[ 0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Success! We can see the result more clearly if we round to 15 decimal places:

np.round(result,15)Out:
array([[ 1.e+00,  0.e+00,  0.e+00],
[-1.e-15,  1.e+00, -0.e+00],
[ 0.e+00,  0.e+00,  1.e+00]])

Solve a System

Let’s use our functions to perform Gaussian elimination and solve a linear system of equations Ax=b.

A = np.array([[6,15,1],[8,7,12],[2,7,8]])
print(A)Out:
[[ 6 15  1]
[ 8  7 12]
[ 2  7  8]]b = np.array([[2],[14],[10]])
print(b)Out:
[[ 2]
[14]
[10]]

Form the augmented matrix M:

M = np.hstack([A,b])
print(M)[[ 6 15  1  2]
[ 8  7 12 14]
[ 2  7  8 10]]

Perform row operations:

M1 = scale_row(M,1/6,0)
print(M1)Out:
[[ 1.          2.5         0.16666667  0.33333333]
[ 8.          7.         12.         14.        ]
[ 2.          7.          8.         10.        ]]M2 = add_row(M1,-8,1,0)
print(M2)Out:
[[  1.           2.5          0.16666667   0.33333333]
[  0.         -13.          10.66666667  11.33333333]
[  2.           7.           8.          10.        ]]M3 = add_row(M2,-2,2,0)
print(M3)Out:
[[  1.           2.5          0.16666667   0.33333333]
[  0.         -13.          10.66666667  11.33333333]
[  0.           2.           7.66666667   9.33333333]]M4 = scale_row(M3,-1/13,1)
print(M4)Out:
[[ 1.          2.5         0.16666667  0.33333333]
[ 0.          1.         -0.82051282 -0.87179487]
[ 0.          2.          7.66666667  9.33333333]]M5 = add_row(M4,-2,2,1)
print(M5)Out:
[[ 1.          2.5         0.16666667  0.33333333]
[ 0.          1.         -0.82051282 -0.87179487]
[ 0.          0.          9.30769231 11.07692308]]M6 = scale_row(M5,1/M5[2,2],2)
print(M6)Out:
[[ 1.          2.5         0.16666667  0.33333333]
[ 0.          1.         -0.82051282 -0.87179487]
[ 0.          0.          1.          1.19008264]]M7 = add_row(M6,-M6[1,2],1,2)
print(M7)Out:
[[1.         2.5        0.16666667 0.33333333]
[0.         1.         0.         0.1046832 ]
[0.         0.         1.         1.19008264]]M8 = add_row(M7,-M7[0,2],0,2)
print(M8)Out:
[[1.         2.5        0.         0.13498623]
[0.         1.         0.         0.1046832 ]
[0.         0.         1.         1.19008264]]M9 = add_row(M8,-M8[0,1],0,1)
print(M9)Out:
[[ 1.          0.          0.         -0.12672176]
[ 0.          1.          0.          0.1046832 ]
[ 0.          0.          1.          1.19008264]]

Success! The solution of Ax=b is

x = M9[:,3].reshape(3,1)
print(x)Out:
[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]

Or, we can do it the easy way…

x = la.solve(A,b)
print(x)Out:
[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]

`scipy.linalg.solve`

We are mostly interested in linear systems Ax=b where there is a unique solution x. This is the case when A is a square matrix (m=n) and det(A)≠0. To solve such a system, we can use the function scipy.linalg.solve.

The function returns a solution of the system of equations Ax=b. For example:

A = np.array([[1,1],[1,-1]])
print(A)Out:
[[ 1  1]
[ 1 -1]]b1 = np.array([2,0])
print(b1)Out:
[2 0]

And solve:

x1 = la.solve(A,b1)
print(x1)Out:
[1. 1.]

Note that the output x is returned as a 1D Numpy array when the vector b (the right-hand side) is entered as a 1D Numpy array. If we input b as a 2D Numpy array, then the output is a 2D Numpy array.

For example:

A = np.array([[1,1],[1,-1]])
b2 = np.array([2,0]).reshape(2,1)
x2 = la.solve(A,b2)
print(x2)Out:
[[1.]
[1.]]

Finally, if the right-hand side b is a matrix, then the output is a matrix of the same size. It is the solution of Ax=b when b is a matrix.

For example:

A = np.array([[1,1],[1,-1]])
b3 = np.array([[2,2],[0,1]])
x3 = la.solve(A,b3)
print(x3)Out:
[[1.  1.5]
[1.  0.5]]

Simple Example

Let’s compute the solution of the system of equations

Create the matrix of coefficients:

A = np.array([[2,1],[1,1]])
print(A)Out:
[[2 1]
[1 1]]

And the vector b:

b = np.array([1,-1]).reshape(2,1)
print(b)Out:
[[ 1]
[-1]]

And solve:

x = la.solve(A,b)
print(x)Out:
[[ 2.]
[-3.]]

We can verify the solution by computing the inverse of A:

Ainv = la.inv(A)
print(Ainv)Out:
[[ 1. -1.]
[-1.  2.]]

And multiply A inverse and b to solve for x:

x = Ainv @ b
print(x)Out:
[[ 2.]
[-3.]]

We get the same result. Success!

Inverse or Solve

It’s a bad idea to use A inverse to solve Ax=b if A is large. It’s too computationally expensive.

Let’s create a large random matrix A and vector b and compute the solution x in 2 ways —

N = 1000
A = np.random.rand(N,N)
b = np.random.rand(N,1)

Check the first entries A:

A[:3,:3]Out:
array([[0.35754719, 0.63135432, 0.6572258 ],
[0.18450506, 0.14639832, 0.23528745],
[0.27576474, 0.46264005, 0.26589724]])

And for b:

b[:4,:]Out:
array([[0.82726751],
[0.96946096],
[0.31351176],
[0.63757837]])

Now we compare the speed of scipy.linalg.solve with scipy.linalg.inv:

%%timeit
x = la.solve(A,b)Out:
2.77 s ± 509 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)%%timeit
x = la.inv(A) @ bOut:
4.46 s ± 2.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solving with scipy.linalg.solve is about twice as fast!

That’s all for Part-1! I know that’s a lot to take in at once! But you made it until the end! Kudos on that! Do not forget to check out the other parts of this article!

There are a lot of other good resources if you’re still interested in getting the most out of this topic —

For the complete implementation, do check out my GitHub Repository —

To contact, or for further queries, feel free to drop a mail at — tp6145@bennett.edu.in

Getting Started

Matrix Powers

Transpose

Inverse

Trace

Determinant

Characteristic Polynomials and Cayley-Hamilton Theorem

Projections

Solving Linear Systems

Linear Systems

Gaussian elimination

Elementary Row Operations

Implementation

Examples

Find the Inverse

Solve a System

scipy.linalg.solve

Simple Example

Inverse or Solve

Footer

`scipy.linalg.solve`