Mathematics is universal. Machine Learning is built on top of mathematical prerequisites such as Linear Algebra, Probability and Statistics. We shall now see the implementation of the basic underlying mathematical concepts in each of these prerequisites using Python.
A branch of mathematics that is concerned with mathematical structures closed under the operations of addition and scalar multiplication and that includes the theory of systems of linear equations, matrices, determinants, vector spaces, and linear transformations — Source: Merriam Webster
If you haven’t gone through — Getting Familiar with Numpy, make sure to go through it! We’ll be needing Numpy in implementing most of the Math concepts!
Getting Started
Libraries you’ll be needing to import for this tutorial — Numpy, Matplotlib and Scipy
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la
%matplotlib inline
Let’s compute 2I+3A−AB (I is the identity matrix of size 2) for —
A = np.array([[1,3],[-1,7]])
print(A)Out:
[[ 1 3]
[-1 7]]B = np.array([[5,2],[1,2]])
print(B)Out:
[[5 2]
[1 2]]I = np.eye(2)
print(I)Out:
[[1. 0.]
[0. 1.]]2*I + 3*A - A@BOut:
array([[-3., 1.],
[-5., 11.]])
Matrix Powers
There’s no symbol for matrix powers and so we must import the function matrix_power
from the subpackage numpy.linalg
.
from numpy.linalg import matrix_power as mpowM = np.array([[3,4],[-1,5]])
print(M)Out:
[[ 3 4]
[-1 5]]mpow(M,2)Out:
array([[ 5, 32],
[-8, 21]])mpow(M,5)Out:
array([[-1525, 3236],
[ -809, 93]])
Compare with the matrix multiplication operator:
M @ M @ M @ M @ MOut:
array([[-1525, 3236],
[ -809, 93]])
Transpose
We can take the transpose with .T
attribute:
print(M)Out:
[[ 3 4]
[-1 5]]print(M.T)Out:
[[ 3 -1]
[ 4 5]]
Notice that M.MT is a symmetric matrix:
M @ M.TOut:
array([[25, 17],
[17, 26]])
Inverse
We can find the inverse using the function scipy.linalg.inv
:
A = np.array([[1,2],[3,4]])
print(A)Out:
[[1 2]
[3 4]]la.inv(A)Out:
array([[-2. , 1. ],
[ 1.5, -0.5]])
Trace
We can find the trace of a matrix using the function numpy.trace
:
np.trace(A)Out:
5
Determinant
We find the determinant using the function scipy.linalg.det
:
A = np.array([[1,2],[3,4]])
print(A)Out:
[[1 2]
[3 4]]la.det(A)Out:
-2.0
Characteristic Polynomials and Cayley-Hamilton Theorem
The characteristic polynomial of a 2 by 2 square matrix A is
The Cayley-Hamilton Theorem states that any square matrix satisfies its characteristic polynomial.
For a matrix A of size 2, this means that
Let’s verify the Cayley-Hamilton Theorem for a few different matrices —
print(A)Out:
[[1 2]
[3 4]]trace_A = np.trace(A)
det_A = la.det(A)
I = np.eye(2)
A @ A - trace_A * A + det_A * IOut:
array([[0., 0.],
[0., 0.]])
Projections
The formula to project a vector v onto a vector w is —
Let’s a function called proj
which computes the projection v onto w:
def proj(v,w):
'''Project vector v onto w.'''
v = np.array(v)
w = np.array(w)
return np.sum(v * w)/np.sum(w * w) * w # or (v @ w)/(w @ w) * wproj([1,2,3],[1,1,1])Out:
array([2., 2., 2.])
Solving Linear Systems
Linear Systems
A linear system of equations is a collection of linear equations
In matrix notation, a linear system is Ax=b where
Gaussian elimination
The general procedure to solve a linear system of equation is called Gaussian elimination. The idea is to perform elementary row operations to reduce the system to its row echelon form and then solve.
Elementary Row Operations
Elementary row operations include:
- Add k times row j to row i
- Multiply row i by scalar k
- Switch rows i and j
Each of the elementary row operations is the result of matrix multiplication by an elementary matrix. To add k times row i to row j in a matrix A, we multiply A by the matrix E where E is equal to the identity matrix except the i,j entry is Ei,j=k.
For example, if A is 3 by 3 and we want to add 3 times row 2 to row 0 (using 0 indexing) then —
Let’s verify the calculation:
A = np.array([[1,1,2],[-1,3,1],[0,5,2]])
print(A)Out:
[[ 1 1 2]
[-1 3 1]
[ 0 5 2]]E1 = np.array([[1,0,3],[0,1,0],[0,0,1]])
print(E1)Out:
[[1 0 3]
[0 1 0]
[0 0 1]]E1 @ AOut:
array([[ 1, 16, 8],
[-1, 3, 1],
[ 0, 5, 2]])
To multiply k times row i in a matrix A, we multiply A by the matrix E where E is equal to the identity matrix except the i,j entry is Ei,i=k.
For example, if A is 3 by 3 and we want to multiply row 1 by -2 then —
Let’s verify the calculation:
E2 = np.array([[1,0,0],[0,-2,0],[0,0,1]])
print(E2)Out:
[[ 1 0 0]
[ 0 -2 0]
[ 0 0 1]]E2 @ AOut:
array([[ 1, 1, 2],
[ 2, -6, -2],
[ 0, 5, 2]])
Finally, to switch row i and row j in a matrix A, we multiply A by the matrix E where E is equal to the identity matrix except Ei,i=0, Ej,j=0, Ei,j=1 and Ej,i=1.
For example, if A is 3 by 3 and we want to switch row 1 and row 2 then —
Let’s verify the calculation:
E3 = np.array([[1,0,0],[0,0,1],[0,1,0]])
print(E3)Out:
[[1 0 0]
[0 0 1]
[0 1 0]]E3 @ AOut:
array([[ 1, 1, 2],
[ 0, 5, 2],
[-1, 3, 1]])
Implementation
Let’s now write functions to implement the elementary row operations. First of all, let’s write a function called add_rows
which takes input parameters A, k, i and j and returns the NumPy array resulting from adding k times row j to row i in the matrix A.
If i=j, then let’s say that the function scales row i by k+1 since this would be the result of k times row i added to row i.
def add_row(A,k,i,j):
"Add k times row j to row i in matrix A."
n = A.shape[0]
E = np.eye(n)
if i == j:
E[i,i] = k + 1
else:
E[i,j] = k
return E @ A
Let’s test our function:
M = np.array([[1,1],[3,2]])
print(M)Out:
[[1 1]
[3 2]]add_row(M,2,0,1)Out:
array([[7., 5.],
[3., 2.]])
Let’s write a function called scale_row
which takes 3 input parameters A, k, and i and returns the matrix that results from k times row i in the matrix A.
def scale_row(A,k,i):
"Multiply row i by k in matrix A."
n = A.shape[0]
E = np.eye(n)
E[i,i] = k
return E @ A
Let’s test our function:
M = np.array([[3,1],[-2,7]])
print(M)Out:
[[ 3 1]
[-2 7]]scale_row(M,3,1)Out:
array([[ 3., 1.],
[-6., 21.]])
Let’s write a function called switch_rows
which takes 3 input parameters A, i and j and returns the matrix that results from switching rows i and j in the matrix A.
def switch_rows(A,i,j):
"Switch rows i and j in matrix A."
n = A.shape[0]
E = np.eye(n)
E[i,i] = 0
E[j,j] = 0
E[i,j] = 1
E[j,i] = 1
return E @ A
Let’s test our function:
A = np.array([[1,1,1],[1,-1,0]])
print(A)Out:
[[ 1 1 1]
[ 1 -1 0]]switch_rows(A,0,1)Out:
array([[ 1., -1., 0.],
[ 1., 1., 1.]])
Examples
Find the Inverse
Let’s apply our functions to the augmented matrix [M | I] to find the inverse of the matrix M:
M = np.array([[5,4,2],[-1,2,1],[1,1,1]])
print(M)Out:
[[ 5 4 2]
[-1 2 1]
[ 1 1 1]]A = np.hstack([M,np.eye(3)])
print(A)Out:
[[ 5. 4. 2. 1. 0. 0.]
[-1. 2. 1. 0. 1. 0.]
[ 1. 1. 1. 0. 0. 1.]]A1 = switch_rows(A,0,2)
print(A1)Out:
[[ 1. 1. 1. 0. 0. 1.]
[-1. 2. 1. 0. 1. 0.]
[ 5. 4. 2. 1. 0. 0.]]A2 = add_row(A1,1,1,0)
print(A2)Out:
[[1. 1. 1. 0. 0. 1.]
[0. 3. 2. 0. 1. 1.]
[5. 4. 2. 1. 0. 0.]]A3 = add_row(A2,-5,2,0)
print(A3)Out:
[[ 1. 1. 1. 0. 0. 1.]
[ 0. 3. 2. 0. 1. 1.]
[ 0. -1. -3. 1. 0. -5.]]A4 = switch_rows(A3,1,2)
print(A4)Out:
[[ 1. 1. 1. 0. 0. 1.]
[ 0. -1. -3. 1. 0. -5.]
[ 0. 3. 2. 0. 1. 1.]]A5 = scale_row(A4,-1,1)
print(A5)Out:
[[ 1. 1. 1. 0. 0. 1.]
[ 0. 1. 3. -1. 0. 5.]
[ 0. 3. 2. 0. 1. 1.]]A6 = add_row(A5,-3,2,1)
print(A6)Out:
[[ 1. 1. 1. 0. 0. 1.]
[ 0. 1. 3. -1. 0. 5.]
[ 0. 0. -7. 3. 1. -14.]]A7 = scale_row(A6,-1/7,2)
print(A7)Out:
[[ 1. 1. 1. 0. 0. 1. ]
[ 0. 1. 3. -1. 0. 5. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]A8 = add_row(A7,-3,1,2)
print(A8)Out:
[[ 1. 1. 1. 0. 0. 1. ]
[ 0. 1. 0. 0.28571429 0.42857143 -1. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]A9 = add_row(A8,-1,0,2)
print(A9)Out:
[[ 1. 1. 0. 0.42857143 0.14285714 -1. ]
[ 0. 1. 0. 0.28571429 0.42857143 -1. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]A10 = add_row(A9,-1,0,1)
print(A10)Out:
[[ 1. 0. 0. 0.14285714 -0.28571429 0. ]
[ 0. 1. 0. 0.28571429 0.42857143 -1. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]
Let’s verify if we found M inverse correctly:
Minv = A10[:,3:]
print(Minv)Out:
[[ 0.14285714 -0.28571429 0. ]
[ 0.28571429 0.42857143 -1. ]
[-0.42857143 -0.14285714 2. ]]result = Minv @ M
print(result)Out:
[[ 1.00000000e+00 4.44089210e-16 2.22044605e-16]
[-6.66133815e-16 1.00000000e+00 -2.22044605e-16]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Success! We can see the result more clearly if we round to 15 decimal places:
np.round(result,15)Out:
array([[ 1.e+00, 0.e+00, 0.e+00],
[-1.e-15, 1.e+00, -0.e+00],
[ 0.e+00, 0.e+00, 1.e+00]])
Solve a System
Let’s use our functions to perform Gaussian elimination and solve a linear system of equations Ax=b.
A = np.array([[6,15,1],[8,7,12],[2,7,8]])
print(A)Out:
[[ 6 15 1]
[ 8 7 12]
[ 2 7 8]]b = np.array([[2],[14],[10]])
print(b)Out:
[[ 2]
[14]
[10]]
Form the augmented matrix M:
M = np.hstack([A,b])
print(M)[[ 6 15 1 2]
[ 8 7 12 14]
[ 2 7 8 10]]
Perform row operations:
M1 = scale_row(M,1/6,0)
print(M1)Out:
[[ 1. 2.5 0.16666667 0.33333333]
[ 8. 7. 12. 14. ]
[ 2. 7. 8. 10. ]]M2 = add_row(M1,-8,1,0)
print(M2)Out:
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. -13. 10.66666667 11.33333333]
[ 2. 7. 8. 10. ]]M3 = add_row(M2,-2,2,0)
print(M3)Out:
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. -13. 10.66666667 11.33333333]
[ 0. 2. 7.66666667 9.33333333]]M4 = scale_row(M3,-1/13,1)
print(M4)Out:
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. 1. -0.82051282 -0.87179487]
[ 0. 2. 7.66666667 9.33333333]]M5 = add_row(M4,-2,2,1)
print(M5)Out:
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. 1. -0.82051282 -0.87179487]
[ 0. 0. 9.30769231 11.07692308]]M6 = scale_row(M5,1/M5[2,2],2)
print(M6)Out:
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. 1. -0.82051282 -0.87179487]
[ 0. 0. 1. 1.19008264]]M7 = add_row(M6,-M6[1,2],1,2)
print(M7)Out:
[[1. 2.5 0.16666667 0.33333333]
[0. 1. 0. 0.1046832 ]
[0. 0. 1. 1.19008264]]M8 = add_row(M7,-M7[0,2],0,2)
print(M8)Out:
[[1. 2.5 0. 0.13498623]
[0. 1. 0. 0.1046832 ]
[0. 0. 1. 1.19008264]]M9 = add_row(M8,-M8[0,1],0,1)
print(M9)Out:
[[ 1. 0. 0. -0.12672176]
[ 0. 1. 0. 0.1046832 ]
[ 0. 0. 1. 1.19008264]]
Success! The solution of Ax=b is
x = M9[:,3].reshape(3,1)
print(x)Out:
[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]
Or, we can do it the easy way…
x = la.solve(A,b)
print(x)Out:
[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]
scipy.linalg.solve
We are mostly interested in linear systems Ax=b where there is a unique solution x. This is the case when A is a square matrix (m=n) and det(A)≠0. To solve such a system, we can use the function scipy.linalg.solve
.
The function returns a solution of the system of equations Ax=b. For example:
A = np.array([[1,1],[1,-1]])
print(A)Out:
[[ 1 1]
[ 1 -1]]b1 = np.array([2,0])
print(b1)Out:
[2 0]
And solve:
x1 = la.solve(A,b1)
print(x1)Out:
[1. 1.]
Note that the output x is returned as a 1D Numpy array when the vector b (the right-hand side) is entered as a 1D Numpy array. If we input b as a 2D Numpy array, then the output is a 2D Numpy array.
For example:
A = np.array([[1,1],[1,-1]])
b2 = np.array([2,0]).reshape(2,1)
x2 = la.solve(A,b2)
print(x2)Out:
[[1.]
[1.]]
Finally, if the right-hand side b is a matrix, then the output is a matrix of the same size. It is the solution of Ax=b when b is a matrix.
For example:
A = np.array([[1,1],[1,-1]])
b3 = np.array([[2,2],[0,1]])
x3 = la.solve(A,b3)
print(x3)Out:
[[1. 1.5]
[1. 0.5]]
Simple Example
Let’s compute the solution of the system of equations
Create the matrix of coefficients:
A = np.array([[2,1],[1,1]])
print(A)Out:
[[2 1]
[1 1]]
And the vector b:
b = np.array([1,-1]).reshape(2,1)
print(b)Out:
[[ 1]
[-1]]
And solve:
x = la.solve(A,b)
print(x)Out:
[[ 2.]
[-3.]]
We can verify the solution by computing the inverse of A:
Ainv = la.inv(A)
print(Ainv)Out:
[[ 1. -1.]
[-1. 2.]]
And multiply A inverse and b to solve for x:
x = Ainv @ b
print(x)Out:
[[ 2.]
[-3.]]
We get the same result. Success!
Inverse or Solve
It’s a bad idea to use A inverse to solve Ax=b if A is large. It’s too computationally expensive.
Let’s create a large random matrix A and vector b and compute the solution x in 2 ways —
N = 1000
A = np.random.rand(N,N)
b = np.random.rand(N,1)
Check the first entries A:
A[:3,:3]Out:
array([[0.35754719, 0.63135432, 0.6572258 ],
[0.18450506, 0.14639832, 0.23528745],
[0.27576474, 0.46264005, 0.26589724]])
And for b:
b[:4,:]Out:
array([[0.82726751],
[0.96946096],
[0.31351176],
[0.63757837]])
Now we compare the speed of scipy.linalg.solve
with scipy.linalg.inv
:
%%timeit
x = la.solve(A,b)Out:
2.77 s ± 509 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)%%timeit
x = la.inv(A) @ bOut:
4.46 s ± 2.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solving with scipy.linalg.solve
is about twice as fast!
That’s all for Part-1! I know that’s a lot to take in at once! But you made it until the end! Kudos on that! Do not forget to check out the other parts of this article!
There are a lot of other good resources if you’re still interested in getting the most out of this topic —
For the complete implementation, do check out my GitHub Repository —
To contact, or for further queries, feel free to drop a mail at — tp6145@bennett.edu.in