It’s valuable to speak the language of linear algebra: it lets you communicate better with linear algebra experts and understand more advanced concepts. Below is a list of terms I extracted from Prof. Gilbert Strang’s MIT OpenCourseWare lectures and the Deep Learning book by Dr. Ian Goodfellow et al. I will provide a brief explanation of each concept and, where helpful, introduce the related NumPy function so you can play along while reading; whenever you want to know more, I refer you to Wikipedia.

Vector space, subspace, span, column space, row space, null space, left-null space, rank, basis, orthogonal matrix, symmetric matrix

Imagine we have a matrix A. In Python, `A.T` means the transpose of A, and `@` means matrix multiplication.

```python
import numpy as np

a = np.random.randn(2,3)
a
# array([[ 0.39,  0.54, -0.71],
#        [-1.84, -0.6 ,  0.53]])
a.T
# array([[ 0.39, -1.84],
#        [ 0.54, -0.6 ],
#        [-0.71,  0.53]])
b = np.random.randn(3,2)
b
# array([[ 1., -0.],
#        [ 1., -0.],
#        [-3.,  1.]])
a @ b
# array([[-1.09762048,  0.96911979],
#        [-1.97859822, -3.15604207]])
```

The **span** of a set of vectors is the set of all linear combinations of those vectors. Think about the vectors (0,1) and (1,0): their span is the whole x-y plane.

Then think about the x-y-z 3D space (the **vector space** R3). Its **basis** is (1,0,0), (0,1,0), and (0,0,1), since every element in the vector space can be written as a linear combination of the basis elements.

The span of the vectors (1,0,0) and (0,1,0) constitutes a **subspace** of the x-y-z 3D vector space R3 (a subset of a larger vector space).

The **column space** of A is the span of all column vectors of A, while the **row space** of A is the span of all row vectors of A. If A @ x = 0, the span of all solutions x constitutes the **null space** of A. If A.T @ x = 0, the span of all solutions x constitutes the **left-null space** of A.

How many linearly independent column vectors are there in matrix A? That number is the **rank** of A. Note that (1,2,3) and (10,20,30) are linearly dependent! How do we compute the rank of a matrix?

```python
x = np.random.rand(2,3)
np.linalg.matrix_rank(x)
# 2
# So, x has two linearly independent column vectors
```

If two vectors’ inner product is 0 and they are both of unit length, they are **orthonormal vectors**. If matrix A’s rows and columns are orthonormal vectors, matrix A is an **orthogonal matrix**. In math, that means A.T @ A = A @ A.T = I.

A **symmetric matrix** means A = A.T. Also, symmetric and orthogonal matrices only apply to real-valued matrices; for complex matrices we have the counterparts, namely the **hermitian matrix** and the **unitary matrix**. We will cover them in a later part.

(Make sure you understand all the concepts above; if not, google them on Wikipedia.)
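As a quick check of the orthogonal matrix property, here is a small sketch (the rotation matrix is my own example, not from the text — a 2D rotation is a classic orthogonal matrix):

```python
import numpy as np

# a 2D rotation matrix is orthogonal: its rows and columns are orthonormal
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

np.allclose(A.T @ A, np.eye(2))  # True
np.allclose(A @ A.T, np.eye(2))  # True
```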

Inverse, determinant

You can compute the **determinant** of any square matrix.

```python
x = np.random.rand(4,4)
np.linalg.det(x)
# 0.08437697073565216
```

Only an invertible square matrix (determinant != 0) has an inverse. If A @ B = I, then B is A’s **inverse matrix**.
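To see the definition in action, here is a small sketch (the matrix values are my own example, not from the text):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
B = np.linalg.inv(A)   # works because det(A) = 5 != 0

np.allclose(A @ B, np.eye(2))  # True: B is A's inverse
```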

Gaussian Elimination, row echelon form, reduced echelon form, leading coefficient, pivot, elementary row operation

**Gaussian elimination** can be used to (a) solve a linear system Ax = b, (b) compute the inverse matrix, (c) find the rank, and (d) compute the determinant (for details, please refer to Wikipedia), via a series of **elementary row operations**, including (1) swapping rows, (2) scaling rows, and (3) adding one row to another.

**Row echelon form**: the first non-zero element from the left in each row (the **leading coefficient**, or **pivot**) is always to the right of the first non-zero element in the row above.

**Reduced row echelon form**: a row echelon form whose pivots are 1 and whose pivot columns are zero elsewhere.

Inner product, outer product, hadamard product, Projection, Gram-Schmidt process

First, let’s understand the inner product, outer product, and Hadamard product of two 1D arrays:

```python
a = np.array([4,5,6])
b = np.array([7,8,9])
np.inner(a,b)   # 122
np.outer(a,b)
# array([[28, 32, 36],
#        [35, 40, 45],
#        [42, 48, 54]])
a * b           # element-wise or hadamard product
# array([28, 40, 54])
```

How do we project one vector (a) onto another vector (b)?

```python
a = np.array([4,5,6])
b = np.array([7,8,9])
proj_b_a = np.inner(a,b) / np.inner(b,b) * b
# array([4.40206186, 5.03092784, 5.65979381])
```

The **Gram-Schmidt process** orthonormalizes a set of vectors (v1, v2, v3, …, vn) spanning the vector space Rn into a set (u1, u2, u3, …, un) spanning the same Rn, within which each vector is of unit length and the vectors are mutually orthogonal. For more detail and a geometric explanation, please refer to Wikipedia.
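The process is just repeated projection and normalization; here is a minimal sketch of classical Gram-Schmidt using the projection formula above (the input vectors are my own example and assumed linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        # subtract the projection of v onto every vector already in the basis
        w = v - sum(np.inner(v, u) * u for u in basis)
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

vs = [np.array([3., 1.]), np.array([2., 2.])]
U = gram_schmidt(vs)
np.allclose(U @ U.T, np.eye(2))  # True: rows are orthonormal
```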

LU decomposition, QR decomposition.

**LU decomposition** aims to decompose a matrix (it need not be square) into a lower triangular matrix (L, entries above the diagonal are 0) and an upper triangular matrix (U, entries below the diagonal are 0). It is related to Gaussian elimination but has better computational efficiency. To make the LU decomposition work out, we sometimes need to reorder the rows of the matrix using a permutation matrix P.

```python
import scipy.linalg

a = np.random.randn(3,4)
p, l, u = scipy.linalg.lu(a)
# p
# array([[0., 1., 0.],
#        [1., 0., 0.],
#        [0., 0., 1.]])
# l
# array([[ 1.        ,  0.        ,  0.        ],
#        [-0.71426919,  1.        ,  0.        ],
#        [-0.85039495,  0.12237106,  1.        ]])
# u
# array([[-2.09219711, -0.48959089,  0.81707073,  0.77602155],
#        [ 0.        , -1.53448255, -1.72785249,  0.04775144],
#        [ 0.        ,  0.        ,  1.5052947 , -0.27769281]])
```

**QR decomposition** aims to decompose a matrix into an orthogonal matrix (Q) and an upper triangular matrix (R). It is used in the QR algorithm and for solving the linear least squares problem.

```python
a = np.random.randn(3,4)
q, r = np.linalg.qr(a)
# q
# array([[-0.47569189,  0.71339517,  0.51457221],
#        [-0.82969021, -0.55818202,  0.00685511],
#        [ 0.29211536, -0.42367461,  0.85741964]])
# r
# array([[-1.34089268, -1.73408897, -0.07436536,  0.78464807],
#        [ 0.        , -1.66272812,  0.63477604, -1.60036506],
#        [ 0.        ,  0.        ,  0.43098896, -0.31316029]])
```

Eigen-decomposition, diagonalization, Characteristic polynomial.

I will illustrate eigen-decomposition using a 4*4 square matrix; it will have 4 eigenvalues and 4 cognate eigenvectors.

```python
ei = np.random.randn(4,4)
w, v = np.linalg.eig(ei)
# w: eigenvalues
# array([-2.21516912+1.65582705j, -2.21516912-1.65582705j,
#         1.45929568+0.99548974j,  1.45929568-0.99548974j])
# v: eigenvectors as columns, 4*4
# array([[-0.1070701 -0.40447805j, -0.1070701 +0.40447805j,
#         -0.03773179+0.56113399j, -0.03773179-0.56113399j],
#        [ 0.15294619-0.0172865j ,  0.15294619+0.0172865j ,
#          0.65763758+0.j        ,  0.65763758-0.j        ],
#        [ 0.12433426+0.47215025j,  0.12433426-0.47215025j,
#          0.3236955 +0.37453337j,  0.3236955 -0.37453337j],
#        [-0.75023815+0.j        , -0.75023815-0.j        ,
#          0.00770123+0.07813095j,  0.00770123-0.07813095j]])
```

**Diagonalization**: for a square matrix A, if there exists an invertible matrix P such that P⁻¹AP = D is a diagonal matrix, then matrix A is diagonalizable.
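When A is diagonalizable, taking P to be the matrix of eigenvectors makes D the matrix of eigenvalues; a minimal sketch (the matrix values are my own example):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])
w, v = np.linalg.eig(A)   # eigenvalues w, eigenvectors as columns of v
D = np.diag(w)

# P⁻¹AP = D with P = v
np.allclose(np.linalg.inv(v) @ A @ v, D)  # True
```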

A square matrix A (n*n) has a **characteristic polynomial**, whose roots we are trying to compute: det(A − λI) = 0. The roots λ are the eigenvalues of A.
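NumPy can return the coefficients of the characteristic polynomial via `np.poly`, and `np.roots` then recovers the eigenvalues (a sketch; the matrix values are my own example):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])
coeffs = np.poly(A)   # coefficients of det(A - λI), highest degree first
# λ² - 7λ + 10, i.e. coefficients [1., -7., 10.]
np.roots(coeffs)      # the roots are the eigenvalues of A
```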

Moore-Penrose pseudo-inverse, hermitian, conjugate transpose, full-rank matrix

We know that only an invertible square matrix has an inverse, but what about a non-square matrix? For that, we have the **pseudo-inverse**.

**Moore-Penrose** defines the **pseudo-inverse** (A+, n*m) of a rectangular matrix (A, m*n) as the matrix A+ that meets four particular conditions:

(1) AA+A = A

(2) A+AA+ = A+

(3) (AA+)* = AA+

(4) (A+A)* = A+A

Here * means the **conjugate transpose**: for a complex matrix (meaning the entries may be complex numbers), the conjugate transpose first transposes the matrix and then conjugates every entry (flips the sign of the imaginary part but leaves the real part untouched).

A **hermitian matrix** is a complex square matrix whose conjugate transpose is equal to itself. Its real-valued counterpart is the **symmetric matrix**.

If A is a **full-rank** matrix (rank = min(m,n)), A+ can be expressed as:

(1) A+ = (A*A)⁻¹A*, if A has independent columns

(2) A+ = A*(AA*)⁻¹, if A has independent rows
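NumPy computes the pseudo-inverse directly with `np.linalg.pinv`; we can check it against formula (1) for a tall matrix (a sketch — for a real matrix, * is just `.T`, and a random tall matrix has independent columns with probability 1):

```python
import numpy as np

A = np.random.randn(5, 3)      # tall matrix: independent columns
A_pinv = np.linalg.pinv(A)     # Moore-Penrose pseudo-inverse, shape (3, 5)

# formula (1): A+ = (A*A)⁻¹A*
formula = np.linalg.inv(A.T @ A) @ A.T
np.allclose(A_pinv, formula)          # True
np.allclose(A_pinv @ A, np.eye(3))    # True: A+A = I for independent columns
```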

Singular Value Decomposition (SVD), singular matrix, complex unitary matrix

Only a square matrix has eigenvalues; a rectangular matrix instead has **singular values**. A matrix whose determinant = 0 is a **singular matrix**, or in other words, it is non-invertible.

A complex matrix is **unitary** if its conjugate transpose happens to be its inverse: U*U = UU* = I. The real-valued counterpart is the **orthogonal matrix**.

So a rectangular matrix A can be decomposed into three components, U, S, and Vh, with A = U @ S @ Vh, where the diagonal of the S matrix contains the **singular values** of A. This decomposition is called **Singular Value Decomposition (SVD)**. SVD can be really convenient for computing the pseudo-inverse, matrix rank, etc.

*Singular Value Decomposition (SVD)*svd = np.random.rand(4,5)

u,s,vh = np.linalg.svd(svd)#u. shape (4,4)

array([[ 0.66581103, 0.61992005, -0.21849383, -0.3530655 ],

[ 0.47769768, 0.06749606, 0.56339472, 0.67069784],

[ 0.44673911, -0.66240419, 0.31234136, -0.51389467],

[ 0.35906096, -0.41516755, -0.73300048, 0.40177286]])#s. s is the np.diag(S), S will be of shape (4,5)

array([2.35262119, 0.87561858, 0.31537598, 0.043907 ])# vh of shape (5,5)

array([[ 0.29743919, 0.569923 , 0.3576598 , 0.52216343, 0.43144237],

[-0.61102456, 0.15085062, 0.50163189, -0.46496796, 0.36886763],

[ 0.49893385, 0.39660661, -0.29349238, -0.68317228, 0.2022525 ],

[ 0.52538762, -0.60314147, 0.5557502 , -0.15677528, 0.16355866],

[-0.11494248, -0.3624299 , -0.47481454, 0.14088043, 0.78111244]])
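We can verify the decomposition by rebuilding the rectangular S matrix from the singular values and multiplying the three factors back together (a sketch):

```python
import numpy as np

a = np.random.rand(4, 5)
u, s, vh = np.linalg.svd(a)     # u: (4,4), s: (4,), vh: (5,5)

# embed the singular values in a (4,5) matrix S, then reconstruct a
S = np.zeros((4, 5))
S[:4, :4] = np.diag(s)
np.allclose(a, u @ S @ vh)      # True
```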

Lp Norm, Frobenius norm, einsum

What are the L1 and L2 norms? Basically, they are different metrics for measuring a vector’s distance from the origin. Let’s define the more general Lp norm: ||x||p = (sum over i of |xi|^p)^(1/p).

For a matrix, we usually encounter the **Frobenius norm**, the matrix analogue of the L2 norm: ||A||F = sqrt(sum over i,j of |aij|²).

```python
a = np.array([4,5,6])
np.linalg.norm(a, ord=3)       # L3 norm of vector a
# 7.398636222991409

x = np.random.rand(2,3)
np.linalg.norm(x, ord='fro')   # frobenius norm of matrix x
# 1.309085506183174
```
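Writing the Lp definition out by hand matches NumPy’s result (a sketch, using p = 3 on the same vector):

```python
import numpy as np

a = np.array([4, 5, 6])
# the Lp definition spelled out: (sum |x_i|^p)^(1/p), with p = 3
manual = (np.abs(a) ** 3).sum() ** (1 / 3)
np.isclose(manual, np.linalg.norm(a, ord=3))  # True
```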

Finally, I’d like to introduce the **Einstein summation function** (`einsum`) in the NumPy package. This somewhat magical function provides a generic solution for nearly all the basic matrix operations you will encounter, including the inner product, outer product, matrix multiplication, trace, diagonal, Hadamard product, summation, row sums, column sums, and transpose.

Basically, this function takes two kinds of arguments: the first is called the “subscripts” string, and the rest are the “operands”. Let’s look at a matrix multiplication example to understand it:

```python
x = np.random.rand(2,3)
y = np.random.rand(5,3)
np.einsum('ij,kj -> ik', x, y)
# array([[0.3676856 , 0.33156855, 0.39793874, 0.70939856, 0.8353566 ],
#        [0.19219345, 0.19312881, 1.36739081, 1.0606612 , 1.15039307]])
```

As we can see, we use `i` and `j` to represent the dimensions of the operand `x`, and `k` and `j` to represent the dimensions of the operand `y`. We then specify the output matrix to have dimensions `i` by `k`. The `einsum` function automatically figures out that this is a matrix multiplication problem. With that understood, let’s delve into all the interesting operations `einsum` can perform. **A basic rule is that a dimension that appears in the input subscripts but not in the output subscript is summed over.**

```python
# einsum
x = np.random.rand(2,3)
np.einsum('ij -> ji', x)   # transpose
np.einsum('ij ->', x)      # sum of all elements
np.einsum('ij -> i', x)    # row sums (sum over j)
np.einsum('ij -> j', x)    # column sums (sum over i)

x = np.random.rand(2,3)
y = np.random.rand(5,3)
np.einsum('ij,kj -> ik', x, y)  # matrix multiplication

a = np.array([4,5,6])
b = np.array([7,8,9])
np.einsum('i,i ->', a, b)    # inner product
np.einsum('i,j -> ij', a, b) # outer product
np.einsum('i,i -> i', a, b)  # hadamard product

z = np.random.rand(3,3)      # diagonal and trace need a square matrix
np.einsum('ii -> i', z)      # diagonal
np.einsum('ii ->', z)        # trace
```
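Each of these `einsum` calls matches the corresponding dedicated NumPy routine, which is an easy way to convince yourself the subscripts do what you expect (a sketch):

```python
import numpy as np

x = np.random.rand(2, 3)
y = np.random.rand(5, 3)

np.allclose(np.einsum('ij -> ji', x), x.T)           # True: transpose
np.allclose(np.einsum('ij,kj -> ik', x, y), x @ y.T) # True: matrix multiplication

z = np.random.rand(3, 3)
np.allclose(np.einsum('ii -> i', z), np.diag(z))     # True: diagonal
np.isclose(np.einsum('ii ->', z), np.trace(z))       # True: trace
```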

Not everyone needs to know linear algebra, and to what extent you should understand it depends heavily on your type of work. But it is reasonable to forecast that machine learning and other statistical models will become more and more accessible, and by then, simply knowing how to run a machine learning model might not be enough to make you stand out. So I would suggest equipping yourself with the necessary knowledge now to make sure you stay competitive in the foreseeable future.

Thanks for reading! Next time I will cover statistical models in Python. If you like this article, follow me on Medium — thank you so much for your support. Connect with me on Twitter or LinkedIn, and please let me know if you have any questions or what kind of NumPy tutorials you would like to see in the future!

Github Repository: