## Learn about scalars, vectors, and the dot product.

Machines only understand numbers. For instance, if you want to build a spam detector, you first have to convert your text data into numbers (for example, through *word embeddings*). Data can then be stored in vectors, matrices, and tensors. Images, for instance, are represented as matrices of values between 0 and 255 that give the intensity of each color for each pixel. You can leverage the tools and concepts of linear algebra to manipulate these vectors, matrices, and tensors.
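As a rough illustration of the image case, here is a minimal sketch that stores a tiny picture as nested Python lists. The 2×2 image and its pixel values are invented for this example, and plain lists are used only to keep the snippet dependency-free; an array library would be the usual choice in practice.

```python
# A hypothetical 2x2 RGB image: rows -> pixels -> [red, green, blue].
# Every intensity is an integer between 0 and 255.
image = [
    [[255, 0, 0], [0, 255, 0]],      # top row: a red pixel, a green pixel
    [[0, 0, 255], [128, 128, 128]],  # bottom row: a blue pixel, a gray pixel
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])

# Extract one color channel as a matrix (a list of rows).
red_channel = [[pixel[0] for pixel in row] for row in image]

print((height, width, channels))  # (2, 2, 3)
print(red_channel)                # [[255, 0], [0, 128]]
```

Each color channel taken alone is a matrix, and stacking the three channels gives a three-dimensional tensor.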

Linear algebra is the branch of mathematics that studies *vector spaces*. You’ll see how vectors constitute vector spaces and how linear algebra applies linear transformations to these spaces. You’ll also learn about the powerful relationship between systems of linear equations and vector equations, which is related to important data science concepts like *least squares approximation*. Finally, you’ll learn important matrix decomposition methods, *eigendecomposition* and *Singular Value Decomposition* (SVD), which are important for understanding unsupervised learning methods like *Principal Component Analysis* (PCA).

Linear algebra deals with *vectors*. Other mathematical entities in the field can be defined by their relationship to vectors: *scalars*, for example, are single numbers that *scale* vectors (stretching or contracting them) when the vectors are multiplied by them.
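The scaling effect of a scalar can be sketched in a few lines of Python. The helper name `scale` and the example vector are assumptions made for this illustration, with a vector represented as a plain list of numbers.

```python
def scale(scalar, vector):
    """Multiply every component of `vector` by the single number `scalar`."""
    return [scalar * component for component in vector]


v = [2.0, 1.0]

stretched = scale(3, v)     # a scalar larger than 1 stretches the vector
contracted = scale(0.5, v)  # a scalar between 0 and 1 contracts it

print(stretched)   # [6.0, 3.0]
print(contracted)  # [1.0, 0.5]
```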

The term *vector*, however, refers to different concepts depending on the field in which it is used. In the context of data science, vectors are a way to store values from your data. For instance, take the height and weight of people: since they are distinct values with different meanings, you need to store them separately, for instance using two vectors. You can then perform operations on these vectors to manipulate the features without losing track of the fact that the values correspond to different attributes.

You can also use vectors to store data samples: for instance, you can store the heights of ten people as a vector containing ten values.
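A minimal sketch of this idea, using plain Python lists as vectors, might look as follows. The measurements are made up for illustration, and the variable names are assumptions rather than notation from the book.

```python
# Hypothetical measurements for ten people (invented values).
heights_cm = [170.0, 182.0, 165.0, 174.0, 159.0,
              188.0, 177.0, 163.0, 171.0, 180.0]
weights_kg = [68.0, 85.0, 59.0, 72.0, 54.0,
              91.0, 78.0, 57.0, 66.0, 83.0]

# Each vector holds one attribute, with one value per person.
assert len(heights_cm) == len(weights_kg) == 10

# Operating on one feature leaves the other untouched:
# convert heights to meters while weights stay in kilograms.
heights_m = [h / 100 for h in heights_cm]

# Values at the same index belong to the same person.
first_person = (heights_cm[0], weights_kg[0])

mean_height_cm = sum(heights_cm) / len(heights_cm)
```

Keeping the two attributes in separate vectors means a unit conversion or an average can be applied to one feature without mixing it with the other.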

We’ll use lowercase, boldface letters to name vectors (such as **v**). As usual, refer to the Appendix of Essential Math for Data Science for a summary of the notations used in this book.

The word *vector* can refer to multiple concepts. Let’s learn more about geometric and coordinate vectors.

*Coordinates* are values describing a position. For instance, any position on Earth can be specified by geographical coordinates (latitude, longitude, and elevation).

*Geometric vectors*, also called *Euclidean vectors*, are mathematical objects defined by their magnitude (the length) and their direction. These properties allow you to describe the displacement from one location to another.
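To make magnitude and direction concrete, here is a small sketch that computes both for a two-dimensional displacement. It assumes the displacement is given by its horizontal and vertical components; the function names and the example values (3, 4) are chosen for this illustration.

```python
import math


def magnitude(vector):
    """Length of a two-dimensional vector."""
    x, y = vector
    return math.hypot(x, y)


def direction_degrees(vector):
    """Angle from the positive horizontal axis, counterclockwise, in degrees."""
    x, y = vector
    return math.degrees(math.atan2(y, x))


displacement = (3.0, 4.0)  # 3 units to the right, 4 units up

print(magnitude(displacement))                    # 5.0
print(round(direction_degrees(displacement), 2))  # 53.13
```

Two displacements with the same magnitude and direction are the same geometric vector, regardless of where they start.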

For instance, Figure 1 shows that the point *A* has coordinates (1, 1) and the point *B* has coordinates (3, 2). The geometric vector **v** describes the displacement from *A* to *B*, but since vectors are defined by their magnitude and direction, you can also represent **v** as starting from the origin.

**Cartesian Plane**

In Figure 1, we used a coordinate system called the *Cartesian plane*. The horizontal and vertical lines are the *coordinate axes*, usually labeled *x* and *y*, respectively. The intersection of the two axes is called the *origin* and corresponds to the coordinate 0 for each axis.

In a Cartesian plane, any position can be specified by its *x* and *y* coordinates. The Cartesian coordinate system can be extended to more dimensions: the position of a point in an *n*-dimensional space is specified by *n* coordinates. The real coordinate *n*-dimensional space, containing *n*-tuples of real numbers, is named ℝⁿ. For instance, the space ℝ² is the two-dimensional space containing pairs of real numbers (the coordinates). In three dimensions (ℝ³), a point in space is represented by three real numbers.

*Coordinate vectors* are ordered lists of numbers corresponding to the vector coordinates. Since the initial point of these vectors is at the origin, you need to encode only the coordinates of the terminal point.
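As a sketch of how the geometric and coordinate views connect, the snippet below takes the points *A* = (1, 1) and *B* = (3, 2) from Figure 1, computes the displacement between them, and checks that the same vector drawn from the origin ends at the point given by its coordinates. The helper name `displacement` is an assumption made for this example.

```python
def displacement(start, end):
    """Component-wise difference between an end point and a start point."""
    return tuple(e - s for s, e in zip(start, end))


A = (1, 1)
B = (3, 2)

# Coordinates of the vector going from A to B.
v = displacement(A, B)

# Drawn from the origin, the vector ends at the point with the same coordinates.
origin = (0, 0)
terminal_point = tuple(o + c for o, c in zip(origin, v))

print(v)               # (2, 1)
print(terminal_point)  # (2, 1)
```

This is why storing only the terminal point is enough: once the initial point is fixed at the origin, the coordinates of the terminal point and the coordinates of the vector coincide.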

For instance, let’s take the vector **v** represented in Figure 2. The corresponding coordinate vector is as follows: