
Let’s take a look at the two main approaches to reducing dimensionality.
Projection
In most real-world problems, training instances are not spread out uniformly across all dimensions. Many features are almost constant, while others are highly correlated.
As a result, all training instances actually lie within a much lower-dimensional subspace of the high-dimensional space. This sounds very abstract, so the image above shows an example.
Notice that all training instances lie close to a plane: this plane is a lower-dimensional (2D) subspace of the high-dimensional (3D) space. Now if we project every training instance perpendicularly onto this subspace, we get a new 2D dataset. Ta-da! We have just reduced the dataset's dimensionality from 3D to 2D.
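To make this concrete, here is a minimal sketch of such a projection using NumPy and scikit-learn's PCA; the dataset X_3d and the plane it lies near are hypothetical, made up purely for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 3D dataset: points lying close to a 2D plane, plus a little noise
rng = np.random.default_rng(42)
coeffs = rng.uniform(-1, 1, size=(200, 2))      # two underlying degrees of freedom
plane_axes = np.array([[1.0, 0.5, 0.0],
                       [0.0, 0.5, 1.0]])        # two vectors spanning a plane in 3D
X_3d = coeffs @ plane_axes + 0.05 * rng.normal(size=(200, 3))

# Project every instance perpendicularly onto the best-fitting 2D subspace
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_3d)
print(X_3d.shape, "->", X_2d.shape)             # (200, 3) -> (200, 2)
```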
However, projection is not always the best approach to dimensionality reduction. In many cases the subspace may twist and turn, such as in the famous Swiss roll toy dataset shown below.
Simply projecting onto a plane would squash different layers of the Swiss roll together. What you really want instead is to unroll the Swiss roll to obtain the 2D dataset on the right of the image below.
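For reference, here is one way you might generate the Swiss roll toy dataset with scikit-learn (a small sketch; the variable names are my own):

```python
from sklearn.datasets import make_swiss_roll

# 1,000 points lying on a 2D manifold that is rolled up in 3D space;
# t encodes each point's position along the roll (handy for coloring plots)
X_swiss, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)
print(X_swiss.shape)  # (1000, 3)
```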
Manifold Learning
The Swiss roll is an example of a 2D manifold. Put simply, a 2D manifold is a 2D shape that can be bent and twisted in a higher-dimensional space. More generally, a d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally resembles a d-dimensional hyperplane. In the case of the Swiss roll, d = 2 and n = 3: it locally resembles a 2D plane, but it is rolled in the third dimension.
Many dimensionality reduction algorithms work by modeling the manifold on which the training instances lie; this is called Manifold Learning. It relies on the manifold assumption, also called the manifold hypothesis, which holds that most real-world high-dimensional datasets lie close to a much lower-dimensional manifold. This assumption is very often empirically observed.
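As one possible illustration (not prescribed above), a Manifold Learning algorithm such as scikit-learn's LocallyLinearEmbedding can model the Swiss roll's 2D manifold and unroll it:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X_swiss, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# Model the 2D manifold the points lie on and "unroll" it into 2D coordinates
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X_swiss)
print(X_unrolled.shape)  # (1000, 2)
```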
Once again, think about the MNIST dataset: all handwritten digit images have some similarities. They are made of connected lines, the borders are white, they are more or less centered, and so on.
If you randomly generated images, only a ridiculously tiny fraction of them would look like handwritten digits. In other words, the degrees of freedom available to you when creating a digit image are dramatically lower than the degrees of freedom you would have if you were allowed to generate any image you wanted. These constraints tend to squeeze the dataset into a lower-dimensional manifold.
The manifold assumption is often accompanied by another implicit assumption: that the task at hand will be simpler if expressed in the lower-dimensional space of the manifold.
For example, in the top row of the figure, the Swiss roll is split into two classes: in the 3D space (on the left), the decision boundary would be fairly complex, but in the 2D unrolled manifold space, the decision boundary is a simple straight line.
However, this assumption does not always hold. For example, in the bottom row of the figure, the decision boundary is located at x1 = 5. This decision boundary looks very simple in the original 3D space, but it looks more complex in the unrolled manifold.
In short, if you reduce the dimensionality of your training set before training a model, it will usually speed up training, but it may not always lead to a better or simpler solution; it all depends on the dataset.
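As a rough, assumed example of that trade-off in code (the dataset, the number of components, and the classifier here are illustrative choices, not from the text), dimensionality reduction can be chained before a model in a scikit-learn pipeline:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)            # 8x8 digit images, 64 features each
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Reduce 64 features to 20 before fitting; training is usually faster,
# but accuracy may or may not improve; it depends on the dataset
clf = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```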
Hopefully, you now have a good sense of what the curse of dimensionality is and how dimensionality reduction algorithms can fight it, especially when the manifold assumption holds.