Machine Learning: A Brief Overview
Machine Learning (ML) uses algorithms that analyze data, learn from it, and make informed decisions. It comprises a collection of mathematical pattern-recognition methods whose decision-making improves as they are exposed to more data over time. Almost all real-world data can be classified into two main categories: Structured Data and Unstructured Data. Structured data includes records from a database or Excel tables, where each field has a defined purpose and structure. Unstructured data, by contrast, takes the form of text, images, videos, or music. When provided with structured data, classic ML algorithms can look for hidden patterns within the input. However, classic ML methods (decision trees, support vector machines) cannot process unstructured data in a meaningful way. For example, raw images cannot simply be used as input data to train an algorithm for image classification. Thus, Feature Engineering, the practice of designing features and descriptors that make such data usable for ML tasks, always had to be carried out by a person.
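To make the distinction concrete, here is a minimal sketch in plain Python; the record fields and pixel values are hypothetical. A structured record maps directly to a numeric feature vector a classic ML algorithm can consume, whereas the raw pixels of an unstructured input carry no ready-made meaning on their own:

```python
# A structured record (e.g., a row from a database table): each field
# has a defined purpose, so it maps directly to a feature vector that
# a classic ML algorithm (decision tree, SVM) can consume.
customer = {"age": 34, "income": 52000, "num_purchases": 7}
feature_vector = [customer["age"], customer["income"], customer["num_purchases"]]
print(feature_vector)  # [34, 52000, 7]

# An unstructured input (a tiny 2x2 grayscale "image" of raw pixels):
# flattening it does yield numbers, but no individual number has a
# ready-made meaning, which is why a person had to engineer features.
image = [[0, 255],
         [255, 0]]
raw_pixels = [p for row in image for p in row]
print(raw_pixels)  # [0, 255, 255, 0]
```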
Is there a way to bypass Feature Engineering altogether?
Deep Learning and Representation Learning
Over recent years, the field of Deep Learning (DL) has grown rapidly in popularity and seen significant advances, largely due to the greater availability of high computing power and big data, combined with novel algorithms. One of the major applications of DL has been in the area of Representation Learning (a.k.a. Feature Learning), a sub-field of Machine Learning (ML) concerned with algorithms that find the best data representation for a task without external intervention. It is a direct replacement for the previously discussed Feature Engineering.
A renowned example of this is the prevalent use of Deep Convolutional Neural Networks (CNNs) for tasks like image classification, where they outperform conventional ML algorithms on standard benchmarks. Prior to the deep CNN era, image classification was generally a two-stage process: one would extract hand-crafted features from the image, and then classify the image based on those features.
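The two-stage pipeline can be sketched as follows. The brightness feature and threshold rule here are hypothetical stand-ins for real descriptors (such as SIFT) and real classifiers (such as an SVM):

```python
# Stage 1: hand-crafted feature extraction (a hypothetical stand-in
# for descriptors like SIFT): fraction of bright pixels in the image.
def extract_features(image):
    pixels = [p for row in image for p in row]
    bright_fraction = sum(1 for p in pixels if p > 128) / len(pixels)
    return [bright_fraction]

# Stage 2: classification on the extracted features (a trivial
# threshold rule standing in for, e.g., an SVM or a decision tree).
def classify(features):
    return "bright_scene" if features[0] > 0.5 else "dark_scene"

mostly_bright = [[200, 210], [220, 40]]
mostly_dark = [[10, 20], [30, 200]]

print(classify(extract_features(mostly_bright)))  # bright_scene
print(classify(extract_features(mostly_dark)))    # dark_scene
```

The key point is that a person designed both the feature (brightness) and the decision rule; the algorithm never sees the raw pixels directly.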
But with deep CNNs came a paradigm shift towards End-to-End learning: learning the underlying representation of the input data in an entirely data-driven way, without specialist input. “Deep” in Deep Learning (DL) refers to the number of consecutive layers used within a neural network. Representations learned end-to-end mirror those designed by specialists: shallower layers identify primitive features (edges, corners), while deeper layers capture more specific, compositional features.
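To illustrate the shallow-layer behaviour described above, here is a minimal pure-Python sketch of a single convolution with a hand-written vertical-edge kernel. In a trained CNN, such kernels are learned from data rather than written by hand, but the mechanics of the layer are the same:

```python
# One 3x3 convolution pass (no padding, stride 1) over a grayscale image.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A hand-written vertical-edge kernel; a trained CNN would *learn*
# kernels like this in its shallow layers instead of being given them.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# Image with a sharp vertical boundary between dark (0) and bright (9).
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Zero response over the flat region, strong response at the boundary:
# the layer has localized the edge without any hand-crafted feature step.
print(conv2d(image, vertical_edge))  # [[0, -27, -27], [0, -27, -27]]
```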
The proof of concept was demonstrated in a general context by Krizhevsky et al. in 2012. In 2013, deep CNNs outperformed the Feature Engineering based State-of-the-Art (SOTA) models, “SIFT+FVs” (see image below).