In many ways, deep learning has ushered in a new age of descriptive, predictive, and generative modeling across dozens of industries and research areas. In light of its usefulness, it has also found a wealth of popularity.

With popularity often comes simplification, and the most common simplification is that the power of deep learning lies purely in its ability to predict. While deep networks have certainly set new highs in the accuracy of predicting complex phenomena, the situation is not nearly so simple. Before we can predict something, we must understand it. Understanding is tough to measure and even tougher to describe, but deep learning models often arrive at it via a simple process: projecting the inputs into a vector space where all the components relevant to solving the problem at hand are represented and well separated.

The goal of this article is to explore what this vector space looks like for different models, and to build a tool that lets us take any deep learning model and visualize its vector space using Tensorboard's Embedding Projector, TensorboardX, and Pytorch.

**All of the code for this guide is available on the GitHub repo here.**

## Rethinking Deep Learning Models

Before we dive further into the structure of this vector space, it will be useful to think of deep learning models as consisting of a backbone, a projector, and a head.

The backbone is the portion of the network that extracts features (such as faces in an image, groups of subject mentions in text, or particular words in a sound clip), often into a high-dimensional vector. The role of the projector is then to “filter” that high-dimensional vector, suppressing features unimportant to reducing the loss and finding combinations of features that are useful. Finally, the “head” of the network uses the refined feature vector to make predictions; usually these predictions are a vector representing the probability of different classes or outcomes.
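As a rough sketch, this backbone/projector/head decomposition might look like the following in Pytorch. The module names and layer sizes here are purely illustrative (a toy MNIST-sized classifier), not taken from any particular pretrained model:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Backbone: extracts raw features into a high-dimensional vector.
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
        )
        # Projector: filters and combines features into a compact embedding.
        self.projector = nn.Sequential(
            nn.Linear(512, 64),
            nn.ReLU(),
        )
        # Head: maps the refined embedding to class logits.
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        embedding = self.projector(self.backbone(x))
        return self.head(embedding)

model = ToyModel()
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```

Keeping the three stages as separate submodules makes it easy to grab the intermediate embedding (the projector output) later, which is exactly the vector we will want to visualize.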

Often, models trained on huge datasets are fine-tuned by freezing the weights in the backbone and training only the projector and head (sometimes these are combined and simply called the head of the network as a whole). The idea here is that all the important features for a wide array of tasks are already encoded by the backbone, and we simply need to find good combinations of those features to perform our task. This falls under the umbrella of “transfer learning”.

Today, we will be mapping out these embeddings to understand what kind of vector space they live in, and how different data is separated within it.

## Setting up the Data

The wonderful Torchvision package provides us a wide array of pre-trained deep learning models and datasets to play with. These pre-trained models are well documented, with well-defined pre-processing steps and architectural hyper-parameters. The datasets are easy to use and help us bypass formatting quirks and writing custom dataloaders. We’re going to start our experiments with the simple MNIST dataset, accessible via the MNIST dataset class in torchvision: