
Welcome to my first Medium article. Today, I am reflecting on what I learned from the second chapter of Jeremy Howard and Sylvain Gugger’s book, Deep Learning for Coders with fastai & PyTorch. The reason I am starting this blog in the midst of reading the second chapter is that I was unaware of the value of writing a blog. Rachel Thomas, cofounder of fastai, states:
The top advice I would give my younger self would be to start blogging sooner.
Yes, I had no inkling that blogs could be used as a means of learning complex subjects like deep learning. So while it may seem like I know what I am talking about, I am merely learning right along with you.
This post will not contain many coding examples, but that will certainly change come future chapters.
Questionnaire:
1: Where do text models currently have a major deficiency?
- While text models can handily generate context-appropriate text, they still struggle with producing correct responses. In other words, the factual accuracy of the generated text is often unreliable.
2: What are possible negative societal implications of text generation models?
- If text generation models are ever able to generate text that is both contextually appropriate and highly compelling, they could be abused to spread onslaughts of disinformation (“fake news”) across all social media platforms.
3: In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?
- Sometimes, human intervention is perfectly appropriate. The model’s predictions could be reviewed by human experts, who would evaluate the data to determine the next best step. For example, a machine learning model for identifying strokes in CT scans can flag high-priority cases for expedited human review.
4: What kind of tabular data is deep learning particularly good at?
- Deep learning is particularly good at analyzing tabular data that includes columns of natural language (e.g., book titles or reviews) and high-cardinality categorical columns (i.e., columns containing many discrete values, such as zip codes or product IDs).
5: What’s a key downside of directly using a deep learning model for recommendation systems?
- Machine learning approaches for recommendation systems often only tell you which products a user might like, rather than providing recommendations that are actually helpful to the user. For example, a system may recommend products a user has already purchased.
6: What are the steps of the Drivetrain Approach?
- First, define the objective by thinking about the desired outcome. Second, identify the levers, i.e., the inputs we can control. Third, determine what data needs to be collected. Finally, build the models and examine how the levers influence the objective.
7: How do the steps of the Drivetrain Approach map to a recommendation system?
- The objective of a recommendation system is to drive additional sales by surprising customers with delightful recommendations of items they would not have otherwise purchased. Hopefully, they get hooked on said product(s). The lever is how to derive and rank each of the possible recommendations. In addition, new data must constantly be collected in order to generate recommendations that will result in new, additional sales. This will require conducting many randomized experiments in order to optimize a wide range of recommendations for a wide range of customers.
8: Create an image recognition model using data you curate, and deploy it on the web.
- In addition to following along with the example bear classifier over the course of this chapter, I made my own toucan classifier. However, the only image classifier I have deployed on the web is a watch classifier (capable of classifying 13 different brands of watches). I used the fastai library, but the code was slightly different from that of this chapter. For the full story, check out my GitHub.
9: What is DataLoaders?
- The DataLoaders class stores the required DataLoader objects (usually for train and validation sets) and passes the data to the fastai model.
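To make the idea concrete, here is a minimal pure-Python sketch of what a data loader fundamentally does: group a dataset into mini-batches. This is an illustration of the concept only, not fastai’s actual implementation.

```python
# A minimal sketch of the core job of a DataLoader: yielding mini-batches.
# (Illustration only; fastai's DataLoader also handles shuffling, transforms,
# collation into tensors, and moving data to the GPU.)

def batches(dataset, batch_size):
    """Yield successive mini-batches from `dataset`."""
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]

train_set = list(range(10))
train_batches = list(batches(train_set, batch_size=4))
# train_batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A DataLoaders object then simply bundles one such loader for the training set with another for the validation set.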
10: What four things do we need to tell fastai to create DataLoaders?
- We must specify the type(s) of data we are working with, how and where to get this list of items, how to label these items, and how to create the validation set from the data.
11: What does the splitter parameter to DataBlock do?
- One must provide the splitter argument to direct fastai on how to split up the dataset into subsets (usually into a training and a validation set).
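Conceptually, a splitter just maps a list of items to two disjoint lists of indices. Here is a pure-Python sketch of that idea (an assumption-laden illustration, not fastai’s RandomSplitter code):

```python
import random

# A sketch of what a random splitter does: shuffle the item indices and
# reserve a percentage of them for validation. Not fastai's implementation.

def random_splitter(n_items, valid_pct=0.2, seed=42):
    rng = random.Random(seed)           # seeded so the split is reproducible
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(n_items * valid_pct)
    return idxs[cut:], idxs[:cut]       # (training indices, validation indices)

train_idx, valid_idx = random_splitter(100, valid_pct=0.2)
# 80 training indices, 20 validation indices, with no overlap between them
```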
12: How do we ensure a random split always gives the same validation set?
- To my surprise (and maybe yours), it is impossible for our computers to generate truly random numbers. They use a pseudo-random generator, which can be controlled using a random seed. By setting a seed value, the pseudo-random generator will produce the same sequence of “random” numbers on each run. Therefore, we can use the random seed to generate a random split that always gives the same validation set.
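You can see this reproducibility directly with Python’s standard library:

```python
import random

# Fixing the seed makes the pseudo-random generator reproducible: the same
# seed always yields the same "random" shuffle, hence the same validation set.

def shuffled_indices(n, seed):
    rng = random.Random(seed)       # seeded pseudo-random generator
    idxs = list(range(n))
    rng.shuffle(idxs)
    return idxs

run_1 = shuffled_indices(10, seed=42)
run_2 = shuffled_indices(10, seed=42)
assert run_1 == run_2               # same seed -> identical "random" order
```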
13: What letters are often used to signify the independent and dependent variables?
- Well, I know this one from math class: x signifies the independent variable and y signifies the dependent variable.
14: What’s the difference between the crop, pad, and squish resize approaches? When might you choose one over the other?
- crop = default Resize() method → crops the images to fit a square shape of the requested size. This can result in losing important details (… they were cropped out).
- pad = alternative Resize() method → pads the image’s pixel matrix with zeros (which show up as black). These black voids translate to a lot of empty space and wasted model computation. Not to mention, this results in worse resolution for the part of the image we actually use and want to evaluate.
- squish = alternative Resize() method → either squishes or stretches the image. This creates unrealistic proportions, leading to a more confused model (model has less of a grasp on the realistic dimensions of the image subject). When feeding the model real-world images, this can result in lower overall accuracy.
- Out of these three resizing methods, there is no right answer as it really depends on the problem and dataset. For example, if the features in the dataset take up the whole image, then using crop would not be a good idea as such would result in the loss of significant information… in which case go with pad or squish.
- My favorite method I have utilized thus far has been RandomResizedCrop, which randomly selects a portion of the image to crop. At every epoch, the model sees a different part of the image and learns accordingly. This is a thorough and widely used approach.
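The geometric difference between the three strategies comes down to simple arithmetic. The sketch below (my own illustration, not fastai code) works out what each strategy does to a hypothetical 640×480 image being brought to a 224×224 square:

```python
# What crop, pad, and squish each do to a w x h image targeted at size x size.
# Pure arithmetic illustration; fastai's Resize handles the actual pixels.

def resize_plans(w, h, size):
    plans = {}
    # squish: ignore aspect ratio and scale both sides to the target
    plans["squish"] = (size, size)
    # crop: scale the short side to `size`, then center-crop the long side
    scale = size / min(w, h)
    plans["crop_scaled"] = (round(w * scale), round(h * scale))
    # pad: scale the long side to `size`, then fill the remainder with zeros
    scale = size / max(w, h)
    scaled_w, scaled_h = round(w * scale), round(h * scale)
    plans["pad_scaled"] = (scaled_w, scaled_h)
    plans["pad_pixels"] = (size - scaled_w, size - scaled_h)  # black border
    return plans

p = resize_plans(640, 480, 224)
# crop first scales to 299x224 and then discards 75 columns of pixels;
# pad scales to 224x168 and fills the remaining 56 rows with zeros.
```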
15: What is data augmentation? Why is it needed?
- Data augmentation involves creating random variations of the input data, such that they appear different but not so different that the meaning of the data is altered. If we are using images as our input data, we can flip, rotate, warp, and/or adjust their brightness levels. Data augmentation allows the model to gain a better understanding of the basic concept behind an object, and therefore generalize better. Also, when labeling data or acquiring large amounts of data gets expensive, data augmentation can generate plenty of new, useful data, saving a lot of time, energy, and money.
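As a toy illustration of one such augmentation, here is a horizontal flip applied to a tiny 2×3 “image” stored as nested lists. The pixels change, but the meaning (the same scene, mirrored) does not:

```python
# A horizontal flip: reverse each row of the image. For a bear photo, the
# result is still unmistakably a bear, so the label stays valid.

def hflip(img):
    return [row[::-1] for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]
flipped = hflip(img)
# flipped == [[3, 2, 1], [6, 5, 4]]; flipping twice restores the original
```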
16: Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.
- I believe the bear classification model would perform horribly on the following sets of images, as they were not represented in the training data: images where the bear is partially obscured, nighttime images, low-resolution images, and images where the bear is small and off to the side.
17: What is the difference between item_tfms and batch_tfms?
- tfms = transformations
- item_tfms are transformations applied to a single data sample x on the CPU. Resize() is a common transform because the mini-batch of input images to a CNN must have the same dimensions. Assuming the images are RGB (3 color channels), then Resize() as item_tfms will mold the images to all have the same width and height.
- batch_tfms are transformations applied to a batch of data samples at once on the GPU, which is faster and more efficient than transforming items one at a time. A good example is the set of transforms provided by aug_transforms().
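The division of labor can be sketched in plain Python (a conceptual illustration only, with made-up toy transforms, not fastai’s code): item transforms first bring every sample to a uniform shape, and only then can a single vectorized transform act on the whole batch at once.

```python
# item transform: runs per sample, so it can handle ragged inputs and make
# them uniform (here, padding/truncating sequences to a fixed length).
def item_tfm(seq, size=4):
    return (seq + [0] * size)[:size]

# batch transform: runs once over the uniformly shaped batch (here, a toy
# scaling step standing in for GPU-side augmentations like aug_transforms()).
def batch_tfm(batch):
    return [[x * 2 for x in seq] for seq in batch]

samples = [[1, 2], [3, 4, 5, 6, 7]]
batch = [item_tfm(s) for s in samples]   # now every sample has length 4
out = batch_tfm(batch)
# out == [[2, 4, 0, 0], [6, 8, 10, 12]]
```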
18: What is a confusion matrix?
- A confusion matrix is a representation of the model’s predictions vs. the actual, correct labels. The rows of the matrix represent the actual labels and the columns represent the predictions. Therefore, the diagonal elements count the correctly classified images, while the off-diagonal elements count the incorrectly classified ones. Confusion matrices provide useful insight into how well a model is performing, and into which classes it confuses with one another.
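Building one from scratch makes the rows-vs-columns convention concrete (a from-scratch sketch; fastai’s ClassificationInterpretation computes and plots this for you):

```python
# Confusion matrix for a 3-class problem: rows are actual labels, columns
# are predictions, so correct predictions land on the diagonal.

def confusion_matrix(actual, predicted, n_classes):
    m = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        m[a][p] += 1        # row = actual class, column = predicted class
    return m

actual    = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(actual, predicted, 3)
# cm == [[1, 1, 0],
#        [0, 2, 0],
#        [1, 0, 1]]   -> 4 of 6 predictions fall on the diagonal (correct)
```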
19: What does export save?
- This function, export, saves the model architecture, the corresponding trained parameters of the neural network, and how the DataLoaders are defined.
20: What is it called when we use a model for getting predictions, instead of training?
- This is called inference.
21: What are IPython widgets?
- IPython widgets are a combination of Python and JavaScript functionalities that allow us to build interactive GUI components directly in Jupyter Notebook. For example, the Python function widgets.FileUpload() creates an upload button.
22: When might you want to use CPU for deployment? When might a GPU be better?
- Use a GPU when performing identical work on many items in parallel (e.g., batching up multiple users’ requests). On the other hand, if you are analyzing single pieces of data at a time (like a single image or sentence), then using a CPU would be more cost-effective.
23: What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?
- The application running on a server will require a network connection and will face extra network latency when submitting input and returning results.
24: What are three examples of problems that could occur when rolling out a bear warning system in practice?
- As previously discussed, the model would face issues when handling nighttime and low-resolution images. In addition, the model’s predictions could be returned at too slow of a rate to be useful.
25: What is out-of-domain data?
- Out-of-domain data is fundamentally different in some aspect from the model’s training data. For example, if an object detector that was trained exclusively on outdoor daytime photos receives an outdoor nighttime photo, that piece of data would be considered out-of-domain.
26: What is a domain shift?
- A domain shift occurs when the type of data changes gradually over time. For example, an insurance company using a deep learning model as part of its pricing algorithm could fall victim to a domain shift. Over time, its customers will likely change, so the original training data will no longer be representative of the current data. In this case, the deep learning model will become less effective over time because it is increasingly seeing out-of-domain data.
27: What are the three steps in the deployment process?
- The first step is manual process, in which the model is run in parallel with human supervision and not directly driving any actions. The second step is limited scope deployment, in which the model’s scope is limited and carefully supervised. The third and final step is gradual expansion, in which the model’s scope is gradually increased and human supervision is gradually decreased.
Well, there you have it. Thank you for reading my first Medium article. Again, I highly recommend you grab a copy of this book; nonetheless, I will be pumping out as much Fastbook-related material as I can now that I am on winter break. See you in the next chapter.