## Exploring the MovieLens 100k dataset with SGD, autograd, and the Surprise package

By Gavin Smith and XuanKhanh Nguyen

This was the third project for our machine learning class this semester. Its goal is to train a movie recommendation model on the MovieLens 100k dataset and make its predictions as accurate as possible. We were given a clean, preprocessed version of the dataset containing ratings from 943 users on 1,682 movies. The input to our prediction system is a (user id, movie id) pair, and the output is a scalar rating y between 1 and 5, where 1 is the worst possible rating and 5 is the best. Our main task is to predict the ratings for all user-movie pairs. We built the recommendation system with four different models:

- Problem 1 uses the Simple Baseline Model, trained with SGD and autograd. This is the simplest possible baseline: a model that makes the same scalar rating prediction no matter which user or movie is considered.
- Problem 2 uses the One-Scalar-Per-Item Baseline, trained with SGD and autograd.
- Problem 3 uses One-Vector-Per-Item Collaborative Filtering, trained with SGD and autograd.
- Problem 4 is open-ended. We were given a dataset where ratings have been omitted.
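The first three models differ only in how a rating is predicted. The sketch below assumes the standard forms for these baselines: Problem 1 predicts one global scalar μ; Problem 2 predicts μ + b_i + c_j with one learned scalar per user i and per movie j; Problem 3 adds the dot product u_i · v_j of learned user and movie vectors. The parameter names (`mu`, `b_per_user`, `c_per_item`, `U`, `V`) and all helper functions are hypothetical. The SGD loop fits only the Problem 1 baseline under an assumed squared-error loss, using a hand-derived gradient in place of autograd.

```python
import numpy as np

# Hypothetical parameter names: mu (global offset), b_per_user (one scalar per user),
# c_per_item (one scalar per movie), U and V (one K-dim vector per user / per movie).

def predict_simple(mu, user_ids, item_ids):
    """Problem 1: the same scalar prediction for every (user, movie) pair."""
    return np.full(len(user_ids), mu, dtype=float)

def predict_scalar_per_item(mu, b_per_user, c_per_item, user_ids, item_ids):
    """Problem 2 (assumed form): global offset plus per-user and per-movie scalars."""
    return mu + b_per_user[user_ids] + c_per_item[item_ids]

def predict_vector_per_item(mu, b_per_user, c_per_item, U, V, user_ids, item_ids):
    """Problem 3 (assumed form): the Problem 2 terms plus a user-movie dot product."""
    dot = np.sum(U[user_ids] * V[item_ids], axis=1)
    return predict_scalar_per_item(mu, b_per_user, c_per_item, user_ids, item_ids) + dot

def fit_simple_sgd(ratings, n_epochs=200, batch_size=4, step_size=0.1, seed=0):
    """Fit Problem 1's mu by minibatch SGD on mean squared error.

    The gradient of mean((y - mu)**2) with respect to mu is -2 * mean(y - mu);
    it is written out by hand here rather than computed with autograd.
    """
    ratings = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    mu = 0.0
    for _ in range(n_epochs):
        order = rng.permutation(len(ratings))  # reshuffle the data every epoch
        for start in range(0, len(ratings), batch_size):
            batch = ratings[order[start:start + batch_size]]
            grad = -2.0 * np.mean(batch - mu)
            mu -= step_size * grad
    return mu

# Usage on toy data (not MovieLens): mu settles near the mean training rating.
toy_ratings = np.array([4, 3, 5, 2, 4, 5, 3, 4], dtype=float)
mu_hat = fit_simple_sgd(toy_ratings)
print(round(mu_hat, 2), toy_ratings.mean())
```

Under squared error, the best constant prediction is the mean training rating, so for Problem 1 SGD mostly serves as a check that the training loop works. Because the projects are scored by MAE, note that the constant minimizing absolute error is the median instead.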

For all problems, our goal is to obtain the best possible predictions on the validation set and the test set, as measured by Mean Absolute Error (MAE). Each section explains its methodology and includes the relevant figures.
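MAE averages the absolute gap between the true and predicted ratings, MAE = (1/N) * Σ |y_n - ŷ_n|, so an MAE of 0.75 means predictions are off by three quarters of a star on average. Below is a minimal NumPy version; the helper name `mean_absolute_error` is our own, though scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of |y - y_hat| over all (user, movie) pairs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Example: three held-out ratings scored against a constant prediction of 3.5
print(mean_absolute_error([4, 2, 5], [3.5, 3.5, 3.5]))
```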