
Plan of action:
- Evaluation metrics
- Baselines
- Models
- Performance
- Example
Before we jump in, let’s first establish a few ground rules!
★ Prerequisites:
- An item is the combination of a category and a subcategory.
- Returns are removed from the transactions and only the purchase record is kept. A customer may return a product that doesn’t suit their needs, but they are probably still interested in the category/subcategory, and thus in the item. Sometimes, especially in e-commerce, clients order several sizes and keep only one.
- All products can be recommended: items a customer has never bought are recommended, but so are those they have already purchased, as a customer may want to buy an item more than once.
If you’re a little lost, then you should have read part 1🤓.
★ What we evaluate on:
- Monthly 2013 recommendations: Most customers only make one transaction a month. For each month, we use the sales history up to month M-1 to generate recommendations for month M.
We discard 2014 data, as there is too little of it.
- Only customers who actually bought something during the month we’re evaluating: Since we generate recommendations per month, we can only evaluate our predictions on customers who actually made a transaction during the month in question.
We need a winner and a loser amongst our models. To determine our champion, we judge their performance on a few metrics.
Say hello to precision and recall:
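In their simplest form:
- Precision: the fraction of recommended items that are relevant, i.e. #(recommended ∩ relevant) / #recommended.
- Recall: the fraction of relevant items that were recommended, i.e. #(recommended ∩ relevant) / #relevant.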
However, those two are not enough, as they don’t take into account the recommendation list ordering. Instead, we call in the big guns, figuratively speaking of course, as we are anti-gun violence.
If relevant items are at the top of the ranked list of predictions, it will positively impact the following metrics¹:
- Precision@K: the share of the top K predictions that are actually relevant.
- Recall@K: how well the recommender retrieved all relevant items among the top K predictions.
- MAP@K — Mean Average Precision@K: computes the Average Precision (AP) over all the customers at rank K. The AP rewards having a lot of relevant recommendations and having them at the top of the list.
- Novelty@K²: provides insight into newly recommended items. If the score is high, it means that we recommend relevant new items to customers.
Sadly, we won’t be handing out any participation trophies. And of course for metrics, as for all things in life, bigger is better across the board 👀!
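To make these metrics concrete, here is a minimal sketch of how they can be computed for a single customer (one common convention; the AP@K normalization by min(#relevant, K) in particular varies across implementations):

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Share of all relevant items retrieved within the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def average_precision_at_k(recommended, relevant, k):
    """Average of precision@i over the ranks i where a relevant item appears,
    so relevant items near the top of the list are rewarded."""
    score, hits = 0.0, 0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

# MAP@K is simply the mean of average_precision_at_k over all evaluated customers.
```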
We use two recommenders as baselines:
- The last sold items recommender: recommends the last items a customer bought. If the customer has not bought anything before, we recommend the last items bought by the other customers.
- The most sold items recommender: recommends the items a customer has bought most often. If the customer has not bought anything before, we recommend the items bought most often by all the other customers (see the sketch after this list).
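As an illustration, a minimal sketch of the most sold items baseline (the column names customer_id and item are assumptions about our transactions table):

```python
import pandas as pd

def most_sold_items(transactions: pd.DataFrame, customer_id, k=10):
    """Recommend the k items this customer bought most often,
    falling back to the globally most sold items for first-time customers."""
    history = transactions[transactions["customer_id"] == customer_id]
    source = history if not history.empty else transactions
    return source["item"].value_counts().head(k).index.tolist()

# The last sold items baseline is analogous: sort `source` by transaction date
# (descending) and take the k most recently bought distinct items instead.
```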
All code can be found here.
a. Content-based recommender
By definition, content-based recommender systems use item features to recommend new items to a user. We created our items’ features by encoding the columns “category” and “subcategory” using OneHotEncoder.
Basically, a customer who buys women’s clothes and women’s bags is more likely to be interested in women’s items. We will not include customer-related features (age, sex…) or context-related features (season, year…).
We talked a lot about categories, subcategories and items. Let’s take a closer look 🔍:
- 6 Categories: Clothings, Footwear, Electronics, Bags, Books and Home and Kitchen
- 18 Subcategories: Women, Men, Kids, Children, Non-Fiction, Academic, Fiction, Audio and Video, Furnishing, Comics, Mobiles, Computers, Personal Appliances, Camera, DIY, Kitchen, Bath and Tools
Based on our item definition and our dataset, we end up with 23 items (all categories do not necessarily pair with all subcategories).
Step 1: Build two matrices to input into the model
We build a first matrix items_features that describes items in terms of features. From transactions, we build a second matrix users_items_sales describing the history of interactions (purchases) between customers and items:
- items_features matrix: Describes each item’s features.
- users_items_sales matrix: Describes how many times each customer bought each item.
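A minimal sketch of Step 1, assuming an items table with one row per item and a transactions table with one row per purchase (the column names are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# One-hot encode category/subcategory into binary item features.
# (On scikit-learn < 1.2, use OneHotEncoder(sparse=False) instead.)
encoder = OneHotEncoder(sparse_output=False)
items_features = encoder.fit_transform(items[["category", "subcategory"]])
# shape: (23 items, 6 category + 18 subcategory columns)

# Purchase counts per customer per item.
users_items_sales = (
    transactions.groupby(["customer_id", "item"]).size().unstack(fill_value=0)
)
# shape: (n_customers, 23); keep the column order aligned with the rows of items_features
```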
Step 2: Compute a user preference matrix
The users_features_preference matrix gives the importance of each feature for all customers that have actually bought items. It is calculated by the following formula:
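Roughly, with each customer’s purchase counts normalized so their row sums to 1 (the exact normalization is our reading of it):
users_features_preference = normalize(users_items_sales) × items_features
In other words, a customer’s feature preference is the purchase-weighted average of the features of the items they bought.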
The following code was used to compute said matrix:
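A minimal sketch of that computation, continuing from the Step 1 matrices (the guard against customers with no purchases is our addition):

```python
import numpy as np

sales = users_items_sales.to_numpy().astype(float)
row_sums = sales.sum(axis=1, keepdims=True)
# Avoid dividing by zero for customers who never bought anything.
normalized_sales = sales / np.where(row_sums == 0, 1.0, row_sums)

# (n_users, n_items) @ (n_items, n_features) -> (n_users, n_features)
users_features_preference = normalized_sales @ items_features
```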
Step 3: Compute the score of relevance per item per customer
To get the score of relevance per item per customer, we multiply the users_features_preference matrix by the transpose of items_features.
The items with the highest scores are recommended for the customer 🚀.
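In code, continuing from the previous step:

```python
import numpy as np

# (n_users, n_features) @ (n_features, n_items) -> one score per customer per item
scores = users_features_preference @ items_features.T

# Indices of the top 10 items per customer, best first.
top_items = np.argsort(-scores, axis=1)[:, :10]
```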
b. Collaborative filtering recommender
By definition, collaborative filtering recommender systems rely solely on interactions between customers to recommend their next purchase. Basically, a customer who buys several items that another customer has also purchased should be interested in the rest of that customer’s purchases, i.e. the items they have not yet bought themselves.
Confused? That’s okay, we’re nice enough to break it down for you:
If you still don’t understand, then we can’t help you 🤷♀️!
How does it work?
Matrix factorisation, through Alternating Least Squares (ALS)!
Similarly to the content-based recommender, we will try to compute a relevance matrix using an items_features matrix and a users_features_preference matrix. The difference, this time, is that we let the matrix factorisation learn both of them.
Let’s look at it step by step:
Step 1: Build the matrix to input into the model
Similarly to the content-based recommender, we build the same users_items_sales matrix describing the history of interactions (purchases) between customers and items.
Step 2: Jointly compute an items’ features matrix and a user preference matrix
Matrix factorisation will take it from here. It needs a number of latent features N (we went with lucky number 7 after trial and error) to build our two matrices:
- items_features: of shape: (# items, # latent features N), describes items in terms of features
- users_features_preference: of shape: (# users, # latent features N), describes the importance of each feature to users
The matrix factorisation builds the aforementioned matrices based on the rule:
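users_items_sales ≈ users_features_preference × items_featuresᵀ
ALS alternates between the two: it fixes one matrix, solves a least squares problem for the other, and repeats until their product reconstructs the observed purchases as closely as possible.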
We used the Implicit python package to implement it. The following are the output matrices:
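A minimal sketch (assuming implicit ≥ 0.5, where fit takes a user-item matrix; older versions expect item-user):

```python
import implicit
from scipy.sparse import csr_matrix

# implicit works on a sparse user-item purchase matrix.
user_items = csr_matrix(users_items_sales.to_numpy())

model = implicit.als.AlternatingLeastSquares(factors=7)  # N = 7 latent features
model.fit(user_items)

users_features_preference = model.user_factors  # shape: (n_users, 7)
items_features = model.item_factors             # shape: (n_items, 7)
```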
Step 3: Compute the score of relevance per item per customer
Once again, the score of relevance per item per customer is computed by multiplying the users_features_preference matrix by the transpose of items_features.
And voilà!
Based on the highest score, items are recommended for each customer.
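For instance (user 0 is just an example row index; keeping already-purchased items matches our ground rules):

```python
# Score every item for every customer.
scores = model.user_factors @ model.item_factors.T

# Or ask implicit to rank the top 10 items for one customer directly,
# without filtering out items they have already bought.
ids, item_scores = model.recommend(
    0, user_items[0], N=10, filter_already_liked_items=False
)
```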
We call upon the metrics mentioned above.
a. Precision@K:
At rank K=1, the difference between the models is most apparent; from rank K=2 onwards, their performances start to converge.
Collaborative filtering takes the cake: it makes the most precise recommendations all the way up to rank 10.
As expected, the higher the rank K, the lower the precision for all the models.
b. Recall@K: