
Plan of action:
- Evaluation metrics
- Baselines
- Models
- Performance
- Example
Before we jump in, let’s first establish a few ground rules!
★ Prerequisites:
- An item is the combination of a category and a subcategory.
- Returns are removed from the transactions and only the purchase record is kept. A customer may return a product that doesn’t suit their needs, but they are probably still interested in the category/subcategory, and thus in the item. Sometimes, especially in e-commerce, clients order several sizes and keep only one.
- All products can be recommended: items a customer has never bought are recommended, but so are those they have already purchased, as a customer may want to buy an item more than once.
If you’re a little lost, then you should have read part 1🤓.
★ What we evaluate on:
- Monthly 2013 recommendations: Most customers only make one transaction a month. For each month, we use the sales history up to month M-1 to generate recommendations for month M.
We discard 2014 data, as there is too little of it.
- Only customers who actually bought something during the month we’re evaluating: Since we generate recommendations per month, we can only evaluate our predictions on customers who actually made a transaction during the month in question.
We need a winner and a loser amongst our models. To determine our champion, we judge their performance on a few metrics.
Say hello to precision and recall:
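In their simplest form:
- Precision: the fraction of recommended items that are relevant, i.e. #(recommended ∩ relevant) / #recommended.
- Recall: the fraction of relevant items that were recommended, i.e. #(recommended ∩ relevant) / #relevant.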
However, those two are not enough, as they don’t take into account the recommendation list ordering. Instead, we call in the big guns, figuratively speaking of course, as we are anti-gun violence.
If relevant items are at the top of the ranked list of predictions, it will positively impact the following metrics¹:
- Precision@K: the share of the top K predictions that are actually relevant.
- Recall@K: how well the recommender retrieved all relevant items among the top K predictions.
- MAP@K — Mean Average Precision@K: computes the Average Precision (AP) over all the customers at rank K. The AP rewards having a lot of relevant recommendations and having them at the top of the list.
- Novelty@K²: provides insight into newly recommended items. If the score is high, it means that we recommend relevant new items to customers.
Sadly, we won’t be handing out any participation trophies. And of course for metrics, as for all things in life, bigger is better across the board 👀!
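To make these metrics concrete, here is a minimal sketch of how they can be computed for a single customer (one common convention; the AP@K normalization by min(#relevant, K) in particular varies across implementations):

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Share of all relevant items retrieved within the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def average_precision_at_k(recommended, relevant, k):
    """Average of precision@i over the ranks i where a relevant item appears,
    so relevant items near the top of the list are rewarded."""
    score, hits = 0.0, 0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

# MAP@K is simply the mean of average_precision_at_k over all evaluated customers.
```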
We use two recommenders as baselines:
- The last sold items recommender: recommends the last items a customer bought. If the customer has not bought anything before, we recommend the last items bought by the other customers.
- The most sold items recommender: recommends the items a customer has bought most often. If the customer has not bought anything before, we recommend the items bought most often by all the other customers (see the sketch after this list).
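As an illustration, a minimal sketch of the most sold items baseline (the column names customer_id and item are assumptions about our transactions table):

```python
import pandas as pd

def most_sold_items(transactions: pd.DataFrame, customer_id, k=10):
    """Recommend the k items this customer bought most often,
    falling back to the globally most sold items for first-time customers."""
    history = transactions[transactions["customer_id"] == customer_id]
    source = history if not history.empty else transactions
    return source["item"].value_counts().head(k).index.tolist()

# The last sold items baseline is analogous: sort `source` by transaction date
# (descending) and take the k most recently bought distinct items instead.
```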
All code can be found here.
a. Content-based recommender
By definition, content-based recommender systems use item features to recommend new items to a user. We created our items’ features by encoding the columns “category” and “subcategory” using OneHotEncoder.
Basically, a customer who buys women’s clothes and women’s bags is more likely to be interested in women’s items. We will not include customer-related features (age, sex…) or context-related features (season, year…).
We talked a lot about categories, subcategories and items. Let’s take a closer look 🔍:
- 6 Categories: Clothings, Footwear, Electronics, Bags, Books and Home and Kitchen
- 18 Subcategories: Women, Men, Kids, Children, Non-Fiction, Academic, Fiction, Audio and Video, Furnishing, Comics, Mobiles, Computers, Personal Appliances, Camera, DIY, Kitchen, Bath and Tools
Based on our item definition and our dataset, we end up with 23 items (all categories do not necessarily pair with all subcategories).
Step 1: Build two matrices to input into the model
We build a first matrix items_features that describes items in terms of features. From transactions, we build a second matrix users_items_sales describing the history of interactions (purchases) between customers and items:
- items_features matrix: Describes each item’s features.
- users_items_sales matrix: Describes how many times each customer bought each item.
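A minimal sketch of Step 1, assuming an items table with one row per item and a transactions table with one row per purchase (the column names are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# One-hot encode category/subcategory into binary item features.
# (On scikit-learn < 1.2, use OneHotEncoder(sparse=False) instead.)
encoder = OneHotEncoder(sparse_output=False)
items_features = encoder.fit_transform(items[["category", "subcategory"]])
# shape: (23 items, 6 category + 18 subcategory columns)

# Purchase counts per customer per item.
users_items_sales = (
    transactions.groupby(["customer_id", "item"]).size().unstack(fill_value=0)
)
# shape: (n_customers, 23); keep the column order aligned with the rows of items_features
```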
Step 2: Compute a user preference matrix
The users_features_preference matrix gives the importance of each feature for all customers that have actually bought items. It is calculated by the following formula:
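Roughly, with each customer’s purchase counts normalized so their row sums to 1 (the exact normalization is our reading of it):
users_features_preference = normalize(users_items_sales) × items_features
In other words, a customer’s feature preference is the purchase-weighted average of the features of the items they bought.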
The following code was used to compute said matrix:
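A minimal sketch of that computation, continuing from the Step 1 matrices (the guard against customers with no purchases is our addition):

```python
import numpy as np

sales = users_items_sales.to_numpy().astype(float)
row_sums = sales.sum(axis=1, keepdims=True)
# Avoid dividing by zero for customers who never bought anything.
normalized_sales = sales / np.where(row_sums == 0, 1.0, row_sums)

# (n_users, n_items) @ (n_items, n_features) -> (n_users, n_features)
users_features_preference = normalized_sales @ items_features
```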
Step 3: Compute the score of relevance per item per customer
To get the score of relevance per item per customer, we multiply the users_features_preference matrix by the transpose of items_features.
The items with the highest scores are recommended for the customer 🚀.
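In code, continuing from the previous step:

```python
import numpy as np

# (n_users, n_features) @ (n_features, n_items) -> one score per customer per item
scores = users_features_preference @ items_features.T

# Indices of the top 10 items per customer, best first.
top_items = np.argsort(-scores, axis=1)[:, :10]
```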
b. Collaborative filtering recommender
By definition, collaborative filtering recommender systems rely solely on interactions between customers to recommend their next purchase. Basically, a customer who buys several items that another customer has also purchased should be interested in the rest of that customer’s purchases, i.e. the items they have not yet bought themselves.
Confused? That’s okay, we’re nice enough to break it down for you:
If you still don’t understand, then we can’t help you 🤷♀️!
How does it work?
Matrix factorisation, through Alternating Least Squares (ALS)!
Similarly to the content-based recommender, we will try to compute a relevance matrix using an items_features matrix and a users_features_preference matrix. The difference, this time, is that we let the matrix factorisation learn both of them.
Let’s look at it step by step:
Step 1: Build the matrix to input into the model
Similarly to the content-based recommender, we build the same users_items_sales matrix describing the history of interactions (purchases) between customers and items.
Step 2: Jointly compute an items’ features matrix and a user preference matrix
Matrix factorisation will take it from here. It needs a number of latent features N (we went with lucky number 7 after trial and error) to build our two matrices:
- items_features: of shape: (# items, # latent features N), describes items in terms of features
- users_features_preference: of shape: (# users, # latent features N), describes the importance of each feature to users
The matrix factorisation builds the aforementioned matrices based on the rule:
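users_items_sales ≈ users_features_preference × items_featuresᵀ
ALS alternates between the two: it fixes one matrix, solves a least squares problem for the other, and repeats until their product reconstructs the observed purchases as closely as possible.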
We used the Implicit python package to implement it. The following are the output matrices:
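A minimal sketch (assuming implicit ≥ 0.5, where fit takes a user-item matrix; older versions expect item-user):

```python
import implicit
from scipy.sparse import csr_matrix

# implicit works on a sparse user-item purchase matrix.
user_items = csr_matrix(users_items_sales.to_numpy())

model = implicit.als.AlternatingLeastSquares(factors=7)  # N = 7 latent features
model.fit(user_items)

users_features_preference = model.user_factors  # shape: (n_users, 7)
items_features = model.item_factors             # shape: (n_items, 7)
```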
Step 3: Compute the score of relevance per item per customer
Once again, the score of relevance per item per customer is computed by multiplying the users_features_preference matrix by the transpose of items_features.
And voilà!
Based on the highest score, items are recommended for each customer.
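For instance (user 0 is just an example row index; keeping already-purchased items matches our ground rules):

```python
# Score every item for every customer.
scores = model.user_factors @ model.item_factors.T

# Or ask implicit to rank the top 10 items for one customer directly,
# without filtering out items they have already bought.
ids, item_scores = model.recommend(
    0, user_items[0], N=10, filter_already_liked_items=False
)
```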
We call upon the metrics mentioned above.
a. Precision@K:
At rank K=1, the difference between the models is most apparent; from rank K=2 onwards, their performances start to converge.
Collaborative filtering takes the cake: it makes the most precise recommendations all the way up to rank 10.
As expected, the higher the rank K, the lower the precision for all the models.
b. Recall@K: