The new framework enables the creation of simulation environments to study reinforcement learning algorithms in recommender systems.
Recommendation systems are all around us, and they are getting more sophisticated by the minute. While traditional recommender systems focused on one-time recommendations based on user actions, newer models engage in sequential interactions to find the best recommendation based on user behavior and preferences. These systems are known as collaborative interactive recommenders (CIRs), and their rise has been driven by advancements in areas such as natural language processing (NLP) and deep learning in general. However, building these systems remains a challenge. Recently, Google open sourced RecSim, a platform for creating simulation environments for CIRs.
Despite the popularity and obvious value proposition of CIRs, their implementation has remained limited. This is partly due to the difficulty of simulating different user interaction scenarios. Traditional supervised learning approaches prove very limited when it comes to CIRs, given that it is hard to find datasets that accurately reflect user interaction dynamics. Reinforcement learning (RL) has evolved as the de facto standard for implementing CIR systems, given the dynamic and sequential nature of the learning process. Just as CIR systems are built around a sequence of user actions, reinforcement learning agents learn by taking actions and experiencing rewards across a sequence of situations in a given environment. While reinforcement learning systems are conceptually ideal for the implementation of CIRs, they face notable implementation challenges:
· Generalization Across Users: Most RL research focuses on models and algorithms involving a single environment. The ability to generalize knowledge across different users is essential for an effective CIR agent.
· Combinatorial Action Spaces: Most CIR systems need to explore combinatorial variations of recommendations and user actions, which are hard to capture in simulation models.
· Large, Stochastic Action Spaces: Many CIR environments deal with a set of recommendable items that is dynamically and stochastically generated. Think of a video recommendation engine that operates over a pool of videos in constant flux, minute by minute. Reinforcement learning systems are typically challenged in such non-fixed environments.
· Long Horizons: Many CIR systems need to operate over long horizons in order to experience any significant change in a user’s preferences. This is another challenging aspect for simulation models.
Most of these challenges boil down to one fact: it is very difficult to effectively simulate combinations of user actions in a way that can be quantified and used to improve the agent’s learning policy.
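To get a feel for the combinatorial action-space problem described above, consider how fast the number of possible slates grows with the candidate pool. The short sketch below (illustrative only, not RecSim code) counts the ordered slates of distinct documents an agent could recommend:

```python
from math import perm

# Toy illustration of the combinatorial action space: the number of
# ordered slates of `slate_size` distinct documents drawn from a pool
# of `num_docs` candidates is num_docs! / (num_docs - slate_size)!.
def num_slates(num_docs: int, slate_size: int) -> int:
    """Count the ordered slates of distinct documents."""
    return perm(num_docs, slate_size)

# Even a modest pool of 100 candidates and slates of size 3 yields
# 100 * 99 * 98 = 970,200 possible actions per step.
print(num_slates(100, 3))
```

An RL agent that treated each slate as an atomic action would face nearly a million actions per step here, which is why slate recommendation is hard to capture in standard simulation models.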
RecSim is a configurable platform for authoring simulation environments to allow both researchers and practitioners to challenge and extend existing RL methods in synthetic recommender settings. Instead of trying to create a generic, perfect simulator, RecSim focuses on simulations that mirror specific aspects of user behavior found in real systems to serve as a controlled environment for developing, evaluating and comparing recommender models and algorithms.
Conceptually, RecSim simulates a recommender agent’s interaction with an environment consisting of a user model, a document model and a user choice model. The agent interacts with the environment by recommending sets or lists of documents (known as slates) to users, and has access to observable features of simulated individual users and documents to make recommendations.
The document model samples items from a prior over document features, including latent features such as document quality, and observable features such as topic, or global statistics like ratings or popularity. Agents and users can be configured to observe different document features, so developers have the flexibility to capture different recommender-system operating regimes. The user model samples users from a prior over configurable user features, including latent features such as personality, satisfaction and interests; observable features such as demographics; and behavioral features such as session length, visit frequency and budget.
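The split between latent and observable features can be sketched in plain Python. The priors, topic list and feature names below are hypothetical placeholders, not RecSim’s actual models, but they show the shape of what the document and user models sample:

```python
import random

random.seed(0)

TOPICS = ["news", "sports", "music", "tech"]

def sample_document():
    """Hypothetical document prior: an observable topic plus a latent quality score."""
    return {
        "topic": random.choice(TOPICS),     # observable to the agent
        "quality": random.gauss(0.0, 1.0),  # latent: hidden from the agent
    }

def sample_user():
    """Hypothetical user prior: latent interests plus a behavioral session budget."""
    return {
        "interests": {t: random.random() for t in TOPICS},  # latent
        "budget": random.randint(5, 20),  # behavioral: interactions left in session
    }

docs = [sample_document() for _ in range(5)]
user = sample_user()
```

Which of these features the agent is allowed to observe is exactly the kind of configuration knob RecSim exposes to capture different operating regimes.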
When the agent recommends a document to a user, the response is determined by a user-choice model, which can access observable document features and all user features. Other aspects of a user’s response can depend on latent document features, such as document topic or quality. Once a document is consumed, the user state undergoes a transition through a configurable user transition model, since user satisfaction or interests might change.
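A common shape for such a choice model is a multinomial choice over the slate, scored by the user’s affinity for each document, with a transition model that nudges the user state after consumption. The following is a minimal sketch under those assumptions, not RecSim’s actual choice or transition implementation:

```python
import math

def choice_probabilities(user_interests, slate):
    """Softmax-style multinomial choice: score each slate item by the
    user's affinity for its topic, then normalize into probabilities."""
    scores = [math.exp(user_interests[doc["topic"]]) for doc in slate]
    total = sum(scores)
    return [s / total for s in scores]

def transition(user_interests, chosen_doc, lr=0.1):
    """Toy transition model: consuming a document nudges the user's
    interest in that document's topic toward 1.0."""
    updated = dict(user_interests)
    updated[chosen_doc["topic"]] += lr * (1.0 - updated[chosen_doc["topic"]])
    return updated

interests = {"news": 0.2, "sports": 0.9}
slate = [{"topic": "news"}, {"topic": "sports"}]
probs = choice_probabilities(interests, slate)
after = transition(interests, slate[1])
```

Here the user is more likely to pick the sports item, and doing so further increases their interest in sports, which is the kind of preference drift the simulator lets you study.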
Another important component of the RecSim architecture is the simulator, which is responsible for controlling the interactions between the agent and the environment. The interactions follow six fundamental steps:
1. The simulator requests the user state from the user model, both the observable and latent user features.
2. The simulator sends the candidate documents and the observable portion of the user state to the agent.
3. The agent uses its current policy to return a slate to the simulator to be “presented” to the user.
4. The simulator forwards the recommended slate of documents and the full user state (observable and latent) to the user choice model.
5. Using the specified choice and response functions, the user choice model generates a (possibly stochastic) user choice/response to the recommended slate, which is returned to the simulator.
6. The simulator then sends the user choice and response to both: the user model so it can update the user state using the transition model; and the agent so it can update its policy given the user response to the recommended slate.
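The six steps above can be condensed into a single toy episode loop. Every function below (the trivial policy, the greedy choice model, the interest update) is a stand-in invented for illustration, not RecSim’s API, but the control flow mirrors the simulator’s:

```python
import random

random.seed(1)
TOPICS = ["news", "sports", "music"]

def sample_user():
    return {"interests": {t: random.random() for t in TOPICS}, "budget": 3}

def sample_docs(n):
    return [{"id": i, "topic": random.choice(TOPICS)} for i in range(n)]

def agent_policy(observable_user, candidates, slate_size=2):
    # Step 3: a trivial stand-in policy -- recommend the first few candidates.
    return candidates[:slate_size]

def choice_model(full_user_state, slate):
    # Step 5: greedy stand-in -- pick the slate item whose topic the user likes most.
    return max(slate, key=lambda d: full_user_state["interests"][d["topic"]])

def run_episode():
    user = sample_user()  # Step 1: simulator requests the full user state.
    chosen_ids = []
    while user["budget"] > 0:
        candidates = sample_docs(10)
        # Step 2: agent sees candidates and the observable user state (empty here).
        slate = agent_policy({}, candidates)
        # Step 4: choice model sees the slate and the FULL user state.
        choice = choice_model(user, slate)
        # Step 6: transition model updates the user state; a real agent
        # would also update its policy from the response here.
        user["interests"][choice["topic"]] += 0.05
        user["budget"] -= 1
        chosen_ids.append(choice["id"])
    return chosen_ids

print(run_episode())
```

Note how the agent only ever sees observable features (steps 2–3) while the choice model sees the full latent state (steps 4–5); that asymmetry is the core of what RecSim lets you configure.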
RecSim provides a unique approach to streamlining the testing and validation of CIR systems based on reinforcement learning. The code has been open sourced on GitHub and the release was accompanied by this research paper. Certainly, it’s going to be interesting to see the types of simulations researchers and data scientists build on top of RecSim.