I presented here the application of the recursive least squares to linear contextual bandits. In my opinion, this is a simple to implement and scalable algorithm. Its main advantage is the direct use of the inverse of sum of a square matrix, which makes it computationally efficient. It also enables us to explore interesting future use cases:
- We can use this model for multi-objective optimisation which will have a non-Bernoulli reward (unit interval or monetary values).
- We will be able to solve very complex tasks with a lot of arms and interactions while reducing the complexity by updating only relevant regions of the feature space when we receive feedback. In the same spirit as , we can have an ever-changing model in which we can add and remove arms.
I hope this blog was a useful entry and can be applied as a cookbook recipe. Stay tuned for future extensions, simulations and use cases!
 Agrawal, S. and Goyal, N., 2013, February. Thompson sampling for contextual bandits with linear payoffs. In International Conference on Machine Learning (pp. 127–135).
 Graepel, T., Candela, J.Q., Borchert, T. and Herbrich, R., 2010. Web-scale bayesian click-through rate prediction for sponsored search advertising in Microsoft’s bing search engine. Omnipress.
 Chapelle, O., Manavoglu, E. and Rosales, R., 2014. Simple and scalable response prediction for display advertising. ACM Transactions on Intelligent Systems and Technology (TIST), 5(4), pp.1–34.
 Li, L., Chu, W., Langford, J. and Schapire, R.E., 2010, April. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (pp. 661–670).
 Yin, Harold J. Kushner, G. George (2003). Stochastic approximation and recursive algorithms and applications (Second ed.). New York: Springer. pp. 8–12.
 Wikipedia: Online Machine Learning, section: Recursive Least Square. (I used the notation from this article.)
 Zhou, L., 2015. A survey on contextual multi-armed bandits. arXiv preprint arXiv:1508.03326.
 Wikipedia: Welford’s Online Algorithm