Recursive Least Squares for Linear Contextual Bandits

Conclusion

I presented here the application of the recursive least squares to linear contextual bandits. In my opinion, this is a simple to implement and scalable algorithm. Its main advantage is the direct use of the inverse of sum of a square matrix, which makes it computationally efficient. It also enables us to explore interesting future use cases:

We can use this model for multi-objective optimisation which will have a non-Bernoulli reward (unit interval or monetary values).
We will be able to solve very complex tasks with a lot of arms and interactions while reducing the complexity by updating only relevant regions of the feature space when we receive feedback. In the same spirit as [4], we can have an ever-changing model in which we can add and remove arms.

I hope this blog was a useful entry and can be applied as a cookbook recipe. Stay tuned for future extensions, simulations and use cases!

[1] Agrawal, S. and Goyal, N., 2013, February. Thompson sampling for contextual bandits with linear payoffs. In International Conference on Machine Learning (pp. 127–135).

[2] Graepel, T., Candela, J.Q., Borchert, T. and Herbrich, R., 2010. Web-scale bayesian click-through rate prediction for sponsored search advertising in Microsoft’s bing search engine. Omnipress.

[3] Chapelle, O., Manavoglu, E. and Rosales, R., 2014. Simple and scalable response prediction for display advertising. ACM Transactions on Intelligent Systems and Technology (TIST), 5(4), pp.1–34.

[4] Li, L., Chu, W., Langford, J. and Schapire, R.E., 2010, April. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (pp. 661–670).

[5] Yin, Harold J. Kushner, G. George (2003). Stochastic approximation and recursive algorithms and applications (Second ed.). New York: Springer. pp. 8–12.

[6] Wikipedia: Online Machine Learning, section: Recursive Least Square. (I used the notation from this article.)

[7] Zhou, L., 2015. A survey on contextual multi-armed bandits. arXiv preprint arXiv:1508.03326.

[8] Wikipedia: Welford’s Online Algorithm

Image source

https://pixabay.com/photos/wave-signal-communication-850007/

Learn more about technology at Expedia Group

Conclusion

Image source

Footer