What is Improper Learning? In machine learning parlance, given a class of controllers (or hypotheses or classifiers, as the case may be), an algorithm whose output always lies within that class is called a Proper Learner, while one whose output may lie outside the given class is called an Improper Learner [2]. A simple example is a classification problem with a finite set of N linear predictors, i.e., N weight vectors. In this scenario, a learning algorithm which, after training, always picks the best of the N given predictors would be called a proper learner. Alternatively, the algorithm could output the best predictor from the convex hull of this set, and would then be an improper learner; a minimal sketch of this contrast follows. Over the years, while improper learning has received some attention within the statistical and online learning communities, it remains largely unexplored in control. One objective of this article, therefore, is to bring this technique to the attention of researchers and practitioners as a promising and timely direction of investigation.
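To make the contrast concrete, the sketch below (synthetic data and randomly drawn atomic predictors, both hypothetical) compares a proper learner, which returns one of the N given weight vectors, against an improper learner, which searches their convex hull. Exponentiated gradient descent on the mixture weights is just one simple way to carry out that search.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # synthetic features
w_star = rng.normal(size=5)
y = X @ w_star + 0.1 * rng.normal(size=200)       # regression targets

W = rng.normal(size=(10, 5))                      # N = 10 atomic predictors

def loss(w):
    return np.mean((X @ w - y) ** 2)              # squared loss on the data

# Proper learner: return the single best atomic predictor.
w_proper = W[np.argmin([loss(w) for w in W])]

# Improper learner: search the convex hull of the atoms via exponentiated
# gradient descent on the mixture weights (one simple choice among many).
alpha = np.ones(len(W)) / len(W)
for _ in range(2000):
    w_mix = alpha @ W
    grad_w = 2 * X.T @ (X @ w_mix - y) / len(y)   # gradient w.r.t. w_mix
    alpha = alpha * np.exp(-0.05 * (W @ grad_w))  # chain rule onto alpha
    alpha /= alpha.sum()                          # stay on the simplex

w_improper = alpha @ W
print(f"best atom loss  : {loss(w_proper):.4f}")
print(f"convex-hull loss: {loss(w_improper):.4f}")   # typically no larger
```

Since the atoms are vertices of their own convex hull, the improper learner's loss can never exceed that of the best atom in the limit.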
Statistical Learning. Improper learning has already made inroads in statistical learning, and the results have been encouraging. One prominent example is the technique of Boosting, explored thoroughly in the context of classification in [5]. The celebrated AdaBoost algorithm has since been adapted to application areas across machine learning, from classification to algorithmic trading, and won its inventors the Gödel Prize in 2003. More recently, the problem of model misspecification was investigated in the context of supervised learning in [3], which showed a dramatic improvement in regret with improper learners even under an incorrect parametric model (proper learners were shown to perform much worse). Similarly, the problem of finite-time regret in logistic regression was investigated in [4], where, once again, regret was significantly improved by leveraging improper learning. Note that in both these cases, the learner only had to expand its search to convex combinations of the available atomic predictors.
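As a small, runnable illustration of boosting as improper learning, the snippet below (synthetic data; scikit-learn's stock AdaBoost implementation) fits a single decision stump and then an AdaBoost ensemble of stumps. The ensemble's weighted vote lies outside the base class of individual stumps, which is precisely what makes it improper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)     # one member of the atomic class
booster = AdaBoostClassifier(n_estimators=50,   # weighted vote of 50 stumps
                             random_state=0)

print("single stump accuracy:", stump.fit(X, y).score(X, y))
print("AdaBoost accuracy    :", booster.fit(X, y).score(X, y))
```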
Improper Learning in Control. Improper learning is beginning to receive attention within the control community, and two distinct approaches have already emerged. The first [6] follows the paradigm described above, using a base or atomic class of (non-adaptive) control policies along with an adaptive meta-learner that combines the outputs of these policies to produce an improper controller whose performance is strictly better than that of any member of the base class. Indeed, [6] also shows examples where a stabilizing controller emerges from a set of unstable atomic controllers. Importantly, the improper controller interacts with the controlled system only through the base controllers: it picks a base controller in each round, which in turn implements its control action on the system. The control policy that emerges from the adaptive algorithm need not coincide with any single base policy at every system state and is hence clearly improper.
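The following sketch (hypothetical scalar plant, hypothetical gains) mimics that architecture with an exponential-weights meta-learner over a small set of static feedback gains, some stabilizing and some not. The full-information weight update is a simplifying assumption made here for brevity and is not taken from [6]; a bandit variant would rely on importance-weighted loss estimates instead.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.05, 1.0                       # unstable scalar plant: x+ = a*x + b*u
gains = [0.2, -0.6, -1.3, -2.3]        # base class: static feedback u = k*x
eta = 0.5

w = np.ones(len(gains))                # exponential weights over base policies
x = 1.0
for t in range(100):
    p = w / w.sum()
    i = rng.choice(len(gains), p=p)    # meta-learner picks one base controller
    x_next = a * x + b * gains[i] * x  # only that controller acts on the plant

    # Full-information update: score every gain on the observed transition
    # (the simplifying assumption noted above).
    losses = np.array([(a * x + b * k * x) ** 2 + (k * x) ** 2 for k in gains])
    w = w * np.exp(-eta * losses / (1.0 + losses.max()))
    x = x_next

print("final state:", x)
print("final policy mix:", w / w.sum())
```

The randomized policy induced by the weights is a mixture over the base gains and thus need not equal any single one of them, which is what makes the meta-learner improper.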
In contrast, the second approach [1] essentially extends the idea of boosting to control. It maintains a set of instances of a “weak learning” algorithm (such as Online Gradient Descent). The weak learners are assumed to be adaptive and offer control suggestions to a non-adaptive booster. The booster, in turn, combines these suggestions into a control action which it implements on the controlled system. The same logic as before shows why the booster can be considered an improper learner. Note, however, that the architecture here is essentially the reverse of the one in [6]: the booster is non-adaptive and interacts with the system directly.
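A highly simplified sketch of that pipeline appears below (same hypothetical scalar plant as above): each weak learner runs online gradient descent on its own feedback gain, while the booster applies a fixed uniform average of their suggestions. The actual combination rule in [1] is a staged, residual-style construction, so the uniform average here is only a stand-in.

```python
import numpy as np

a, b = 1.05, 1.0                       # unstable scalar plant: x+ = a*x + b*u
N = 5                                  # number of weak OGD learners
theta = np.zeros(N)                    # learner i suggests u_i = theta_i * x
lr = 0.05

x = 1.0
for t in range(100):
    suggestions = theta * x            # every weak learner proposes an action
    u = suggestions.mean()             # non-adaptive booster: fixed uniform mix
    x_next = a * x + b * u             # the booster itself acts on the plant

    # Each learner takes one OGD step on the quadratic stage cost
    # c(u) = (a*x + b*u)^2 + u^2, as if its own suggestion had been played.
    grad_u = 2 * b * (a * x + b * suggestions) + 2 * suggestions
    theta -= lr * grad_u * x           # chain rule: du_i/dtheta_i = x
    x = x_next

print("learned gains:", theta)
print("final state  :", x)
```

Even though no individual learner is privileged, the averaged action stabilizes the plant once the gains drift into the stabilizing range, illustrating how the booster's behavior differs from any single weak learner's.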
Moving Forward. While these preliminary attempts look promising, there is much room for improvement. For instance, are the two architectures described above the only ones possible? Is there a principled way to choose the base class for a given application? On the theoretical side, questions about regret bounds and convergence rates abound. How does one extend this theory to settings with multiple learning agents? Many fundamental questions, both theoretical and practical, thus remain open and provide exciting opportunities for researchers to move the needle in this emerging area of control.
Keywords: Improper Learning, Reinforcement Learning, Boosting, MetaRL, AdaBoost.