Many posts about algorithmic investing and algorithmic trading focus on price forecasting. These approaches generally rely on regression techniques to try to predict the future price of a given security. This is known to be a hard task, and one that, most of the time, will be biased by momentum. That is, the Machine Learning (ML) model simply learns to mimic the most recent price of the security.

What many of these approaches ignore, however, is that we can ask a much simpler question that can help us perform algorithmic investing with fewer of the worries of ML overfitting. We can ask our algorithms a “yes or no” question. That is, instead of asking “How much will the price go up?”, we can ask “Should I buy this security?”. The latter is a much simpler question since it reduces the space of answers to yes or no. The first question leaves room for a maybe!

For example, suppose that stock A has a current price of $10 and has been steadily rising from $5 over the last 6 months. If we ask how much the price will increase 6 months from now, a regression algorithm might say $10, $5, or any other value inside or outside that range. These algorithms also carry some uncertainty, which might confuse a beginner investor. For example, one might say that the price will increase by between $5 and $20 with an error of 10%. That is not easy to interpret and, most of the time, investors will catch themselves thinking that they “maybe should invest in this”.

By reducing the question of investing to “should I invest in this security or not”, we make the investor's life easier. Moreover, we are applying a systematic investment strategy, which is desirable since it is easily auditable and interpretable. But how do we do that?

So far we have talked about Machine Learning and regression techniques. Regression comes from a field named Supervised Learning, whose main characteristic is learning from annotated data. In regression, each example in the dataset is labeled with a numeric value and the algorithm learns to predict that value. In a classification task, instead of numbers we assign one or more classes to the examples provided to the algorithm. In this post we will only discuss binary classification, that is, our examples belong to one of two distinct classes.
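To make the difference between the two kinds of labels concrete, here is a minimal sketch of how a price history could be turned into binary labels. The function name `make_labels`, the `horizon` parameter and the price series are all my own assumptions for illustration, not a prescribed method: each day gets a 1 ("YES") if the price is higher `horizon` days later, and a 0 ("NO") otherwise.

```python
from typing import List


def make_labels(prices: List[float], horizon: int) -> List[int]:
    """Return 1 where the price `horizon` steps ahead is higher, else 0.

    The last `horizon` prices get no label because their future is unknown.
    """
    labels = []
    for i in range(len(prices) - horizon):
        labels.append(1 if prices[i + horizon] > prices[i] else 0)
    return labels


# Hypothetical daily prices, for illustration only.
prices = [5.0, 6.0, 5.5, 5.8, 8.0, 7.5, 7.0]
labels = make_labels(prices, horizon=2)
print(labels)  # [1, 0, 1, 1, 0]
```

A regression dataset would keep the future price itself as the target. Here we keep only the direction, which is what a binary classifier learns from.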

In plain terms, our goal is to input a security name and have the classification algorithm answer the question “should I invest in this security” with a “YES” or “NO”. In other words, the classification algorithm will say “YES” to any security whose price it thinks will go up, and “NO” otherwise.
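As a rough illustration of what such a “YES”/“NO” decision could look like in code, the sketch below uses a nearest-centroid rule on a single feature. Everything here is an assumption on my part: the choice of classifier, the feature (a security's past return), the function names `fit_centroids` and `should_invest`, and the training numbers, which are made up. A real decision maker would use more features and a proper model.

```python
from statistics import mean
from typing import Dict, List


def fit_centroids(features: List[float], labels: List[int]) -> Dict[int, float]:
    """Compute the average feature value of each class (0 = NO, 1 = YES).

    Assumes both classes appear at least once in `labels`.
    """
    return {
        c: mean(x for x, y in zip(features, labels) if y == c)
        for c in (0, 1)
    }


def should_invest(feature: float, centroids: Dict[int, float]) -> str:
    """Answer YES if the feature is closer to the YES centroid, else NO."""
    nearest = min(centroids, key=lambda c: abs(feature - centroids[c]))
    return "YES" if nearest == 1 else "NO"


# Hypothetical training data: past returns and whether the price later went up.
past_returns = [0.20, 0.15, 0.30, -0.10, -0.05, -0.20]
went_up = [1, 1, 1, 0, 0, 0]

centroids = fit_centroids(past_returns, went_up)
print(should_invest(1.0, centroids))    # a stock that doubled, e.g. $5 to $10
print(should_invest(-0.15, centroids))  # a stock that recently fell
```

Note that the output is only ever one of two strings, so there is no room for a “maybe”.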

As with any technique, we should be aware of the benefits and potential drawbacks. By using classification we are simplifying a complex question and applying a systematic methodology to it. This approach makes our investing principled and auditable. It also makes it reproducible, and makes it easier to find mistakes or make tweaks.

On the other hand, simplifying the question to a “yes or no” hides the magnitude of the change, that is, by how much the price will change. So investors might catch themselves investing in securities with small gains. That concern can be addressed by adding statistical analysis of the security that reports expected gains and losses and, of course, by backtesting, i.e. testing the investment strategy using historical data.
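One simple way to bring the magnitude back is to look at what happened historically whenever the answer was “YES”. The sketch below is a deliberately small stand-in for a real backtest. The function name `summarize_signals`, the statistics it reports, and the signals and returns are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean
from typing import Dict, List


def summarize_signals(signals: List[str], realized_returns: List[float]) -> Dict[str, float]:
    """Summarize the historical returns of the periods where the signal was YES."""
    taken = [r for s, r in zip(signals, realized_returns) if s == "YES"]
    gains = [r for r in taken if r > 0]
    losses = [r for r in taken if r <= 0]
    return {
        "trades": len(taken),
        "hit_rate": len(gains) / len(taken) if taken else 0.0,
        "avg_gain": mean(gains) if gains else 0.0,
        "avg_loss": mean(losses) if losses else 0.0,
        "avg_return": mean(taken) if taken else 0.0,
    }


# Hypothetical past signals and the returns that followed each one.
signals = ["YES", "NO", "YES", "YES", "NO", "YES"]
realized = [0.08, -0.03, 0.01, -0.04, 0.10, 0.02]

summary = summarize_signals(signals, realized)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A high hit rate with a tiny average gain is exactly the situation described above: the classifier is often right, but the “YES” answers are not worth much.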

Investing, whether active or passive, is an activity that demands thought. Many questions are asked, and many of them have no easy answer. In this post I developed the idea of asking simpler questions in the investment domain and introduced the concepts you need to start thinking about new possibilities in the investment realm. In a future post, I will demonstrate programmatically how to develop a simple decision maker based on the binary classification of securities.