Realizability and Sample Complexity in Machine Learning

**Machine Learning from First Principles: Blog Post 4**

In the last blog we looked at the problem of overfitting, which arises when a hypothesis learns the training samples exactly as they are and fails to generalize to unseen samples. We also saw that one way to address this is to commit in advance to a hypothesis class H in which we assume the best hypothesis resides. This commitment is called inductive bias in ML.

One way to think about this: when the learner was free to fit the data perfectly, it would twist and turn the fitting curve whichever way it wanted in order to match every training sample, and that is exactly what produced overfitting. With inductive bias, the assumptions baked into the class H mean the fitting curve can no longer be completely arbitrary in shape, and this restriction gives better generalization than before. The best hypothesis lies in the hypothesis class H and can be found by computing the empirical loss of every hypothesis in H and picking the one that minimizes it.
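The "pick the hypothesis in H with minimum empirical loss" rule can be sketched in a few lines of Python. This is an illustrative example, not code from the post: the finite class of threshold classifiers and the sample data below are hypothetical choices made just to have something concrete to minimize over.

```python
def empirical_loss(h, samples):
    """Fraction of training samples that hypothesis h misclassifies (0-1 loss)."""
    return sum(1 for x, y in samples if h(x) != y) / len(samples)

def best_in_class(H, samples):
    """Return the hypothesis in H with minimum empirical loss on the samples."""
    return min(H, key=lambda h: empirical_loss(h, samples))

# Hypothetical hypothesis class: threshold classifiers h_t(x) = 1 if x >= t else 0.
def make_threshold(t):
    return lambda x: 1 if x >= t else 0

H = [make_threshold(t) for t in [0.0, 0.25, 0.5, 0.75, 1.0]]

# Toy labeled samples (x, y); points at or above ~0.25 are labeled 1.
samples = [(0.1, 0), (0.2, 0), (0.6, 1), (0.9, 1)]

best = best_in_class(H, samples)
```

Because H contains only five fixed thresholds, the learner cannot contort itself to memorize arbitrary data; it can only choose among these constrained shapes, which is the restriction the paragraph above describes.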

Here H is a pre-specified class of hypotheses.

In order to move forward, we will make some assumptions that will later be relaxed.