In previous articles, we defined generalization error in the context of machine learning and showed how to bound it through various inequalities. We also defined overfitting and how it can be remedied by using a validation set. We saw that the dilemma of choosing a size for the validation set can be avoided with cross-validation, which turned out to be an unbiased estimator of E_out(N − 1). In this article, we give some practical examples of which inequalities to use in the case of a validation set and cross-validation.
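As a quick refresher, the leave-one-out cross-validation estimate mentioned above can be sketched in a few lines. The data and the model below (a simple 1-D least-squares line fit) are hypothetical choices for illustration only:

```python
import numpy as np

# Hypothetical setup: a noisy linear target and N sample points.
rng = np.random.default_rng(0)
N = 20
x = rng.uniform(-1, 1, N)
y = 2.0 * x + rng.normal(0, 0.1, N)

errors = []
for i in range(N):
    # Train on all points except the i-th (so on N - 1 points) ...
    mask = np.arange(N) != i
    a, b = np.polyfit(x[mask], y[mask], 1)
    # ... and validate on the single held-out point.
    errors.append((a * x[i] + b - y[i]) ** 2)

# Averaging the N per-point errors gives the cross-validation estimate,
# whose expectation is E_out for a model trained on N - 1 points.
e_cv = np.mean(errors)
```

Each of the N folds trains on N − 1 points, which is why the estimate is unbiased for E_out(N − 1) rather than E_out(N).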

**Example — Validation Set**

Imagine that we have a dataset, D, with sample size N = 100. We split our dataset into two parts: a training set of size 75 and a validation set of size 25. We want to evaluate 100 models, meaning we have 100 hypothesis sets, and find the model with the best performance on our validation set. We will assume that all models have d_VC = 10. We then want to give an upper bound on our out-of-sample error. Let us first walk through the process.
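The mechanics of this setup can be sketched as follows. The data, and the use of 100 regularization strengths as a stand-in for 100 distinct hypothesis sets, are hypothetical choices for illustration only:

```python
import numpy as np

# Hypothetical dataset D with N = 100 points.
rng = np.random.default_rng(1)
N = 100
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + rng.normal(0, 0.1, N)

# Split D into a training set of size 75 and a validation set of size 25.
x_train, y_train = x[:75], y[:75]
x_val, y_val = x[75:], y[75:]

# Evaluate M = 100 models: here model m is a ridge-style polynomial fit
# with its own regularization strength (an illustrative stand-in for
# 100 different hypothesis sets).
val_errors = []
for m in range(100):
    lam = 10.0 ** (-5 + 0.1 * m)   # hypothetical per-model parameter
    Phi = np.vander(x_train, 4)    # cubic polynomial features
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y_train)
    # Record each trained model's mean squared error on the validation set.
    val_errors.append(np.mean((np.vander(x_val, 4) @ w - y_val) ** 2))

# Select the model with the best (lowest) validation error.
best_m = int(np.argmin(val_errors))
```

Note that each model is trained only on the 75 training points; the 25 validation points are touched only to compare the 100 finished hypotheses.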

As stated, we have 100 models to evaluate, which gives us 100 hypothesis sets. We do not know the sizes of these hypothesis sets; they may contain infinitely many hypotheses. Each hypothesis set is trained on the training data and produces a final hypothesis, g_n⁻. This indirectly creates a new, finite hypothesis set for our validation data: