“Storytelling is the most powerful way to put ideas into the world.” - Robert McKee

In this article, using small stories, I will try to explain the concepts of ensemble machine learning.

In recent times, I haven’t come across a Kaggle competition-winning solution that doesn’t use ensemble machine learning. So it seems worthwhile to understand the basic concepts of ensemble learning through some examples.

## Ensemble machine learning

Suppose you want to buy a house. To decide whether this is the perfect house for you, you will ask questions of your friends who have bought a house, real-estate brokers, neighbors, colleagues, and your parents. You will give a weight to each of the answers and combine them to arrive at a final decision. This is exactly ensemble learning.

*Ensemble machine learning is the art of creating a model by merging different categories of learners together, to obtain better predictions and stability.*

*Naive Ensemble* machine learning techniques are:

**Max voting**— Continuing the previous example: if you ask 10 people about the house and 7 of them tell you not to buy it, your answer, by max voting, is not to buy the house.

**Averaging**— If each of these people instead gives a probability that you should buy this house (e.g., your parents say this house will be 70% suitable for you), you take the average of all these probabilities and base your decision on that.

**Weighted averaging**— Suppose you trust your parents and close friends more than anyone else. You give higher weights (say 60% in total) to the probabilities given by these people and lower weights (40%) to the others. Then you take the weighted average as the final probability.
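The three naive schemes above can be sketched in a few lines of plain Python. The ten advisers, their votes, and their probabilities are all invented for illustration:

```python
from collections import Counter

# Max voting: 7 of 10 advisers say "no", so the majority answer wins.
votes = ["no"] * 7 + ["yes"] * 3
decision = Counter(votes).most_common(1)[0][0]  # -> "no"

# Averaging: each adviser gives a probability that the house suits you.
probs = [0.70, 0.55, 0.60, 0.40, 0.80, 0.65, 0.50, 0.75, 0.45, 0.60]
average = sum(probs) / len(probs)

# Weighted averaging: the first three advisers (parents and close friends)
# share 60% of the total weight; the other seven share the remaining 40%.
weights = [0.60 / 3] * 3 + [0.40 / 7] * 7
weighted_average = sum(w * p for w, p in zip(weights, probs))
```

Note that the weights sum to 1, so the weighted average is still a valid probability.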

*Advanced Ensemble* machine learning techniques are:

**1. Bagging**— Also known as Bootstrap Aggregating.

I have a set of multi-colored balls in my bag. I ask a kid to pick 5 balls, then I put the balls back and ask the kid to pick 5 balls again, over and over. This repetitive task is known as bootstrapping, or sampling with replacement.

“Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods.” - Wikipedia

Now, for every draw of 5 balls, I will compute the probability of a white ball. If I get 2 white balls out of 5, the probability is 2/5, i.e. 40%; if I get 0 white balls out of 5, it is 0/5, i.e. 0%.

In the end, I will average these probabilities over all the draws to estimate the probability of drawing a white ball from the bag.

So basically, I am creating a small model from each sample of balls drawn, with the balls put back after each draw. In the end, I combine the predictions of each of these models to obtain the final answer — the probability. This is bagging.
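The ball-drawing story can be simulated directly. The bag contents below are invented (4 white balls out of 10, so the true probability is 0.4), and averaging the per-draw estimates recovers it:

```python
import random

random.seed(0)

# A hypothetical bag: 4 white balls out of 10 -> true probability 0.4.
bag = ["white"] * 4 + ["red"] * 3 + ["blue"] * 3

estimates = []
for _ in range(10_000):
    # random.choices samples WITH replacement: the balls go back each time.
    sample = random.choices(bag, k=5)
    estimates.append(sample.count("white") / 5)

# The average of many bootstrap estimates converges to the true probability.
estimate = sum(estimates) / len(estimates)
```

Each individual draw gives a noisy estimate (40%, 0%, 60%, ...), but their average is very close to the true 40%.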

*ML version of bagging*:

- From the original dataset, multiple samples are generated randomly, with replacement
- A weak learner (a base model such as a decision tree) is trained on each of these subsets, such that all the weak learners are independent of each other and can run in parallel
- Finally, the predictions obtained from each of these weak learners are combined to produce the prediction of the strong learner (the final bagging model)
- One of the most popular examples of bagging is Random Forest
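The steps above can be sketched with scikit-learn (assumed installed); the dataset here is synthetic, generated purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset standing in for real data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: each of the 50 trees is trained on its own bootstrap sample,
# independently of the others (n_jobs=-1 trains them in parallel).
bag = BaggingClassifier(n_estimators=50, n_jobs=-1, random_state=0)
bag.fit(X_tr, y_tr)
bag_acc = bag.score(X_te, y_te)

# Random Forest is the best-known bagging variant: bootstrap samples
# plus random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_tr, y_tr)
rf_acc = rf.score(X_te, y_te)
```

The predictions of the individual trees are combined by majority vote, exactly the max-voting idea from the house example.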

*Advantage*:

a. Less overfitting — many weak learners aggregated typically outperform a single learner trained on the entire set, and overfit less

b. Stable — reduces variance on high-variance, low-bias data sets

c. Faster — Can be performed in parallel, as each separate bootstrap can be processed on its own before combination

*Disadvantage*:

a. Expensive — computationally expensive if the data set is quite big

b. Bias — In a data set with high bias, bagging will also carry high bias into its aggregate

c. Complex — the ensemble loses the interpretability of a single model.

**2. Boosting**

One day, I thought, why not cook food on my own? So, I cooked food. But I found I had added extra salt, so it was not tasty. The next time I cooked, I put in less salt, but I found the food was too spicy. So, the next time I cooked, I put in adequate spice and salt, but the food got burnt. So, the next time I cooked, I put in adequate spice and salt, kept the flame low, and stayed watchful. Finally, I cooked tasty food. At the inception, I was a weak learner. But I kept on learning from my own mistakes and, in the end, I became a strong learner.

*ML version*

In boosting, weak learners (decision trees) with relatively high bias are built sequentially, such that each subsequent weak learner aims to reduce the errors (mistakes) of the previous learner. Each learner learns from its predecessors and updates the residual errors, so the learner that grows next in the sequence learns from an updated version of the residuals. Each of these weak learners contributes some vital information for prediction, enabling the boosting technique to produce a strong learner by effectively combining them. The final strong learner brings down both the bias and the variance.
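This sequential, error-correcting process can be sketched with scikit-learn's gradient boosting (assumed installed), using shallow, high-bias trees on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset standing in for real data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Boosting: 100 shallow trees (max_depth=2 -> high bias) are fit one
# after another, each on the residual errors of the ensemble so far;
# learning_rate scales how much each new tree corrects its predecessors.
boost = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                   learning_rate=0.1, random_state=0)
boost.fit(X_tr, y_tr)
acc = boost.score(X_te, y_te)
```

Unlike bagging, the trees here cannot be trained in parallel: each one depends on the mistakes of all the trees before it.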

There are two types of boosting:

**A. Weight based Boosting**

Steps:

i. A sample dataset is taken to train a weak learner. For example, suppose I have three independent variables, X1, X2, and X3, and a dependent variable Y that I have to predict.