This is a series of posts that explains Ensemble methods in machine learning in an easy-to-follow manner. In this post, we will discuss Heterogeneous Ensembles.
2/ Homogeneous Ensemble — Bagging
3/ Homogeneous Ensemble — Boosting
As a machine learning enthusiast and self-taught learner, I wrote this post as study material for myself, and I would love to share it with my fellow learners. The idea is to show the whole picture of the different Ensemble techniques, when to use them, and what effects they have on our models.
Ensemble is a machine learning technique that makes predictions using a collection of models, which is why learners tend to discover it only after studying the other classic algorithms. The strength of this technique derives from the idea of "the wisdom of the crowd", which states:
“the aggregation of information in groups, resulting in decisions that are often better than could have been made by any single member of the group” (Wikipedia, Wisdom of the crowd, 2020).
This idea is certainly borne out by the fact that several winning predictive models in Kaggle competitions deploy this modelling method.
There are great benefits to implementing Ensembles. Apart from remarkably high accuracy, Ensemble methods are well suited to larger datasets, where they can reduce training time and memory use by exploiting parallel computing (e.g. XGBoost, LightGBM). One of the worst nightmares for machine learning practitioners is overfitting, especially on real-world datasets that contain noise and do not follow any typical data distribution. In that case, we can also use Ensembles to fight high variance (e.g. Random Forest).
A Heterogeneous Ensemble is used when we want to combine different fine-tuned algorithms to come up with the best possible prediction. Voting and Stacking are examples of this technique.
Hard Voting is used for classification tasks. It combines the predictions of multiple estimators using the mode (the majority class) to select the final result. By choosing the class with the largest number of votes, a Hard Voting Ensemble can achieve better predictions than models that use just a single algorithm.
To be effective, we need to provide an odd number (at least 3) of diverse estimators, each producing its own independent predictions.
The following is a code example using a Hard Voting Ensemble. Note that the list of algorithms is your choice, as long as they are fine-tuned, diverse, and solve classification tasks.
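Here is a minimal sketch using scikit-learn's `VotingClassifier`. The three base estimators and the synthetic dataset are illustrative choices, not a prescription; in practice you would substitute your own fine-tuned, diverse classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An odd number (here 3) of diverse base estimators
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",  # majority vote (mode) over predicted class labels
)
voting.fit(X_train, y_train)
print(voting.score(X_test, y_test))
```

With `voting="hard"`, each estimator casts one vote per sample and the class with the most votes wins, which is why an odd number of estimators helps avoid ties.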
This technique can be used for both regression and classification tasks. To perform Soft Voting, we employ the mean: averaging the predicted probabilities for classification, or the predicted values for regression. The characteristics are similar to Hard Voting, except that we can use any number of estimators as long as there are more than 2. As with Hard Voting, we can also assign weights to classifiers depending on their importance to the final prediction.
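A minimal sketch of Soft Voting with scikit-learn, again with illustrative base estimators and weights. Setting `voting="soft"` averages each estimator's `predict_proba` output; for regression, `VotingRegressor` averages predicted values in the same spirit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative synthetic dataset
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

soft = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",      # average predicted class probabilities
    weights=[2, 1, 2],  # optional: weight classifiers by importance (values are illustrative)
)
soft.fit(X_train, y_train)
proba = soft.predict_proba(X_test)  # weighted mean of the estimators' probabilities
```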
If you have tried Hard Voting and Soft Voting but still have not achieved the expected results, this is when Stacking comes into play. The technique is employed for both classification and regression tasks. The biggest difference between Stacking and the other two Heterogeneous Ensemble methods is that, apart from the base learners, we have an extra meta-estimator.
First, the base learners train and predict on the dataset. Then, the meta-estimator on the second layer uses the predictions of the base-learner layer as new input features. Note that the meta-estimator trains and predicts on a dataset whose input features (X) are these new features, one per base learner, together with the class labels (Y) of the original dataset. In this context, we no longer combine predictions with an estimate of location (mode or mean) but instead use a trainable learner as the combiner itself.
The benefit of this approach is that the meta-estimator can effectively learn which base estimators provide better predictions, as well as participate directly in the final prediction using both the original data and the new input features.
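The two-layer structure above can be sketched with scikit-learn's `StackingClassifier`; the base learners and meta-estimator below are illustrative choices. The `passthrough` flag controls whether the meta-estimator sees only the base learners' predictions or the original features as well.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative synthetic dataset
X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

stack = StackingClassifier(
    estimators=[  # first layer: base learners
        ("rf", RandomForestClassifier(random_state=1)),
        ("svc", SVC(probability=True, random_state=1)),
    ],
    final_estimator=LogisticRegression(),  # second layer: the meta-estimator
    passthrough=True,  # feed original features to the meta-estimator alongside base predictions
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```

Internally, scikit-learn generates the base learners' predictions with cross-validation, so the meta-estimator is trained on out-of-fold predictions rather than on predictions the base learners made for data they had already seen.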
Overall, the Heterogeneous Ensemble method is a great choice when you have already built a small set of estimators and individually trained and fine-tuned them. Bear in mind that
“Voting only makes sense if the learning schemes perform comparably well. If two of the three classifiers make predictions that are grossly incorrect, we will be in trouble!” (Witten, Frank, Hall, & Pal, 2016).
Heterogeneous Ensembles certainly provide an improvement in overall performance through a simple and intuitive technique. In the next post, we will dive into another Ensemble technique that includes some award-winning algorithms.
Thanks for reading!
Don’t be shy; let’s connect if you are interested in more of my posts: