Ensembles

Raghavi_bala
3 min read · Mar 29, 2022

You have a doubt in quantum physics. You ask your physics teacher, and he is able to explain the theory and the basic math. But he can't explain the math in detail, so you ask your math teacher to explain the math. And voila! You understand the problem.

That’s exactly what an ensemble does. If you had asked your math teacher first, he wouldn’t have been able to explain the theory, but he could explain the math; your physics teacher can explain the theory but not the math. So you asked both of them and combined their knowledge to solve your problem. That’s what ensembles do: when one single model can’t understand your data well, we use an ensemble of multiple models.

Ensembles combine several base models to create a stronger model with better performance.

We are going to discuss 4 types of ensembles.

Note - This article is more of an introduction, and the deeper math is not discussed here.

The 4 types are

Bagging, Boosting, Stacking, Cascading

Let’s discuss each separately.

Bagging

Bagging is also called Bootstrap Aggregation. Have you ever heard of bootstrap sampling?

It’s simple: drawing samples from a population with replacement.

And the aggregation could be the mean, the median, or a majority vote, depending on the task.
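As a quick illustration, here is a minimal NumPy sketch of bootstrap sampling and aggregation; the toy data and the choice of mean/median are just placeholders for illustration:

```python
import numpy as np

# Hypothetical toy "population" of 10 observations
data = np.array([4, 8, 15, 16, 23, 42, 7, 11, 19, 30])

rng = np.random.default_rng(seed=0)

# One bootstrap sample: draw len(data) points WITH replacement
bootstrap_sample = rng.choice(data, size=len(data), replace=True)

# Aggregate the sample, e.g. with the mean or the median
print(bootstrap_sample)
print("mean  :", bootstrap_sample.mean())
print("median:", np.median(bootstrap_sample))
```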

Bagging basically focuses on reducing variance. So we choose base models that have high variance and low bias, so that bagging can reduce their variance and give us a better overall model.

First, you create multiple datasets by sampling with replacement. Then you train an individual base model on each of them. Finally, you aggregate these models' outputs to get the final prediction.

And remember, these base models should be high-variance, low-bias models.
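Here is a rough sketch of bagging with scikit-learn, assuming a fully grown decision tree as the high-variance, low-bias base model; the dataset and hyperparameters are only illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Toy dataset just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A fully grown decision tree is a classic high-variance, low-bias learner
tree = DecisionTreeClassifier()

# Each of the 50 trees is trained on its own bootstrap sample,
# and their predictions are aggregated by majority vote
bagging = BaggingClassifier(tree, n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)

print("single tree :", tree.fit(X_train, y_train).score(X_test, y_test))
print("bagged trees:", bagging.score(X_test, y_test))
```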

Boosting

The simple intuition behind boosting is that we use the output of the previous model as an input to the next model.

That doesn’t mean you take the output of the previous model and feed it directly into the next model. We use the previous model’s output to calculate the error; to keep it simple, you can think of MSE as the error. The next model is then trained on the original features together with this error (the residuals), so each model tries to correct the mistakes of the one before it.

Boosting mainly focuses on reducing Bias, so we choose the base learners such that they have high bias and low variance.

One more important thing to notice: in Bagging we can train the models in parallel and then aggregate them to get the final output, but we can’t do the same in Boosting, as the models are trained sequentially.
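To make the “train the next model on the previous model’s error” idea concrete, here is a minimal hand-rolled boosting loop with squared error, using shallow trees as the high-bias, low-variance base learners; the data, learning rate, and number of rounds are illustrative choices, not a recommendation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)   # start from a constant (zero) prediction
models = []

# Shallow trees (stumps) are high-bias, low-variance base learners
for _ in range(100):
    residual = y - prediction            # error of the ensemble so far
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, residual)               # next model learns the error, not y itself
    prediction += learning_rate * stump.predict(X)
    models.append(stump)

print("training MSE:", np.mean((y - prediction) ** 2))
```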

Again, as I said before, I’m not going into the in-depth math. Boosting is a wide topic, and I’m not discussing the math behind it here.

Stacking

Stacking is similar to Bagging except that it uses different base models. Bagging and Boosting are homogeneous, that is, the same type of base model is used (possibly with different hyperparameters), while Stacking is heterogeneous, that is, it can train different base models, like KNN and SVM, and then combine their predictions with a meta-model.

Many of the best Kaggle solutions use Stacking, but stacking isn’t preferred in the real world as it can be computationally expensive.

Just like in Bagging, in Stacking the base learners can be trained in parallel.
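Here is a minimal sketch of stacking with scikit-learn, assuming KNN and SVM as the heterogeneous base learners and a logistic-regression meta-model to combine them; all of these choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy dataset just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Heterogeneous base learners: KNN and SVM, as mentioned above
base_learners = [
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True)),
]

# A meta-model (here logistic regression) learns how to combine their predictions
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```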

Cascading

Cascading is again a sequential training method. After training an individual base model, we compute the probability that its prediction is correct; if it’s lower than a threshold we fix, we pass the data to the next model.

The important thing to notice is that we only pass those points whose probability was lower than the threshold, not the whole dataset.
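Here is a rough, hand-rolled sketch of a two-stage cascade, assuming a cheap first model and a heavier second one; the 0.9 confidence threshold and the specific models are just illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Toy dataset just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

threshold = 0.9  # confidence level we are willing to accept

# Stage 1: a cheap model handles the "easy" points
stage1 = LogisticRegression(max_iter=1000).fit(X, y)
confidence = stage1.predict_proba(X).max(axis=1)
unsure = confidence < threshold      # only these points move on

# Stage 2: a heavier model sees ONLY the low-confidence points
# (assumes at least some points fall below the threshold)
stage2 = RandomForestClassifier(random_state=42).fit(X[unsure], y[unsure])

# Final predictions: stage 1 where it was confident, stage 2 otherwise
pred = stage1.predict(X)
pred[unsure] = stage2.predict(X[unsure])
print("points passed to stage 2:", unsure.sum())
```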

These are the main types of ensembles.
