Ensemble Methods in Multiple Classifier Systems: Bagging, Boosting and Stacking

Kelly Szutu
Artificial Intelligence in Plain English

What is an ensemble method? It constructs a set of independent models and predicts class labels by combining the predictions made by those models. This strategic combination can reduce the total error, either by decreasing variance (bagging) or bias (boosting), or by improving on the performance of any single model (stacking).

Here, I use the “Red Wine Quality” data from Kaggle to demonstrate ensemble methods. “Quality” is our target variable. The only preprocessing I do is turning the 10-point scale into 3 classification levels: “1”, “2” and “3” represent “good”, “medium” and “bad” respectively.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# import data
wine = pd.read_csv('winequality.csv')

# preprocess data: map the 10-point quality score to 3 levels
def getquality(x):
    if x > 6.5:
        return 1   # good
    elif x < 4.5:
        return 3   # bad
    else:
        return 2   # medium

wine['quality'] = wine['quality'].apply(getquality)

# separate features and target variable
x = wine.drop(['quality'], axis=1)
y = wine['quality']

# split into train and test data
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=1)

# copy to avoid SettingWithCopyWarning
xtrain = xtrain.copy()
xtest = xtest.copy()
ytrain = ytrain.copy()
ytest = ytest.copy()

Model 1:

The accuracy score for the default DecisionTreeClassifier() is 0.815625.

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
model_pred = model.predict(xtest)
#see prediction result
print('Accuracy Score: {0:6f}'.format(model.score(xtest, ytest)))

Bagging

Also known as bootstrap aggregation, the idea is to train many base models on randomly different versions of the training data. Each model gets one vote and is treated the same regardless of its prediction accuracy, and the predictions are then aggregated to get the final result. In most cases, the variance of the result becomes smaller after bagging. Random forest, for example, is the best-known bagging method: it combines decision trees with the bagging idea.

We usually extract the training subsets from the original sample set using bootstrapping (sampling with replacement) in each round. As a result, some samples may be drawn multiple times in a given subset, while others may never be drawn. This makes the training subsets roughly independent of one another, which is what gives the base models their diversity.
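To see what sampling with replacement looks like in practice, here is a quick sketch (not part of the article's models, reusing xtrain and ytrain from the preprocessing step) that draws one bootstrap sample and counts how many rows never make it in:

import numpy as np
# draw n indices with replacement from the n training rows
rng = np.random.default_rng(0)
n = len(xtrain)
boot_idx = rng.integers(0, n, size=n)
xboot, yboot = xtrain.iloc[boot_idx], ytrain.iloc[boot_idx]
# some rows appear several times, others not at all (about 36.8% on average)
unused = n - len(np.unique(boot_idx))
print('rows never drawn: {} of {} ({:.1%})'.format(unused, n, unused / n))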

There is no restriction on model selection; we adopt different classifiers or regressors depending on the problem we are facing. The prediction models can be trained in parallel on the different training subsets. They are equally important and carry the same weight. After combining the outputs, we use majority voting for classification problems and averaging for regression problems, as sketched below.
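To make the mechanics concrete before handing them over to scikit-learn, here is a minimal hand-rolled bagging sketch (an illustration under the same train/test split as above, not the article's original code): several decision trees are trained on different bootstrap samples and combined by an equally weighted majority vote.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
rng = np.random.default_rng(0)
n = len(xtrain)
trees = []
for _ in range(10):
    idx = rng.integers(0, n, size=n)  # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(xtrain.iloc[idx], ytrain.iloc[idx])
    trees.append(tree)
# each tree gets one vote; the most common label wins
all_preds = np.array([t.predict(xtest) for t in trees])
voted = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print('Manual bagging accuracy: {0:6f}'.format(accuracy_score(ytest, voted)))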

Model 2:

Instead of building a random forest classifier directly, I use a decision tree as the base estimator inside BaggingClassifier() and get an accuracy score of 0.856250.

from sklearn.ensemble import BaggingClassifier
model = BaggingClassifier(base_estimator=DecisionTreeClassifier(), random_state=0)
model.fit(xtrain, ytrain)
model_pred = model.predict(xtest)
#see prediction result
print('Accuracy Score: {0:6f}'.format(model.score(xtest, ytest)))

Boosting

The most essential difference between boosting and bagging is that boosting does not treat the base models equally; instead, it selects the “elite” through continuous testing and screening. A well-performing model gets more weight in the vote, while a poorly performing one gets less, and all votes are then combined to get the final result. In most cases, the bias of the result becomes smaller after boosting. AdaBoost and gradient boosting, for example, are the most commonly used boosting methods.

Generally, the training set stays the same from round to round, but the weight given to each sample in the training set may change at the end of each boosting round. It is an iterative procedure that focuses more on (increases the weight of) the records misclassified in the previous round and pays less attention to (decreases the weight of) the ones classified correctly. In other words, it boosts the performance of weak learners to the level of a strong one.

Unlike bagging, the prediction models can only be generated sequentially, since the parameters of each model depend on the results of the previous one. After aggregating the models, we use weighted voting for classification problems and a weighted average for regression problems.
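The reweighting loop itself can be sketched as follows. This is a simplified AdaBoost-style (SAMME) illustration, not what GradientBoostingClassifier below does internally (gradient boosting fits each new model to the errors of the current ensemble rather than reweighting samples), but it shows the idea of giving misclassified records more weight each round.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
n_rounds, n_classes = 10, 3
weights = np.full(len(xtrain), 1 / len(xtrain))  # start with equal sample weights
learners, alphas = [], []
for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(xtrain, ytrain, sample_weight=weights)
    miss = stump.predict(xtrain) != ytrain.values
    err = np.clip(np.average(miss, weights=weights), 1e-10, 1 - 1e-10)
    alpha = np.log((1 - err) / err) + np.log(n_classes - 1)  # better learners get a bigger say
    weights *= np.exp(alpha * miss)  # boost the weights of misclassified rows
    weights /= weights.sum()         # renormalize
    learners.append(stump)
    alphas.append(alpha)
# final prediction: a vote over the learners, each weighted by its alpha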

Model 3:

GradientBoostingClassifier() gives us an accuracy score of 0.846875, which is also higher than the single decision tree without boosting.

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(random_state=0)
model.fit(xtrain, ytrain)
model_pred = model.predict(xtest)
#see prediction result
print('Accuracy Score: {0:6f}'.format(model.score(xtest, ytest)))

Stacking

It is relatively simple to average or vote on the results of the base models (weak learners), but the resulting error may still be large, so another learning method, stacking, was created. Instead of doing simple logical processing on the models’ outputs, the strategy of stacking is to add another layer of model on top of them.

Therefore, we have two layers of models in total: we build the first-layer models and obtain their predictions on the training set, then use those predictions as the input features for a second-layer model, which is trained to produce the final result. In this way, stacking can reduce both variance and bias, combining the benefits that bagging and boosting provide separately.
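A rough sketch of this two-layer idea in plain scikit-learn (an illustration only; the mlxtend StackingClassifier used in Model 4 wires the two layers together for us) could look like this, with out-of-fold predictions from cross_val_predict serving as the second-layer features:

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
# layer 1: out-of-fold predictions on the training set become new features
meta_features = np.column_stack([cross_val_predict(m, xtrain, ytrain, cv=5) for m in base_models])
# layer 2: a meta-model learns how to combine the base predictions
meta_model = GradientBoostingClassifier(random_state=0)
meta_model.fit(meta_features, ytrain)
# at prediction time, the base models (refit on the full training set) feed the meta-model
test_features = np.column_stack([m.fit(xtrain, ytrain).predict(xtest) for m in base_models])
print('Manual stacking accuracy: {0:6f}'.format(accuracy_score(ytest, meta_model.predict(test_features))))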

Model 4:

The accuracy score for StackingClassifier() is 0.875000. Although it is not the highest compared to the layer-1 models, it successfully enhances the performance of the decision tree and KNN.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import StackingClassifier
#build two layer models using stacking
layer1 = [DecisionTreeClassifier(), KNeighborsClassifier(), RandomForestClassifier(random_state=0)]
layer2 = GradientBoostingClassifier(random_state=0)
model = StackingClassifier(classifiers=layer1, meta_classifier=layer2)
model.fit(xtrain, ytrain)
model_pred = model.predict(xtest)
#see prediction result
print('Accuracy Score: {0:6f}'.format(model.score(xtest, ytest)))

Conclusion

From the confusion matrices, we find that the medium level of wine (the second row) is really hard for all of the models to predict. There are 14 records belonging to this category, and only 1 to 3 of them are predicted correctly. The bad level of wine (the third row), however, is easier to identify; most of its records are classified correctly.
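For reference, the confusion matrix for any of the models above can be printed with scikit-learn; for example, using model_pred from the last model fitted above:

from sklearn.metrics import confusion_matrix
# rows = true class, columns = predicted class, in the order 1 (good), 2 (medium), 3 (bad)
print(confusion_matrix(ytest, model_pred, labels=[1, 2, 3]))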

In this article, we built and used different models in Python and saw that ensemble methods make our machine learning models more powerful. Although we can build models without understanding the concepts behind the scenes, it is still worth learning how each one works: only by knowing a model well can we use it effectively and explain how it makes its predictions.

About me

Hey, I’m Kelly. I like to explore data and find interesting things in life. If you think my article is helpful, please clap for me and share it. I also welcome any feedback, comments, and constructive criticism to make my articles better. You can reach me at kelly.szutu@gmail.com
