Mohammed Alhamid is a software engineer and director of the Centre of Healthcare Intelligence at King Faisal Specialist Hospital and Research Centre.
Anytime we’re trying to make an important decision, we try to collect as much information as possible and reach out to experts for advice. The more information we can gather, the more we (and those around us) trust the decision-making process.
Machine learning predictions follow a similar logic. Models process the given inputs and produce an outcome. The outcome is a prediction based on the patterns the models saw during training.

One model is often not enough, and this article sheds light on why. When and why do we need multiple models? How do we train those models? What kind of diversity should those models provide? So, let’s jump right in.
And if you want to see an example of how to build an ensemble model, you can skip to the end!
The technical challenges of building a single estimator include high variance (the model is overly sensitive to the provided inputs), low accuracy (fitting one model to the entire training data may fail to capture its patterns) and noise and bias (the model relies too heavily on one or a few features when making a prediction).
A single algorithm may not make the perfect prediction for a given data set. Machine learning algorithms have their limitations, and producing a model with high accuracy is challenging. If we build and combine multiple models, we have a chance to boost the overall accuracy. We implement the combination by aggregating the output from each model, with two objectives: reducing the model error and maintaining the model’s generalization.
You can implement such aggregation using different techniques, sometimes referred to as meta-algorithms.
When we’re building ensemble models, we’re not focusing only on the algorithm’s variance. For instance, we could build multiple C4.5 decision tree models where each model specializes in predicting one specific pattern. The models we combine to obtain a meta-model are called weak learners. In this ensemble learning architecture, the inputs are passed to each weak learner and their predictions are collected. The combined predictions are used to build the final ensemble model.
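A minimal sketch of this architecture is scikit-learn’s `VotingClassifier`, which passes the inputs to each weak learner and combines their votes. The synthetic data and the choice of learners here are illustrative, not from the article’s own code:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for any tabular classification problem
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Three weak learners with different decision boundaries
weak_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Hard voting: each learner casts one vote and the majority wins
ensemble = VotingClassifier(estimators=weak_learners, voting="hard")
ensemble.fit(X, y)
print(ensemble.score(X, y))
```

Because each learner maps the features differently, their mistakes tend to differ, and the majority vote smooths them out.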
One important thing to mention is that weak learners can have different ways of mapping features with variant decision boundaries.
The idea of bagging is to make slightly different subsets of the training data available to several learning processes. Each model is trained on its own subset, drawn independently, so the models can be trained in parallel. Bagging reduces variance and minimizes overfitting. One example of such a technique is the random forest algorithm.
This technique is based on bootstrap sampling. Bootstrapping creates multiple sets from the original training data by sampling with replacement. Replacement allows a sample instance to appear more than once in a set. Each subset has the same size as the original and can be used to train models in parallel.
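Bootstrap sampling can be sketched in a few lines of NumPy; the tiny data set here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # stand-in for the original training set indices

# Draw three bootstrap subsets, each the same size as the original,
# sampling WITH replacement so duplicates are allowed
n_subsets = 3
subsets = [rng.choice(data, size=len(data), replace=True) for _ in range(n_subsets)]

for s in subsets:
    print(sorted(s))
```

Each subset would then train its own model, and the models can run in parallel because no subset depends on another.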
This technique uses a subset of training samples as well as a subset of features to build each decision tree. Multiple decision trees are built, each fit to a different training subset. The samples and features are typically assigned at random.
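A hedged sketch of this with scikit-learn’s `RandomForestClassifier` on synthetic data (the parameters are illustrative defaults, not tuned values from the article):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Each tree sees a bootstrap sample of the rows and considers a
# random subset of the features at every split
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # features considered at each split
    bootstrap=True,       # sample rows with replacement
    random_state=42,
)
forest.fit(X, y)
print(forest.score(X, y))
```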
Here’s another ensemble technique where the predictions from many decision trees are combined. Like random forest, it combines a large number of decision trees. However, extra trees use the whole sample while choosing the split points at random.
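The difference from random forest is visible in the defaults of scikit-learn’s `ExtraTreesClassifier`; again, the data and parameters below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# bootstrap=False (the default): every tree trains on the whole sample,
# while split thresholds are drawn at random rather than optimized
extra = ExtraTreesClassifier(n_estimators=100, random_state=42)
extra.fit(X, y)
print(extra.score(X, y))
```

The random splits make individual trees weaker but more diverse, which is exactly the kind of variance an ensemble can average away.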
This is an ensemble of algorithms, where we build models on top of several weak learners. As we mentioned earlier, those learners are called weak because they’re typically simple, with limited prediction capabilities. The adaptation capability of AdaBoost made this technique one of the earliest successful binary classifiers.
Adjusted sample weights are the core of this adaptability: each tree adjusts its weights based on the accuracies achieved so far, so misclassified samples receive more attention from the next learner. Hence, we perform the training sequentially (rather than in parallel). The process of training and measuring the error in the estimates can be repeated for a given number of iterations, or until the error rate stops changing significantly.
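The sequential reweighting described above can be sketched with scikit-learn’s `AdaBoostClassifier`, whose default weak learner is a depth-one decision tree (a “stump”). The data is synthetic and the parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 50 sequential rounds: each new stump upweights the samples
# its predecessors misclassified
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X, y)
print(ada.score(X, y))
```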
Gradient boosting algorithms are powerful techniques with high predictive performance. XGBoost, LightGBM and CatBoost are popular boosting libraries you can use for regression and classification problems. Their popularity increased significantly after their proven ability to win several Kaggle competitions.
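To keep this sketch dependency-free, here is scikit-learn’s built-in `GradientBoostingClassifier` rather than XGBoost, LightGBM or CatBoost; those libraries expose broadly similar fit/predict interfaces. Data and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Each new tree is fit to the gradient of the loss left behind
# by the trees built so far
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbm.fit(X, y)
print(gbm.score(X, y))
```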
Stacking is similar to boosting in that both produce more robust predictors. Stacking is the process of learning how to create such a stronger model from all of the weak learners’ predictions.
Please note that the algorithm is learning the prediction from each model (as features).
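A hedged sketch with scikit-learn’s `StackingClassifier`, where the meta-model is trained on the base learners’ out-of-fold predictions (the learners and data below are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("nb", GaussianNB()),
]

# The final_estimator never sees the raw features by default;
# its inputs are the base learners' cross-validated predictions
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))
```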
Blending is similar to the stacking approach, except the final model learns from predictions made on a holdout validation set rather than from out-of-fold predictions on the full training set. Hence, the features used by the meta-model come from the validation set.
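Blending has no dedicated scikit-learn class, but the holdout mechanics can be sketched by hand. Everything here (data, models, split sizes) is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out a validation set: base models train on X_train,
# while the blender trains on their validation-set predictions
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

base_models = [DecisionTreeClassifier(max_depth=3, random_state=42), GaussianNB()]
val_preds = []
for model in base_models:
    model.fit(X_train, y_train)
    val_preds.append(model.predict_proba(X_val)[:, 1])

# Meta-features are the base models' predicted probabilities
meta_X = np.column_stack(val_preds)
blender = LogisticRegression()
blender.fit(meta_X, y_val)
print(blender.score(meta_X, y_val))
```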
Classification is simply a categorization process. If we have multiple labels, we need to decide: shall we build a single multi-label classifier, or perhaps multiple binary classifiers? If we decide to build a number of binary classifiers, we need to interpret each model’s prediction. For instance, if we want to recognize four objects, each model tells us whether the input data belongs to its category. Hence, each model provides a probability of membership, and we can build a final ensemble model combining those classifiers.
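The “one binary classifier per category” scheme is what scikit-learn’s `OneVsRestClassifier` automates; a sketch on a synthetic four-class problem (all names and parameters illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Four classes, so one binary classifier is trained per class
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=42,
)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

# Each underlying model scores membership in its own class;
# predict() picks the class with the highest score
print(len(ovr.estimators_))
```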
For classification, we determined the best-fitting membership using the resulting probability. In regression problems, we’re not dealing with yes-or-no questions. Instead, we need to find the best predicted numerical values, which we can obtain by averaging the collected predictions.
When we ensemble multiple algorithms to combine multiple models into a single prediction, we need an aggregating method. We can use three main techniques: max voting, where each model’s prediction counts as one vote and the majority wins; averaging, where we take the mean of the models’ predictions; and weighted averaging, where each model’s prediction is scaled by an assigned importance before averaging.
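A minimal numeric sketch of common aggregation techniques (max voting, simple averaging and weighted averaging), using made-up probabilities purely for illustration:

```python
import numpy as np

# Hypothetical positive-class probabilities from three models
# for the same three inputs
preds = np.array([
    [0.9, 0.4, 0.6],   # model A
    [0.8, 0.3, 0.4],   # model B
    [0.7, 0.6, 0.5],   # model C
])

# 1. Max voting: each model votes a class; the majority wins
votes = (preds > 0.5).astype(int)
max_vote = (votes.sum(axis=0) > len(preds) / 2).astype(int)

# 2. Simple averaging: mean probability across models
average = preds.mean(axis=0)

# 3. Weighted averaging: trust better models more
weights = np.array([0.5, 0.3, 0.2])
weighted = weights @ preds

print(max_vote)    # [1 0 0]
print(average)     # [0.8        0.43333333 0.5       ]
print(weighted)    # [0.83 0.41 0.52]
```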
We’ll use the following example to illustrate how to build an ensemble model. We’ll use the Titanic data set and try to predict passenger survival using different techniques. A sample of the data set and the distribution of the target column against passenger age are shown in figures six and seven. If you want to follow along, you can find the sample code on my GitHub.
The Titanic data set is a classification problem that needs extensive feature engineering. Figure eight below shows the strong correlation between some features, such as Parch (parents and children) and family size. We’ll focus only on model building and how an ensemble model can be applied to this use case.
We will use different algorithms and techniques; therefore, we will create a model object to increase code reusability.
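The article’s actual code lives on GitHub; a hedged sketch of what such a reusable model object might look like (the class and method names here are hypothetical, and synthetic data stands in for the engineered Titanic features):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical wrapper: pairs a name with an estimator so every
# algorithm is trained and evaluated through the same interface
class Model:
    def __init__(self, name, estimator):
        self.name = name
        self.estimator = estimator

    def cv_accuracy(self, X, y, folds=5):
        # Mean accuracy across cross-validation folds
        return cross_val_score(self.estimator, X, y, cv=folds).mean()

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = [
    Model("decision_tree", DecisionTreeClassifier(random_state=42)),
    Model("naive_bayes", GaussianNB()),
]
for m in models:
    print(m.name, round(m.cv_accuracy(X, y), 3))
```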
Let’s compare the cross-validation accuracy of all the models across five folds.
Now let’s build a stacking model where a new stronger model learns the predictions from all these weak learners. Our label vector used to train the previous models will remain the same. The features are the predictions collected from each classifier.
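The feature matrix for the meta-model can be assembled from out-of-fold predictions, so the labels never leak into their own meta-features. This sketch uses synthetic data and illustrative base models in place of the article’s classifiers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

base_models = [DecisionTreeClassifier(max_depth=3, random_state=42), GaussianNB()]

# One meta-feature column per base model: its out-of-fold
# predicted probability of the positive class
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The label vector is unchanged; only the features are predictions
meta_model = LogisticRegression()
meta_model.fit(meta_features, y)
print(meta_features.shape)
```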
Now let’s see if building an XGBoost model that learns only from the resulting predictions performs better. But first, let’s take a quick peek at the correlations between the classifiers’ predictions.
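Inspecting that correlation takes only a few lines with pandas; synthetic data and illustrative classifiers stand in for the Titanic models here:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Out-of-fold predicted labels from each classifier
preds = pd.DataFrame({
    "tree": cross_val_predict(
        DecisionTreeClassifier(max_depth=3, random_state=42), X, y, cv=5
    ),
    "nb": cross_val_predict(GaussianNB(), X, y, cv=5),
})

# Low correlation means the classifiers make different mistakes,
# which is exactly the diversity an ensemble can exploit
print(preds.corr())
```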
We’ll now build a model to combine the predictions from multiple contributing classifiers.
The previously created base classifiers represent Level-0 models, and the new XGBoost model represents the Level-1 model. The combination illustrates a meta-model trained on the predictions of sample data. You can see a quick comparison between the accuracy of the new stacking model and the base classifiers below:
Ensemble models are an excellent tool for machine learning because they offer a variety of techniques for classification and regression problems. Now you know the main types of ensemble models, how to build a simple one and how they boost model accuracy.
Built In’s expert contributor network publishes thoughtful, solutions-oriented stories written by innovative tech professionals. It is the tech industry’s definitive destination for sharing compelling, first-person accounts of problem-solving on the road to innovation.