Ensemble methods are machine learning techniques that combine several models to produce one optimal predictive model. The main principle is to combine many weak learners into a single strong and versatile learner, yielding a more accurate model than any individual learner alone.
Bagging is built on bootstrap sampling: each model is trained on a sample drawn with replacement from the training data, and the aggregated predictions improve the accuracy and stability of the overall model.
Bagging in Ensemble Methods
- Bagged decision trees
- Random forest
Bagging performs best with algorithms that have high variance; decision trees are the most popular example. Random forest is an extension of bagged decision trees that also randomizes the features considered at each split.
Steps:
- Suppose the training set has n observations and m features. Draw a bootstrap sample (a random sample with replacement) from the training data.
- At each node, select a random subset of the m features and apply the split function to the best of them.
- Grow each tree to its maximum size, without pruning.
- Repeat the above steps to build many trees, then aggregate their predictions by majority vote (classification) or averaging (regression), as in the sketch below.
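As a rough sketch of these steps in practice, scikit-learn's BaggingClassifier wraps the bootstrap-and-aggregate procedure around any base estimator, and RandomForestClassifier adds the random feature subset at each split. The dataset, estimator counts, and cross-validation settings below are illustrative choices:

```python
# A minimal sketch, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagged decision trees: bootstrap samples + full trees + majority vote.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Random forest: bagging plus a random feature subset at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("bagged trees", bagging), ("random forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```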
Advantages:
- Reduces overfitting.
- Maintains accuracy even when some data are missing.
- Works well with high-dimensional data.
Disadvantage:
- Since the final prediction is the mean (or majority vote) of many models, it does not give precise values for regression and classification.
Boosting in Ensemble Methods
- AdaBoost
- Stochastic Gradient Boosting
AdaBoost
AdaBoost works by weighting the instances in the data set by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.
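A minimal sketch of this with scikit-learn's AdaBoostClassifier, which uses decision stumps as its default weak learners (the dataset and parameters are illustrative):

```python
# A minimal sketch, assuming scikit-learn; AdaBoostClassifier reweights
# misclassified instances so later stumps focus on the hard cases.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = AdaBoostClassifier(n_estimators=50, random_state=0)
print("cv accuracy:", cross_val_score(model, X, y, cv=5).mean())
```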
Stochastic Gradient Boosting
You can construct a stochastic gradient boosting model for classification using scikit-learn's GradientBoostingClassifier class.
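For example, a minimal sketch; setting subsample below 1.0 is what makes the boosting stochastic, since each tree then fits a random fraction of the training data (the dataset and parameters are illustrative):

```python
# A minimal sketch, assuming scikit-learn; subsample=0.8 means each tree is
# fit on a random 80% of the training rows, i.e. stochastic gradient boosting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, subsample=0.8, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```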
Steps:
- Draw a random subset of training samples d1, without replacement, from the training set D and use it to train a weak learner C1.
- Draw a second random subset d2 from D and add 50 percent of the samples that C1 misclassified; use this set to train a second weak learner C2.
- Find the samples in D on which C1 and C2 disagree, and use them to train a third weak learner C3.
- Combine all weak learners via majority voting, as in the sketch below.
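To make the procedure concrete, here is a toy from-scratch sketch of these four steps using decision stumps as weak learners; the names C1, C2, C3, the subset size of 200, and the synthetic dataset are illustrative assumptions, not part of any library API:

```python
# A toy sketch of the four-step boosting procedure above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, random_state=0)

# Step 1: train weak learner C1 on a random subset d1 drawn without replacement.
d1 = rng.choice(len(X), size=200, replace=False)
C1 = DecisionTreeClassifier(max_depth=1).fit(X[d1], y[d1])

# Step 2: d2 = a fresh random subset plus 50% of the samples C1 misclassified.
wrong = np.where(C1.predict(X) != y)[0]
d2 = np.concatenate([rng.choice(len(X), size=200, replace=False),
                     rng.choice(wrong, size=len(wrong) // 2, replace=False)])
C2 = DecisionTreeClassifier(max_depth=1).fit(X[d2], y[d2])

# Step 3: train C3 on the samples where C1 and C2 disagree.
d3 = np.where(C1.predict(X) != C2.predict(X))[0]
if len(d3) == 0:  # fallback for the toy demo: the stumps may agree everywhere
    d3 = d2
C3 = DecisionTreeClassifier(max_depth=1).fit(X[d3], y[d3])

# Step 4: combine the three weak learners by majority vote (labels are 0/1).
votes = sum(C.predict(X) for C in (C1, C2, C3))
majority = (votes >= 2).astype(int)
print("ensemble accuracy:", (majority == y).mean())
```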
Advantages of Boosting
- Supports different loss functions.
- Works well with feature interactions.
Disadvantages of Boosting
- It is prone to overfitting.
- It requires careful tuning of various hyperparameters.
To find the accuracy, the formula is as follows:
Accuracy = (True Positive + True Negative)/ Total Population
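For example, with hypothetical confusion-matrix counts:

```python
# Tiny check of the accuracy formula, using made-up counts.
tp, tn, fp, fn = 50, 40, 5, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)  # total population = TP + TN + FP + FN
print(accuracy)  # 0.9
```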
Conclusion
In conclusion, we have learnt that ensemble methods are machine learning techniques that combine several models to produce one optimal predictive model, on the principle that many weak learners can be combined into a single strong and versatile learner.
We have also learnt about the bagging and boosting algorithms: bagging performs best with high-variance algorithms such as decision trees, while boosting supports different loss functions and works well with feature interactions. We have also seen some of their advantages and disadvantages.