Ensemble methods are machine learning techniques that combine several models to produce one optimal predictive model. The main principle is to combine many weak learners into a single strong and versatile learner, yielding a more accurate model than any individual learner alone.
Bagging is built on bootstrap sampling: each model is trained on a sample drawn with replacement from the training data, and the aggregated predictions improve the accuracy and stability of the overall model.
Bagging in Ensemble Methods
- Bagged decision trees
- Random forest
Bagging performs best with algorithms that have high variance; decision trees are the most popular example. Random forest is an extension of bagged decision trees that also randomizes the features considered at each split.
Steps:
- Suppose the training set has n observations and m features. Draw a bootstrap sample (a random sample with replacement) from the training data.
- At each node, select a random subset of the m features and apply the split function to the best of them.
- Grow each tree to its maximum size, without pruning.
- Repeat the above steps to build many trees, then aggregate their predictions by majority vote (classification) or averaging (regression), as in the sketch below.
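As a rough sketch of these steps in practice, scikit-learn's BaggingClassifier wraps the bootstrap-and-aggregate procedure around any base estimator, and RandomForestClassifier adds the random feature subset at each split. The dataset, estimator counts, and cross-validation settings below are illustrative choices:

```python
# A minimal sketch, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagged decision trees: bootstrap samples + full trees + majority vote.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Random forest: bagging plus a random feature subset at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("bagged trees", bagging), ("random forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```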
Advantages:
- Reduces overfitting.
- Maintains accuracy even when some data are missing.
- Works well with high-dimensional data.
Disadvantage:
- Since the final prediction is the mean (or majority vote) of many models, it does not give precise values for regression and classification.
Boosting in Ensemble Methods
- AdaBoost
- Stochastic Gradient Boosting
AdaBoost
AdaBoost works by weighting the instances in the data set by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.
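A minimal sketch of this with scikit-learn's AdaBoostClassifier, which uses decision stumps as its default weak learners (the dataset and parameters are illustrative):

```python
# A minimal sketch, assuming scikit-learn; AdaBoostClassifier reweights
# misclassified instances so later stumps focus on the hard cases.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = AdaBoostClassifier(n_estimators=50, random_state=0)
print("cv accuracy:", cross_val_score(model, X, y, cv=5).mean())
```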
Stochastic Gradient Boosting
You can construct a stochastic gradient boosting model for classification using scikit-learn's GradientBoostingClassifier class.
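For example, a minimal sketch; setting subsample below 1.0 is what makes the boosting stochastic, since each tree then fits a random fraction of the training data (the dataset and parameters are illustrative):

```python
# A minimal sketch, assuming scikit-learn; subsample=0.8 means each tree is
# fit on a random 80% of the training rows, i.e. stochastic gradient boosting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, subsample=0.8, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```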
Steps:
- Draw a random subset of training samples d1, without replacement, from the training set D and use it to train a weak learner C1.
- Draw a second random subset d2 from D and add 50 percent of the samples that C1 misclassified; use this set to train a second weak learner C2.
- Find the samples in D on which C1 and C2 disagree, and use them to train a third weak learner C3.
- Combine all weak learners via majority voting, as in the sketch below.
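To make the procedure concrete, here is a toy from-scratch sketch of these four steps using decision stumps as weak learners; the names C1, C2, C3, the subset size of 200, and the synthetic dataset are illustrative assumptions, not part of any library API:

```python
# A toy sketch of the four-step boosting procedure above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, random_state=0)

# Step 1: train weak learner C1 on a random subset d1 drawn without replacement.
d1 = rng.choice(len(X), size=200, replace=False)
C1 = DecisionTreeClassifier(max_depth=1).fit(X[d1], y[d1])

# Step 2: d2 = a fresh random subset plus 50% of the samples C1 misclassified.
wrong = np.where(C1.predict(X) != y)[0]
d2 = np.concatenate([rng.choice(len(X), size=200, replace=False),
                     rng.choice(wrong, size=len(wrong) // 2, replace=False)])
C2 = DecisionTreeClassifier(max_depth=1).fit(X[d2], y[d2])

# Step 3: train C3 on the samples where C1 and C2 disagree.
d3 = np.where(C1.predict(X) != C2.predict(X))[0]
if len(d3) == 0:  # fallback for the toy demo: the stumps may agree everywhere
    d3 = d2
C3 = DecisionTreeClassifier(max_depth=1).fit(X[d3], y[d3])

# Step 4: combine the three weak learners by majority vote (labels are 0/1).
votes = sum(C.predict(X) for C in (C1, C2, C3))
majority = (votes >= 2).astype(int)
print("ensemble accuracy:", (majority == y).mean())
```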
Advantages of Boosting
- Supports different loss functions.
- Works well with feature interactions.
Disadvantages of Boosting
- It is prone to overfitting.
- It requires careful tuning of various hyperparameters.
To find the accuracy, the formula is as follows:
Accuracy = (True Positive + True Negative)/ Total Population
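For example, with hypothetical confusion-matrix counts:

```python
# Tiny check of the accuracy formula, using made-up counts.
tp, tn, fp, fn = 50, 40, 5, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)  # total population = TP + TN + FP + FN
print(accuracy)  # 0.9
```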
Conclusion
In conclusion, we have learnt that ensemble methods are machine learning techniques that combine several models to produce one optimal predictive model, on the principle that many weak learners can be combined into a single strong and versatile learner.
We have also learnt about the bagging and boosting algorithms: bagging performs best with high-variance algorithms such as decision trees, while boosting supports different loss functions and works well with feature interactions. We have also seen some of their advantages and disadvantages.