Overview
XGBoost (eXtreme Gradient Boosting) is a high-performance, open-source library that implements gradient-boosted decision trees for supervised learning tasks such as classification, regression and ranking. Designed for both single-machine and distributed environments, it achieves state-of-the-art results on tabular data by combining:
- Optimized tree-based algorithms – exact and approximate split finding, sparsity-aware learning and advanced regularization.
- Hardware acceleration – built-in GPU support and efficient multithreading to fully exploit modern CPUs and GPUs.
- Distributed training – integration with Dask, Spark and Ray for scaling to large clusters and cloud environments.
- Flexible interfaces – native APIs for Python, R, C++, Java, Scala and Julia, plus scikit-learn compatibility for seamless pipeline integration.
- Model interpretability tools – built-in feature importance, SHAP value computation and visualization utilities.
First released in 2014 by Tianqi Chen, then a PhD student at the University of Washington, as part of the DMLC (Distributed Machine Learning Community) project, XGBoost quickly became a dominant choice in data-science competitions and industry production systems for its blend of speed, accuracy and configurability. Today it is maintained by an active open-source community and remains a cornerstone technique for tabular machine-learning workflows.