scikit-learn (also known as sklearn) is a BSD-licensed, community-driven machine-learning library for Python.
Built on top of NumPy, SciPy and joblib, it delivers:
- A wide catalogue of supervised and unsupervised algorithms – e.g. SVMs, random forests, gradient boosting, k-means, DBSCAN, Gaussian processes and manifold learning.
- A consistent, estimator-centric API (`fit` / `predict` / `transform`) with rich `Pipeline` and `FeatureUnion` utilities for end-to-end workflows.
- Tools for model selection and hyper-parameter optimisation such as grid search, randomised search, cross-validation and permutation tests.
- Comprehensive metrics for classification, regression, clustering and ranking, plus visualisers like `PartialDependenceDisplay` (the replacement for the `plot_partial_dependence` function, which was removed in version 1.2).
- Out-of-core learning interfaces via `partial_fit`, multi-processing with joblib, and seamless interoperability with the Python scientific stack.
- Extensive tutorials, narrative docs and an example gallery, making it a de-facto teaching and prototyping standard in academia and industry.
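
To make the estimator API and model-selection tools listed above concrete, here is a minimal sketch that chains a scaler and an SVM in a `Pipeline` and tunes it with cross-validated grid search. The dataset (iris), the step names `scale` and `clf`, and the parameter grid are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every step exposes fit; intermediate steps also expose transform,
# and the final step exposes predict.
pipeline = Pipeline([
    ("scale", StandardScaler()),  # hypothetical step name
    ("clf", SVC()),               # hypothetical step name
])

# Parameters of a pipeline step are addressed as "<step>__<parameter>".
param_grid = {
    "clf__C": [0.1, 1.0, 10.0],
    "clf__kernel": ["linear", "rbf"],
}

# 5-fold cross-validated grid search over the whole pipeline, so the
# scaler is re-fitted on the training part of each fold.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# score() uses the best pipeline refitted on the full training split.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Because the scaler sits inside the pipeline, its statistics are learned only from the training folds during the search, which avoids leaking information from the validation folds.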
scikit-learn is production-ready, supported by NumFOCUS and Inria, and fully open to community contributions.