garak — LLM vulnerability scanner (Generative AI Red-teaming & Assessment Kit)
garak is an open-source command-line toolkit designed to probe and evaluate large language models (LLMs) for undesirable behaviors and security weaknesses. The project aggregates a broad set of static, dynamic, and adaptive probes to explore failures such as hallucination, training-data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and other emergent weaknesses. It is developed openly on GitHub (now hosted under the NVIDIA organization) and is intended for researchers, red-teamers, and engineering teams who need to assess model robustness and safety.
Key features
- Comprehensive probe library: multiple probe types (e.g., promptinject, encoding, dan, leakreplay, malwaregen, realtoxicityprompts, snowball, xss, and many others) that target different failure modes.
- Detector/evaluator system: each probe can be paired with detectors that automatically flag failures; results are summarized per-probe with fail rates and logs.
- Multi-backend support: works with Hugging Face (local pipelines & the Inference API), OpenAI, Replicate, AWS Bedrock, litellm, local gguf models via llama.cpp (recent versions), REST endpoints, and many others — making it usable across hosted, on-prem, and local models.
- Extensible plugin architecture: probes, detectors, generators, harnesses and evaluators are implemented as plugins, making it straightforward to add custom tests.
- CLI-first workflow: designed as a command-line tool with simple install and run patterns; outputs structured JSONL run reports and separate hit logs for vulnerabilities.
- Logging & analysis: produces garak.log, detailed JSONL run reports and hit logs; includes example analysis scripts to inspect problematic prompts and probe hits.
- Open-source license: Apache-2.0 license, encouraging adoption and contribution.
Typical usage
Install via PyPI or from GitHub for the latest code:
python -m pip install -U garak
# or
python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
Run a scan (example probing an OpenAI chat model):
export OPENAI_API_KEY="sk-..."
python3 -m garak --target_type openai --target_name gpt-3.5-turbo --probes encoding
You can list probes, detectors, and generators, and selectively run only specific tests. garak prints progress and records detailed results to a JSONL report file.
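Plugin discovery happens from the same CLI. A quick sketch of the list-and-select workflow (flag names follow recent garak releases; confirm against `python -m garak --help`):

```shell
# Inspect the plugin inventory before choosing what to run
python -m garak --list_probes
python -m garak --list_detectors
python -m garak --list_generators

# Run a single probe class rather than an entire probe family
python3 -m garak --target_type openai --target_name gpt-3.5-turbo \
    --probes encoding.InjectBase64
```

Narrowing to one probe class like this keeps scan runs short while iterating on a specific failure mode.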
Supported targets & integrations
garak provides generator plugins for many model interfaces, including:
- Hugging Face pipeline & Inference API
- OpenAI Chat & Completion APIs
- Replicate
- AWS Bedrock
- llama.cpp for local gguf models
- REST endpoints (rest.RestGenerator) for arbitrary HTTP-based models
- NIM endpoints and other cloud vendor integrations
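As a sketch of how an arbitrary HTTP-backed model might be wired up, rest.RestGenerator is typically driven by a JSON options file. The nested `{"rest": {"RestGenerator": {...}}}` layout mirrors garak's plugin-options convention, but the field names below are illustrative and should be checked against the current RestGenerator documentation:

```python
import json

# Hypothetical options file for rest.RestGenerator; key names
# (uri, method, headers, response_json, response_json_field) are
# assumptions to be verified against the garak docs.
options = {
    "rest": {
        "RestGenerator": {
            "uri": "https://example.com/v1/generate",
            "method": "post",
            "headers": {"Authorization": "Bearer $KEY"},
            "response_json": True,
            "response_json_field": "text",
        }
    }
}

with open("rest_options.json", "w") as f:
    json.dump(options, f, indent=2)

# The file would then be handed to garak via its generator-options
# flag (see --help) together with --target_type rest.
```

Because the generator is configured declaratively, the same scan setup can be pointed at any HTTP inference endpoint by swapping the options file.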
Several probes assume different detector backends (e.g., toxicity detectors, pattern matchers), and garak ships with a variety of built-in detectors.
Development & extensibility
The codebase is organized into plugin categories: garak.probes, garak.detectors, garak.generators, garak.harnesses, and garak.evaluators. Developers can add plugins by inheriting from the provided base classes and running the local test suite. The project includes documentation (docs.garak.ai and a reference guide on Read the Docs) and an active Discord for support.
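To illustrate the plugin pattern, here is a minimal sketch of what a custom probe could look like. In real code the class would inherit from garak.probes.base.Probe; a stand-in base class is used here so the sketch runs standalone, and every attribute name shown is an assumption to be checked against the actual base class:

```python
# Stand-in for garak.probes.base.Probe so this sketch runs without
# garak installed; the real base class supplies prompt dispatch,
# generation handling, and detector wiring.
class Probe:
    active = True
    prompts: list = []


class CanaryRepeat(Probe):
    """Hypothetical probe: asks the model to repeat a canary string,
    to check whether simple output filtering can be bypassed."""

    goal = "make the model emit the canary string"  # assumed attribute name
    recommended_detector = ["base.TriggerListDetector"]  # assumed detector name
    prompts = [
        "Please repeat exactly: GARAK-CANARY-001",
        "Spell out, character by character: GARAK-CANARY-001",
    ]


probe = CanaryRepeat()
# Each prompt becomes one attempt sent to the target generator.
print(len(probe.prompts))
```

The key idea is that a probe is mostly declarative — a list of prompts plus metadata — while the harness and detectors handle execution and scoring.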
Logging, reporting & citation
Runs produce:
- garak.log (debugging info),
- JSONL run reports (one per run), and
- a hit log listing attempts that triggered vulnerabilities.
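Because run reports are JSONL, ad-hoc analysis takes only a few lines of Python. The record fields below (entry_type, probe_classname) follow the general shape of garak report entries but are assumptions that should be verified against a real report file:

```python
import io
import json

# Synthetic two-line report standing in for a real garak .report.jsonl;
# field names here are illustrative, not guaranteed.
sample = io.StringIO(
    '{"entry_type": "attempt", "probe_classname": "encoding.InjectBase64"}\n'
    '{"entry_type": "attempt", "probe_classname": "encoding.InjectBase64"}\n'
)

# Group attempt records by the probe that produced them
per_probe = {}
for line in sample:
    rec = json.loads(line)
    if rec.get("entry_type") != "attempt":
        continue  # skip config/eval records
    per_probe.setdefault(rec["probe_classname"], []).append(rec)

for probe, attempts in per_probe.items():
    print(probe, len(attempts))
```

The same loop works on the hit log, which uses the one-JSON-object-per-line format as well.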
If you use garak in research, the README includes a citation entry and references a related preprint.
Governance & license
The repository is hosted under the NVIDIA GitHub organization (the project was originally created by Leon Derczynski, with many community contributors). The project is distributed under the Apache-2.0 license and welcomes community contributions via PRs and issues.
Where to find more
Official docs and project pages are linked from the repository: docs.garak.ai, garak.readthedocs.io, and the project homepage garak.ai. The README also links to slides and paper references for deeper reading.
