Purpose
CompassRank is the public leaderboard of the OpenCompass evaluation suite. It provides a reproducible, fully open pipeline that evaluates large language models and multimodal models on more than 70 benchmarks (about 400,000 questions) covering knowledge, reasoning, coding, mathematics, and instruction following.
Key modules
- Distributed evaluator – one-command sharding for billion-parameter models.
- Diversified paradigms – zero-shot, few-shot and chain-of-thought templates.
- Experiment management – YAML configs plus real-time result reporting.
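
To make the three prompting paradigms listed above concrete, here is a minimal Python sketch of how zero-shot, few-shot and chain-of-thought prompts differ. The function names and prompt wording are hypothetical and chosen for illustration only; they are not the OpenCompass API, and the actual templates used by the suite may look quite different.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part
# of OpenCompass. They show how the three prompting paradigms differ.


def zero_shot(question: str) -> str:
    """Ask the question directly, with no worked examples."""
    return f"Question: {question}\nAnswer:"


def few_shot(question: str, examples: list[tuple[str, str]]) -> str:
    """Prepend solved (question, answer) pairs before the target question."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\n{zero_shot(question)}"


def chain_of_thought(question: str) -> str:
    """Nudge the model to reason step by step before giving its answer."""
    return f"Question: {question}\nAnswer: Let's think step by step."


if __name__ == "__main__":
    demo_examples = [("What is 1 + 1?", "2")]  # made-up demonstration pair
    print(zero_shot("What is 2 + 2?"))
    print("---")
    print(few_shot("What is 2 + 2?", demo_examples))
    print("---")
    print(chain_of_thought("What is 2 + 2?"))
```

Keeping each paradigm as a separate function that returns a plain string means the same question can be rendered under all three templates and the resulting scores compared side by side.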
Community & openness
All configs, datasets and reports are Apache-2.0 licensed; contributors can add new models or benchmarks via pull request.