Formal verification workflows often require long-horizon interaction with a proof assistant, iterative file edits, and real-time compiler feedback, and generalist LLMs struggle to operate reliably at that granularity. Leanstral 1.5 is purpose-trained to act as an agent inside Lean 4 projects and the Mistral Vibe environment. It closes that gap by learning the full proof-engineering loop rather than only producing single-shot proofs.
Key capabilities
- Trained for agentic proof engineering: optimized to edit files, run shell commands, query the Lean language server, and iterate until proofs compile.
- Architecture and scale: a Mixture-of-Experts (MoE) model with 128 experts (4 active per token) and 119B total parameters, of which ~6.5B are activated per token; a context window of up to 256k tokens supports whole-repository reasoning.
- Multimodal and integrable: accepts text and images; designed to run via Mistral Vibe or a local vLLM server, and to work with MCP (Model Context Protocol) servers such as lean-lsp-mcp for tight IDE-style workflows.
- Empirical performance: achieved state-of-the-art results on formal benchmarks (saturated miniF2F, high scores on PutnamBench and the FATE suites) and has been used in automated pipelines that flagged real bugs in Rust→Lean translated code.
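The edit, check, and iterate loop described in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not Leanstral's actual harness: `CheckResult`, `lake_build_check`, and `prove_until_compiles` are hypothetical names, and the agent is stood in for by a `propose_edit` callback. The only real tool assumed is Lean 4's `lake build` command.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool        # True when the project compiled without errors
    feedback: str   # compiler output handed back to the agent


def lake_build_check(project_dir: str) -> CheckResult:
    """Run `lake build` in a Lean 4 project and capture its output."""
    proc = subprocess.run(
        ["lake", "build"], cwd=project_dir, capture_output=True, text=True
    )
    return CheckResult(ok=proc.returncode == 0, feedback=proc.stdout + proc.stderr)


def prove_until_compiles(
    propose_edit: Callable[[str], None],
    check: Callable[[], CheckResult],
    max_rounds: int = 8,
) -> Optional[int]:
    """Alternate compiler checks and edits.

    Returns the number of edits applied before the check passed,
    or None if it still fails after `max_rounds` edits.
    """
    result = check()
    for edits_applied in range(max_rounds):
        if result.ok:
            return edits_applied
        # Hypothetical agent step: edit files based on compiler feedback.
        propose_edit(result.feedback)
        result = check()
    return max_rounds if result.ok else None
```

In a real project, `check` could be `lambda: lake_build_check("path/to/project")`, while `propose_edit` would wrap whatever mechanism applies the model's file edits. Bounding the loop with `max_rounds` keeps a stuck proof attempt from running indefinitely.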
Who it's for and trade-offs
Great fit if you need an AI that can be embedded into Lean 4 development workflows to automate proof attempts, triage failing goals, or perform long-running repository tasks. It is especially useful for teams that can access Mistral's API or provision sizable inference infrastructure of their own (vLLM, tensor-parallel serving).
Look elsewhere if you need a tiny, low-cost model for casual chat or general-purpose code help: Mixture-of-Experts models add serving complexity, and local deployment requires significant memory, tensor parallelism, and engineering effort (or reliance on Mistral-hosted endpoints). The model's Apache-2.0 license is permissive for integration, so operational costs and MoE inference demands are the main practical constraints.
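The memory demands mentioned above can be roughed out with simple arithmetic. The sketch below uses only the parameter counts quoted in this overview; the storage widths, the even split across GPUs, and the function names are assumptions for illustration, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
TOTAL_PARAMS = 119e9    # total parameters, as quoted in this overview
ACTIVE_PARAMS = 6.5e9   # parameters activated per token, as quoted

# Assumed storage widths in bytes per parameter; the precision of the
# released checkpoint is not stated in this overview.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold all weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def per_gpu_weight_gb(
    total_params: float, bytes_per_param: float, tensor_parallel_size: int
) -> float:
    """Weights per GPU, assuming an even split across GPUs."""
    return weight_memory_gb(total_params, bytes_per_param) / tensor_parallel_size


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used for any single token."""
    return active_params / total_params


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        total = weight_memory_gb(TOTAL_PARAMS, width)
        shard = per_gpu_weight_gb(TOTAL_PARAMS, width, 8)
        print(f"{name}: {total:.1f} GB total, {shard:.2f} GB per GPU at TP=8")
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

Under these assumptions, 16-bit weights alone come to about 238 GB, or roughly 30 GB per GPU when split across eight GPUs. Only about 5.5% of the parameters are active for any given token, which lowers per-token compute, but all experts typically still need to be resident in memory. That is why serving cost, rather than licensing, is the main constraint.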
Where it fits
Use Leanstral 1.5 when you want a specialist proof agent rather than a general LLM: it targets reproducible, compile-driven proof loops and repository-scale formal verification. For lighter-weight code-assistant tasks or non-formal coding workflows, smaller single-GPU models or hosted chat-centric agents remain simpler and cheaper.
