LogoAIAny
Icon for item

Ornith-1.0-35B-GGUF

A self-improving, agentic coding LLM tailored for terminal-style coding agents and tool-calling, provided as 35B MoE GGUF weights with very large context support. Trained with reinforcement learning to jointly generate task scaffolds and solutions; designed for local inference and OpenAI-compatible tool endpoints.

Introduction

Agentic coding workflows often break where tool orchestration and incremental error recovery are left to brittle, human-designed harnesses. Ornith-1.0-35B takes a different tack: the model is trained with RL to generate not only solutions but the execution scaffold that drives tool calls and recovery steps, letting the agent discover better search trajectories and repair strategies.

Key Capabilities
  • Self-scaffolding RL: learns to propose and revise multi-step plans (scaffolds) alongside code solutions, reducing reliance on hand-crafted agent harnesses and improving end-to-end success on interactive coding benchmarks.
  • Agentic/tool integration: emits well-formed function/tool calls and exposes an OpenAI-compatible chat endpoint for seamless integration with vLLM, SGLang, MCP/agent frameworks, llama.cpp and Ollama; built-in parsers separate chain-of-thought (<think>) from final answers.
  • Practical deployment options: published as GGUF quantized weights for local inference and as bf16/FP8 checkpoints for multi-GPU serving; recommended runtimes include Transformers ≥5.8.1, vLLM ≥0.19.1, SGLang ≥0.5.9.
  • Competitive benchmarks: reports strong agentic coding performance (e.g., Terminal-Bench and SWE-bench metrics) compared to contemporaneous open-source models of similar scale.
Who it's for and tradeoffs

Great fit if you need a locally runnable, tool-capable coding agent that can autonomously plan, call tools, inspect results, and rewrite failing steps — especially within terminal-first workflows and automated agent harnesses. Look elsewhere if you need a minimal conversational assistant (the model is optimized for agentic/code tasks and emits internal reasoning traces) or if you require enterprise support SLAs; full-performance MoE variants require multi-GPU serving and careful runtime setup. The GGUF 35B build targets a balance of local usability and agentic capability but will still require infrastructure (vLLM/llama-server/Ollama) for production-scale agent fleets.