North Mini Code aims to make agentic coding and complex terminal workflows practical with an open-weights sparse Mixture-of-Experts model. The core insight is that a large sparse model (30B total, 3B active) plus tooling-aware training lets an LLM both generate large codebases and reason across extremely long contexts (256K tokens), which matters for multi-file engineering, long traces, and stepwise tool interactions.
Key Capabilities
- Sparse MoE architecture: 30B total parameters with ~3B active per token via 128 experts (8 active) — trades inference peak size for lower per-token compute. This means better capacity-per-flop for complex code reasoning while keeping runtime costs closer to a smaller dense model.
- Long-context & long-output support: Designed for up to 256K context and very long outputs (model card references 64K output), enabling multi-file code generation, long conversational histories, and agentic tool chains without frequent context truncation.
- Tool-use and agent support: Trained and benchmarked for agentic coding (terminal/agent benchmarks), includes chat templates and function-calling examples, and integrates with vLLM and OpenCode for local tool-enabled deployments.
- Research-friendly release: Weights and model card are published under Apache-2.0 with evaluation details and recommended sampling parameters, easing replication and local experimentation.
Who it's for — and tradeoffs
Great fit if you need a locally runnable, research-accessible model for multi-step coding tasks, automated agent workflows, or experiments that require very long context windows. It's especially useful for teams prototyping tool-enabled agents (terminal automation, code synthesis across many files) and for benchmarks comparing sparse vs dense scaling.
Look elsewhere if you need a turnkey managed API with production SLAs, or if your deployment environment cannot accommodate MoE routing complexity or tooling (some runtimes require vLLM/melody or transformer forks). Expect additional engineering for efficient inference (device mapping, vLLM, or specialized runtimes) compared with standard dense models.
Where it fits
Positioned between research-oriented open-weight models and production-focused closed APIs: it targets practitioners who want high-capacity code reasoning and tool-use with reproducible results, without depending on a hosted provider. Compared to dense models of similar active parameter count, it aims to provide higher effective capacity for coding and agentic tasks at comparable inference cost.
