LongCat-2.0

A large-scale MoE language model for agentic coding and long-context tasks, natively supporting 1M-token context and dynamically activating tens of billions of parameters per token. It uses sparse attention and zero-computation experts to allocate compute per token; model weights are planned for release.

Introduction

Why this matters

Training and serving trillion-parameter MoE models with million-token context changes how we approach multi-step coding and agent workflows: instead of chunking inputs or streaming state externally, you can operate over much larger local contexts and assign compute per token. LongCat-2.0 is positioned as a practical exploration of that trade-off, pushing MoE scale, sparse long-context attention, and token-level dynamic compute on domestically produced AI ASIC clusters.

Key Capabilities
  • Massive MoE scale with dynamic activation: ~1.6 trillion total parameters, with tens of billions activated per token (about 48B on average in one report; other sources give a dynamic range). So what: lets the model devote far more capacity to complex tokens while saving compute on simple tokens, improving efficiency on heterogeneous workloads.
  • Native 1M-token context with LongCat Sparse Attention (LSA). So what: enables single-context reasoning over million‑token inputs for tasks like long-form codebases, multi-file reasoning, or agent traces without external stitching.
  • Zero-computation experts and ScMoE cross-layer shortcuts. So what: provides a mechanism to bypass compute for trivial tokens and fuse expert behaviors across layers, making token-level compute budgeting more flexible for agentic and coding uses.
  • Engineered for domestic AI ASIC superpods and large-scale training. So what: demonstrates large-scale training and inference workflows on an alternative hardware stack (reportedly multi-million accelerator-hours and more than 35T tokens of pretraining), which may reduce dependency on other accelerator ecosystems.
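
To make the token-level compute idea concrete, here is a toy sketch of top-k MoE routing in which some of the experts are zero-computation experts. It assumes, as a simplification, that a zero-computation expert returns its input unchanged and that routing is a plain softmax over a linear router. The sizes and the helpers `make_ffn_expert`, `zero_expert`, and `moe_layer` are hypothetical illustrations, not LongCat-2.0's actual router or code.

```python
import math
import random

# Toy sizes; all hypothetical and far smaller than any real model.
D = 4          # hidden size
NUM_FFN = 6    # "real" FFN experts
NUM_ZERO = 2   # zero-computation experts (assumed identity, no matmul)
TOP_K = 2      # experts selected per token


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_ffn_expert(rng):
    # A toy expert: one D x D linear map followed by ReLU.
    w = [[rng.gauss(0.0, 0.5) for _ in range(D)] for _ in range(D)]

    def expert(x):
        return [max(0.0, sum(w[i][j] * x[j] for j in range(D))) for i in range(D)]

    return expert


def zero_expert(x):
    # Zero-computation expert (assumption): passes the token through unchanged.
    return list(x)


def moe_layer(x, router_w, experts):
    """Route one token to its top-k experts.

    Returns the gate-weighted output and how many FFN experts actually ran.
    """
    logits = [sum(r[j] * x[j] for j in range(D)) for r in router_w]
    probs = softmax(logits)
    top = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:TOP_K]
    out = [0.0] * D
    ffn_calls = 0
    for e in top:
        if e < NUM_FFN:
            ffn_calls += 1  # only real experts cost matmul FLOPs
        y = experts[e](x)
        out = [o + probs[e] * yi for o, yi in zip(out, y)]
    return out, ffn_calls


rng = random.Random(0)
experts = [make_ffn_expert(rng) for _ in range(NUM_FFN)] + [zero_expert] * NUM_ZERO
router_w = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(NUM_FFN + NUM_ZERO)]

# Per-token compute now varies: each token runs 0, 1, or 2 FFN experts.
counts = [moe_layer([rng.gauss(0.0, 1.0) for _ in range(D)], router_w, experts)[1]
          for _ in range(1000)]
print({k: counts.count(k) for k in range(TOP_K + 1)})
```

Because a token routed to a zero-computation expert skips that expert's matmul, the number of FFN experts run per token varies (0, 1, or 2 here). Scaled up, that is the mechanism behind an activated-parameter count that averages tens of billions rather than staying fixed.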

Who it's for and trade-offs

Great fit if you need to prototype or evaluate: large-context agent workflows, multi-file code understanding and generation, or agent infrastructure that benefits from token-level compute allocation. It’s also relevant to engineers exploring MoE routing, sparse long-context attention, and alternate hardware stacks.

Look elsewhere if you need immediately downloadable weights and lightweight local deployment: the project notes that model weights and full tooling are to be released (weights were marked "coming soon" on the model card). Expect high inference costs and specialized runtime requirements for full-scale variants; smaller distilled or quantized variants may be more practical for short-term experimentation.
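
A back-of-envelope sketch of why full-scale serving is expensive, using only the figures quoted above (~1.6T total parameters, ~48B activated per token on average). It counts weight storage only (no KV cache, activations, or runtime overhead) and uses the common rule of thumb of ~2 FLOPs per active parameter per token; the byte widths are generic formats, not announced LongCat-2.0 release formats.

```python
TOTAL_PARAMS = 1.6e12   # ~1.6T total parameters (from the description above)
ACTIVE_PARAMS = 48e9    # ~48B activated per token on average (reported)

# Generic storage formats; not confirmed release formats for this model.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_tb(params, bytes_per_param):
    # Decimal terabytes (1 TB = 1e12 bytes); weights only.
    return params * bytes_per_param / 1e12


def forward_flops_per_token(active_params):
    # Rule of thumb: ~2 FLOPs (one multiply-add) per active parameter,
    # ignoring the attention cost over the context.
    return 2 * active_params


for fmt, b in BYTES_PER_PARAM.items():
    print(f"{fmt}: {weight_memory_tb(TOTAL_PARAMS, b):.1f} TB of weights")
print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
print(f"~{forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs per token")
```

Even at 4 bits per parameter the weights alone come to about 0.8 TB (about 3.2 TB in bf16), so the full model must stay resident across many accelerators even though only about 3% of it is active for any one token.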

Where it fits

LongCat-2.0 sits among recent trillion‑scale MoE and long‑context efforts as a model focused on agentic coding and execution. Its distinguishing engineering choices are token-level compute control (zero-compute experts), linear sparse attention for million‑token windows, and an emphasis on domestic ASIC training/inference stacks rather than conventional GPU-first pipelines.
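
Why sparse attention matters at 1M tokens is easiest to see by counting query-key scores. The sketch below compares dense causal attention with a generic sparse scheme in which each query scores at most `k` keys. The budget `K = 2048` is a hypothetical value, and the scheme is not a description of how LSA actually selects keys.

```python
def dense_causal_pairs(n):
    # Query i attends to keys 0..i, so total pairs = n(n+1)/2: quadratic in n.
    return n * (n + 1) // 2


def sparse_pairs(n, k):
    # Hypothetical fixed budget: each query scores min(i + 1, k) keys.
    # Total = k(k+1)/2 + (n - k) * k once n > k: linear in n.
    if n <= k:
        return dense_causal_pairs(n)
    return dense_causal_pairs(k) + (n - k) * k


K = 2048  # hypothetical per-query key budget, not a published LongCat setting

for n in (8_192, 131_072, 1_048_576):
    d, s = dense_causal_pairs(n), sparse_pairs(n, K)
    print(f"n={n:>9,}: dense={d:.2e} sparse={s:.2e} ratio={d / s:.0f}x")
```

Dense cost grows quadratically with context length while the fixed-budget scheme grows linearly, so under these assumptions the gap widens from roughly 2x at 8K tokens to about 256x at 1M tokens.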
