
Rio 3.5 Open 397B

A post-trained Mixture-of-Experts multimodal LLM with ~397B total parameters (≈17B active) and a 1,010,000-token context window for image-text-to-text and conversational tasks. It integrates SwiReasoning to switch between latent and explicit reasoning, is MIT-licensed, and is optimized for Portuguese/English research and on-prem inference.

Introduction

Why this matters

Rio 3.5 Open 397B demonstrates that a municipally developed, openly licensed model can advance open-model capabilities by combining MoE sparsity, extreme context length, and a training-free inference strategy (SwiReasoning). That combination makes it practical to run longer multimodal dialogues and large reasoning traces without committing every intermediate step to tokens. On many reported benchmarks, this improves token efficiency and accuracy relative to its base model, Qwen 3.5 397B.

Key Capabilities
  • Large-context multimodal reasoning: a 1,010,000-token window supports multi-document workflows, long codebases, and extended image+text conversations, which reduces the need for ad-hoc chunking and external retrieval in long-form tasks.
  • MoE architecture with ~397B total / ~17B active parameters: only a fraction of the weights is used per token, so you get frontier-scale capacity at a lower per-token compute cost, but inference requires MoE-aware serving (tensor parallelism, expert routing).
  • SwiReasoning integration: dynamic, entropy-guided switching between explicit chain-of-thought and latent-space reasoning, which yields higher accuracy under unconstrained budgets and substantially fewer emitted tokens when the budget is constrained.
  • Multilingual & multimodal focus: post-trained from Qwen 3.5 397B, with reported gains across coding, math, and multilingual benchmarks, which makes it particularly useful for Portuguese-first deployments while retaining broad language coverage.
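
To make the entropy-guided switching idea in the SwiReasoning bullet concrete, here is a toy Python sketch. It is not the SwiReasoning algorithm itself: the rule (falling entropy relative to the start of the current block switches to explicit reasoning, rising entropy switches to latent reasoning) and the names `shannon_entropy`, `choose_mode`, and `trace_modes` are illustrative assumptions, and the probability vectors are made-up stand-ins for next-token distributions.

```python
import math


def shannon_entropy(probs):
    """Entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_mode(current_mode, entropy, reference_entropy):
    """Hypothetical rule: lower entropy than the block reference means the
    model is more confident, so reason explicitly; higher entropy means it
    is less confident, so keep exploring in latent space."""
    if entropy < reference_entropy:
        return "explicit"
    if entropy > reference_entropy:
        return "latent"
    return current_mode


def trace_modes(distributions, start_mode="latent"):
    """Return the reasoning mode chosen at each step of a toy trace."""
    mode = start_mode
    reference = None  # entropy recorded at the start of the current block
    modes = []
    for probs in distributions:
        h = shannon_entropy(probs)
        new_mode = mode if reference is None else choose_mode(mode, h, reference)
        if reference is None or new_mode != mode:
            reference = h  # a new block starts: reset the reference entropy
        mode = new_mode
        modes.append(mode)
    return modes


# Made-up distributions: uncertain, more confident, very confident, uncertain.
toy_steps = [
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.97, 0.01, 0.01, 0.01],
    [0.40, 0.30, 0.20, 0.10],
]
print(trace_modes(toy_steps))
```

A real implementation would operate on model logits and hidden states inside the decoding loop; the sketch only shows how an entropy signal can drive a mode decision without any retraining.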

Who it's for and trade-offs

Great fit if you need long-context multimodal capabilities for research or on-prem deployment, can provision MoE-capable inference (vLLM, SGLang, or multi-GPU clusters), and prefer an MIT-licensed model for commercial use. Look elsewhere if you require ultra-low-latency single-GPU inference, edge deployment, or minimal infrastructure complexity: MoE and million-token contexts substantially increase memory and accelerator (GPU/TPU) requirements as well as engineering complexity. Also consider data provenance and evaluation needs. The open benchmarks shown in the model card are useful, but you should validate behavior on your own domain data.

Where it sits

Compared with its base (Qwen 3.5 397B), the model reports consistent gains across coding, math, and multilingual suites thanks to post-training and SwiReasoning. Compared with closed commercial LLMs, it narrows gaps on many benchmarks while remaining fully MIT-licensed. The trade-off is that you give up the simplicity of a hosted service and take on heavier inference infrastructure yourself.

Brief note on usage

The model card provides example snippets for transformers, vLLM, and SGLang; production use typically requires MoE-aware tensor-parallel serving and attention to context-memory planning. Because SwiReasoning is an inference-level strategy, you can experiment with it without retraining, but you should test cost/latency trade-offs in your serving stack.
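
As a rough aid for the context-memory planning mentioned above, the following Python sketch does back-of-the-envelope arithmetic. Only the parameter counts (~397B total, ~17B active) and the 1,010,000-token window come from this page. The layer count, KV-head count, head dimension, and 2-byte precision are hypothetical placeholders, not this model's real configuration, and the KV-cache formula assumes standard full attention in every layer, which may not match the actual architecture.

```python
def weight_memory_gb(total_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def active_fraction(active_params, total_params):
    """Share of parameters used per token in a sparse MoE model."""
    return active_params / total_params


def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """KV-cache size for one sequence under standard full attention.
    The factor 2 accounts for one key and one value tensor per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens / 1e9


TOTAL_PARAMS = 397e9        # from this page (approximate)
ACTIVE_PARAMS = 17e9        # from this page (approximate)
CONTEXT_TOKENS = 1_010_000  # from this page

# All weights must be resident even though few are active per token.
print(weight_memory_gb(TOTAL_PARAMS, 2))             # 16-bit weights assumed
print(active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS))  # roughly 0.043

# HYPOTHETICAL architecture values, for illustration only.
print(kv_cache_gb(CONTEXT_TOKENS, layers=60, kv_heads=8,
                  head_dim=128, bytes_per_value=2))
```

Substitute the values from the model's published configuration and your chosen quantization before using numbers like these for capacity planning.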

Information

  • Website: huggingface.co
  • Authors: IplanRIO (prefeitura-rio)
  • Published date: 2026/06/11

Categories