Performs one-shot, long-horizon OCR and document parsing, using Reference Sliding Window Attention (R-SWA) to hold the decoder KV cache at a constant size so that multi-page documents can be transcribed in a single pass. Code, model weights, and an accompanying arXiv report are provided.
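To illustrate why a sliding window keeps decoder memory flat, here is a minimal sketch of a generic fixed-window KV cache. It is not R-SWA itself: how R-SWA chooses or retains its reference tokens is not described above, and the class name, the window size of 4, and the string placeholders standing in for key/value tensors are all hypothetical.

```python
from collections import deque


class SlidingWindowKVCache:
    """Generic fixed-window cache (illustrative only, not the R-SWA design)."""

    def __init__(self, window: int) -> None:
        # deque(maxlen=...) silently evicts the oldest entry once it is full,
        # so the cache never holds more than `window` positions.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value) -> None:
        self.keys.append(key)
        self.values.append(value)

    def __len__(self) -> int:
        return len(self.keys)


cache = SlidingWindowKVCache(window=4)  # window size is an arbitrary assumption
sizes = []
for t in range(10):
    # Strings stand in for the per-token key/value tensors.
    cache.append(f"k{t}", f"v{t}")
    sizes.append(len(cache))

print(sizes)             # cache size after each decoded token
print(list(cache.keys))  # only the most recent positions remain
```

The cache grows until it reaches the window size and then stays there, however many tokens are decoded, which is the property that makes single-pass transcription of long documents feasible.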
Provides GGUF-quantized weights and runtime assets for running the Qwythos-9B reasoning LLM locally with llama.cpp and compatible runtimes. Key features include a 1,048,576-token context window via YaRN, native function calling, multimodal image input (requires the mmproj file), and multiple quantization and MTP variants covering different size/quality trade-offs.
NVFP4-quantized variant of Qwen3.6-27B that stores parameters in 4 bits instead of 16, cutting disk and GPU memory requirements by ~2.5× while keeping benchmark accuracy comparable to the original. Ready for vLLM-based inference on NVIDIA hardware; supports long, multimodal contexts.
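As a rough back-of-the-envelope check on the figures in the blurb above, the sketch below estimates the weight footprint before and after quantization. The 27-billion parameter count is assumed from the model name, and the estimate covers weights only (no KV cache, activations, or runtime overhead); the helper name is hypothetical.

```python
def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 27e9           # assumption: parameter count taken from the model name
REPORTED_REDUCTION = 2.5  # the ~2.5x saving quoted in the description

bf16_gb = weight_footprint_gb(N_PARAMS, 16)   # 16-bit baseline
quantized_gb = bf16_gb / REPORTED_REDUCTION   # footprint implied by the ~2.5x claim
effective_bits = 16 / REPORTED_REDUCTION      # average bits per parameter implied

print(f"16-bit weights:  {bf16_gb:.1f} GB")
print(f"after ~2.5x cut: {quantized_gb:.1f} GB")
print(f"implied average: {effective_bits:.1f} bits per parameter")
```

Under these assumptions the reported saving corresponds to an average of about 6.4 bits per parameter rather than the nominal 4. That gap would be consistent with quantization scale factors and some tensors being kept at higher precision, though the description does not say which parts of the model remain unquantized.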
Thinking-off fine-tune for coding-agent workflows that prioritizes fast next-step decisions, lower token usage, and stable multi-turn tool calling. Highlights: 35B MoE base, MTP speculative decoding, and 62.4% on SWE-bench (300 cases). Best suited to local agent loops and automated debug cycles; requires a disciplined agent harness and consistent tool schemas.