caveman

Compresses LLM/agent replies into a terse “caveman” style to cut output tokens by roughly 65–75% while preserving technical accuracy. Offers per-agent skills, intensity modes, memory compression, and middleware to lower token cost and extend usable context.

Introduction

LLM-agent workflows are increasingly limited by output token cost and verbosity: longer, polite answers consume context budget and add to the bill without adding technical value. Constraining style without removing substance lets agents save tokens, keep more relevant context in-window, and, in some benchmarks, even improve correctness.

What Sets It Apart
  • Aggressive, configurable brevity: intensity levels (lite/full/ultra/wenyan) let you trade verbosity for compression without changing underlying model behavior, so you control cost vs. readability.
  • Cross-agent integration: delivered as per-agent skills/hooks and optional middleware; works with Claude Code, Codex, Gemini, Cursor and many other agent CLIs so you can enable terse replies where you already run assistants.
  • End-to-end token savings: combines output compression, memory-file rewriting (which reduces input context), and small runtime flags to produce savings that persist across a session, lowering per-reply cost and extending effective context.
  • Practical benchmarks and safety boundaries: the repo includes measured reductions (typically about 65% output-token savings across its examples) and rules that exempt critical security and error messages from compression, so accuracy-critical strings remain unchanged.
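
One way to sanity-check a savings figure like this on your own transcripts is to compare a verbose reply with its terse counterpart. The sketch below is an illustration only, not the repo's benchmark: `approx_tokens` and `reduction_pct` are hypothetical helpers, and whitespace-separated words stand in for real tokenizer counts, which will differ.

```python
def approx_tokens(text: str) -> int:
    """Crude token proxy: count whitespace-separated words (an assumption)."""
    return len(text.split())


def reduction_pct(verbose: str, terse: str) -> float:
    """Percentage of approximate tokens saved by the terse reply."""
    before = approx_tokens(verbose)
    after = approx_tokens(terse)
    if before == 0:
        return 0.0  # nothing to compress
    return 100.0 * (before - after) / before


# Made-up example replies, not taken from the project's benchmarks.
verbose = ("Sure! I would be happy to help. The bug is most likely caused "
           "by a missing null check in the parse function.")
terse = "Bug: missing null check in parse function."
print(f"{reduction_pct(verbose, terse):.0f}% fewer words")
```

For billing-grade numbers, swap the word count for the token counts your provider reports.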

Who It's For and Tradeoffs

Great fit if you run cost-sensitive agent workflows, need longer effective context windows, or handle many short, technical exchanges (code review, debugging, CLI assistants). It’s also useful when you want consistent, machine-friendly one-line PR comments or commit messages.

Look elsewhere if you require full natural-language explanations by default (educational tutoring, extended prose) or if your workflow depends on human-friendly conversational niceties; extreme compression can reduce readability for non-technical audiences. Caveman compresses output text only; model reasoning tokens are unaffected.

Where It Fits

Use caveman as a low-friction layer atop existing assistants to (1) cut API spend, (2) keep memory/context payloads small, and (3) make downstream tooling (logs, diffs, CI comments) denser and easier to parse. It complements memory and fine-tuning approaches that also aim to reduce per-turn context.
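
Because only output text shrinks, the effect on total spend depends on how much of a session is input and reasoning. The arithmetic below is a rough sketch: `session_cost` and `savings_pct` are hypothetical helpers, the prices and token counts are made-up placeholders rather than any provider's rates, and it assumes reasoning tokens are billed at the output rate.

```python
def session_cost(input_tokens: int, reasoning_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Dollar cost of a session; prices are per million tokens (placeholders)."""
    # Assumption: reasoning tokens are billed at the output rate.
    billed_output = reasoning_tokens + output_tokens
    return (input_tokens * price_in + billed_output * price_out) / 1_000_000


def savings_pct(baseline: float, compressed: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return 100.0 * (baseline - compressed) / baseline


# Made-up session: 200k input, 50k reasoning, 100k visible output tokens.
PRICE_IN, PRICE_OUT = 3.0, 15.0  # hypothetical $ per million tokens
baseline = session_cost(200_000, 50_000, 100_000, PRICE_IN, PRICE_OUT)
# A 70% cut applies to visible output only: 100k -> 30k tokens.
compressed = session_cost(200_000, 50_000, 30_000, PRICE_IN, PRICE_OUT)
print(f"baseline ${baseline:.2f}, compressed ${compressed:.2f}, "
      f"saved {savings_pct(baseline, compressed):.1f}%")
```

With these placeholder numbers, a 70% output cut lowers the total bill by roughly 37%, since input and reasoning tokens are unchanged.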

How It Works

Instead of changing the model, caveman injects a lightweight skill/flag that instructs the agent to drop filler, articles and hedging while preserving code, error strings, and critical identifiers. Additional subtools rewrite memory files into compressed forms and provide middleware to shorten tool descriptions. The project ships intensity rules and guardrails so critical messages are left intact while ordinary prose is aggressively shortened.
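
To make the preservation rule concrete, here is a toy, rule-based illustration. It is not how caveman operates (the project instructs the agent itself rather than post-processing text), and the drop list, the `compress` helper, and the choice of backticks and double quotes as "protected" markers are all assumptions made for this sketch.

```python
import re

# Hypothetical drop list (articles, filler, hedges); not taken from the project.
DROP_WORDS = {"a", "an", "the", "just", "really", "basically",
              "perhaps", "maybe", "probably"}

# Match, in order: inline code in backticks, double-quoted strings
# (standing in for error messages), or a plain alphabetic word.
TOKEN = re.compile(r'`[^`]*`|"[^"]*"|\b[A-Za-z]+\b')


def compress(text: str) -> str:
    """Drop listed words from prose; copy protected spans verbatim."""
    def repl(match: re.Match) -> str:
        tok = match.group(0)
        if tok[0] in '`"':          # protected span: leave untouched
            return tok
        return "" if tok.lower() in DROP_WORDS else tok

    stripped = TOKEN.sub(repl, text)
    # Collapse the gaps left behind by removed words.
    return re.sub(r" {2,}", " ", stripped).strip()


print(compress('Perhaps the call to `the_parser()` just fails '
               'with "a file is missing".'))
```

Note that "the" and "a" survive inside the backticks and the quoted error string, which is the point of the guardrail: ordinary prose is shortened while code and error text pass through unchanged.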