2026-01-15 · dcode · cost-optimization, llm, routing

5-Tier LLM Routing: Why Your AI Agent Doesn't Need GPT-4 for Every Task

How Klawty routes 80% of agent tasks to cheap models and reserves expensive ones for complex reasoning — cutting AI costs to $1.20/day for 8 agents.

The default is wasteful

Most agent frameworks hardcode one model. Every task — health checks, log parsing, email triage, complex analysis — gets the same $15/M-token model. This is like hiring a senior architect to file paperwork.

We ran the numbers on our production system. Of 1,000+ monthly tasks across 8 agents, 80% are routine: check a status, parse a document, update a tracker, draft a notification. These tasks don't need frontier reasoning. They need a model that's fast, cheap, and good enough.

The 5 tiers

Klawty routes every LLM call through a 5-tier system. Each tier has a primary model, a fallback, and a cost ceiling:

| Tier | Name | Primary Model | Cost/M tokens | Use Case |
|------|------|---------------|---------------|----------|
| 1 | Nano | Qwen 3 Flash | ~$0.07 | Health checks, status reads, log parsing |
| 2 | Workhorse | Kimi K2.5 | ~$0.60 | Email triage, tracker updates, task discovery |
| 3 | Capable | Claude Sonnet | ~$3.00 | Content drafting, analysis, multi-step reasoning |
| 4 | Power | Claude Opus | ~$15.00 | Complex strategy, code generation, architecture |
| 5 | Premium | Claude Opus (extended) | ~$15.00 | Critical decisions, production deploys, escalations |
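The table above maps naturally onto a config object. This is a minimal sketch, not Klawty's actual code: the `TIERS` name, the field names, and the model identifier strings are illustrative, and the per-tier fallback models the article mentions are omitted because it doesn't name them.

```javascript
// Tier table as a lookup object. Names and per-M-token costs come from
// the table above; the model ID strings are illustrative placeholders.
const TIERS = {
  1: { name: 'Nano',      model: 'qwen-3-flash',         costPerMTok: 0.07 },
  2: { name: 'Workhorse', model: 'kimi-k2.5',            costPerMTok: 0.60 },
  3: { name: 'Capable',   model: 'claude-sonnet',        costPerMTok: 3.00 },
  4: { name: 'Power',     model: 'claude-opus',          costPerMTok: 15.00 },
  5: { name: 'Premium',   model: 'claude-opus-extended', costPerMTok: 15.00 },
};
```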

Pattern-based escalation

The router doesn't guess. It uses three signals to pick the tier:

Task complexity — Simple reads and status checks start at Tier 1. Tasks with keywords like "analyze," "strategy," or "architecture" start at Tier 3. The task type from AGENT.md frontmatter provides the baseline.

Failure count — If a Tier 1 model fails on a task, the next attempt escalates to Tier 2. Two failures escalate to Tier 3. The system learns that this particular task needs more reasoning power.

Stakes level — Tasks tagged with tier: confirm or tier: propose in the tool definition automatically route to Tier 3+. You don't want a $0.07 model deciding whether to send a client email.

```javascript
function selectTier(task, attempt) {
  let tier = task.baseTier || 1;

  // Escalate on failure: each retry bumps the tier by one, capped at 5
  tier = Math.min(tier + attempt, 5);

  // Floor for high-stakes tasks: 'propose' and 'confirm' always get Tier 3+
  if (task.riskLevel === 'propose' || task.riskLevel === 'confirm') {
    tier = Math.max(tier, 3);
  }

  return tier;
}
```

Daily caps

Each tier has a daily spending cap. When Tier 4 hits its $5/day limit, tasks that would route there are queued until the next day, or downgraded to Tier 3 if they're not critical. This prevents a runaway loop from burning your monthly budget in one afternoon.
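A cap check along those lines might look like the sketch below. Only the Tier 4 $5/day figure comes from the article; the other cap values, the `spentToday` tracker, and the downgrade-by-one rule are illustrative assumptions.

```javascript
// Hypothetical per-tier daily caps in USD. Only the Tier 4 value ($5/day)
// is from the article; the rest are made-up placeholders.
const DAILY_CAPS = { 1: 0.50, 2: 1.00, 3: 2.00, 4: 5.00, 5: 5.00 };

// Returns where a task should actually run once caps are considered.
// `spentToday` maps tier -> USD spent so far today (tracking is assumed).
function applyDailyCap(tier, spentToday, critical) {
  if ((spentToday[tier] || 0) < DAILY_CAPS[tier]) {
    return { tier, queued: false };          // budget left: run as planned
  }
  if (!critical && tier > 1) {
    return { tier: tier - 1, queued: false }; // non-critical: downgrade
  }
  return { tier, queued: true };              // critical: wait for tomorrow
}
```

For example, a non-critical Tier 4 task with the Tier 4 budget exhausted would come back as Tier 3, while a critical one would be queued at Tier 4.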

Real production numbers

Our 8-agent system running 24/7 in Luxembourg:

- Tier 1-2: 80% of all LLM calls
- Tier 3: 15% of calls
- Tier 4-5: 5% of calls
- Average daily cost: ~$1.20 (all 8 agents combined)
- Monthly total: ~€37

The insight isn't that cheap models exist. It's that routing is the architecture decision that makes autonomous agents economically viable. Without it, running 8 agents 24/7 would cost $15-25/day. With it, $1.20.
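The economics can be back-of-enveloped from the tier mix above. The traffic shares and Tier 3/4 costs come from the article; the ~$0.30/M blended figure for the Tier 1-2 bucket is an illustrative midpoint assumption.

```javascript
// Blended cost per million tokens under routing, using the article's
// tier mix. The $0.30 figure for Tiers 1-2 is an assumed rough midpoint.
const mix = [
  { share: 0.80, costPerMTok: 0.30 },  // Tiers 1-2
  { share: 0.15, costPerMTok: 3.00 },  // Tier 3
  { share: 0.05, costPerMTok: 15.00 }, // Tiers 4-5
];

const blended = mix.reduce((sum, m) => sum + m.share * m.costPerMTok, 0);
console.log(blended.toFixed(2)); // → 1.44 ($/M tokens)
```

Roughly $1.44/M tokens blended versus $15/M for all-Opus: about a 10x reduction, which is the same order of magnitude as the $15-25/day vs $1.20/day gap in production.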