2026-03-20 · dcode · production, cost-optimization, case-study

How 8 AI Agents Run a Business for €37/Month

A breakdown of how 8 autonomous AI agents handle 200+ daily tasks for a European SMB — and how 5-tier LLM routing keeps the bill under €40/month.

The setup

Since late 2025, we've been running 8 AI agents for a design & build company in Luxembourg. Not a demo. Not a proof of concept. A production system handling real client emails, real invoices, real SEO, real leads: 200+ tasks per day, with a human stepping in only to approve flagged or risky actions.

The agents, by role:

- Orchestrator: coordination, strategic briefings, task delegation
- Client operations: email triage, calendar, client tracker updates
- Finance: invoice processing, overdue alerts, financial tracking
- Dev & product: 8 repos, GitHub PRs, staging deploys
- Marketing: SEO content, blog drafts, analytics monitoring
- Safety watchdog: proposal validation, health monitoring, business rule enforcement
- Business intelligence: daily briefings, weekly reports, trend detection
- Sales & CRM: lead scoring, pipeline management, follow-up drafts

Total monthly AI spend: approximately €37.

A typical day

06:00 — The health monitor runs its 60-second check loop. Heartbeats, disk space, stuck tasks, DB integrity. This is tier 1 work — Qwen Flash at $0.07 per million tokens.

08:15 — Client ops wakes up. Polls Gmail, finds 6 new emails. Three are spam (auto-archived). One is a supplier confirmation (forwarded to finance). One is a client asking about delivery dates (flagged for human review — business rule: never promise dates without confirmation). One is an invoice from a subcontractor (task created for the finance agent).

09:00 — Finance picks up the invoice task. Extracts line items, cross-references the supplier database, classifies the expense, and creates a PROPOSE action to route it for payment. The safety watchdog validates: correct supplier? Amount within expected range? No duplicate? Approved — with a 15-minute rollback window.

10:30 — Marketing runs its SEO check. Pulls Search Console data, identifies 3 pages with declining impressions, drafts optimization suggestions. This hits tier 3 — content drafting needs a capable model.

14:00 — Sales scores 4 new leads from the website contact form. Two are qualified (score > 70), two are low-priority. Creates follow-up draft emails for the qualified leads — tier 3 for personalized writing. The drafts go through the proposal system: the human approves or edits before anything sends.

23:00 — The memory distiller runs. Extracts insights from the day's 200+ tasks, compresses them into persistent memory, indexes key facts in Qdrant for semantic search tomorrow.

Why it costs €37/month

The secret is 5-tier LLM routing. Not every task needs GPT-4.

| Tier | Models | Cost (USD per 1M tokens) | Usage | % of calls |
|------|--------|--------------------------|-------|------------|
| 1 (Nano) | Qwen Flash, Gemini Flash | $0.07 | Health checks, status updates, simple classification | ~45% |
| 2 (Workhorse) | Kimi K2.5, Gemini Flash | $0.27 | Email triage, routine task execution, data extraction | ~35% |
| 3 (Capable) | Claude Haiku, GPT-4o Mini | $1.25 | Content drafting, analysis, lead scoring | ~15% |
| 4 (Power) | Claude Sonnet, GPT-4o | $5.00 | Complex reasoning, multi-step proposals | ~4% |
| 5 (Premium) | Claude Opus, o1 | $15.00 | Strategic decisions, edge cases, novel problems | <1% |
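
To see why the mix matters, here is a rough back-of-the-envelope calculation of the blended price per million tokens. It is a sketch under two assumptions that the table does not state: every call uses the same number of tokens, and the "<1%" premium share is rounded up to 1% so the shares sum to 100%.

```python
# Price (USD per million tokens) and share of calls, taken from the table.
# Assumption: "<1%" for the premium tier is treated as exactly 1%.
TIERS = {
    "nano": (0.07, 0.45),
    "workhorse": (0.27, 0.35),
    "capable": (1.25, 0.15),
    "power": (5.00, 0.04),
    "premium": (15.00, 0.01),
}


def blended_price_per_million(tiers):
    """Weighted average price, assuming equal token counts per call."""
    return sum(price * share for price, share in tiers.values())


blended = blended_price_per_million(TIERS)
print(f"Blended price: ${blended:.4f} per million tokens")
```

Under those assumptions the blend comes out at about $0.66 per million tokens, more than 20 times cheaper than sending everything to the $15.00 tier.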

About 80% of all calls hit tiers 1 and 2 (roughly 45% + 35% in the table above). The average cost per task is under $0.005. Pattern-based routing decides the tier before the call: health check? Tier 1. Email triage? Tier 2. Content draft? Tier 3. Each tier has 3 fallback models, so if the primary is down or rate-limited, the call cascades automatically to the next one.
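
A minimal sketch of pattern-based routing with a fallback cascade might look like the following. The regex rules, the default tier, the model identifiers, and the `ModelUnavailable` exception are all hypothetical names chosen for illustration; the production routing rules are not shown in this post.

```python
import re

# Hypothetical routing rules: the first matching pattern wins.
ROUTES = [
    (re.compile(r"health|heartbeat|status", re.I), 1),
    (re.compile(r"email|triage|extract", re.I), 2),
    (re.compile(r"draft|content|lead", re.I), 3),
]
DEFAULT_TIER = 2  # assumption: unrecognized tasks go to the workhorse tier

# Primary model first, then fallbacks. Identifiers and order are illustrative.
MODELS = {
    1: ["qwen-flash", "gemini-flash"],
    2: ["kimi-k2.5", "gemini-flash"],
    3: ["claude-haiku", "gpt-4o-mini"],
}


class ModelUnavailable(Exception):
    """Raised by a model client when a model is down or rate-limited."""


def pick_tier(task_type):
    """Decide the tier from the task type alone, before any LLM call."""
    for pattern, tier in ROUTES:
        if pattern.search(task_type):
            return tier
    return DEFAULT_TIER


def call_with_fallback(task_type, prompt, call_model):
    """Try each model of the chosen tier in order, cascading on failure."""
    tier = pick_tier(task_type)
    for model in MODELS[tier]:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable:
            continue  # primary down or rate-limited: try the next model
    raise RuntimeError(f"all tier {tier} models unavailable")


def fake_client(model, prompt):
    """Stand-in client that simulates an outage of the tier 1 primary."""
    if model == "qwen-flash":
        raise ModelUnavailable(model)
    return f"{model} handled: {prompt}"


used_model, reply = call_with_fallback("health check", "ping", fake_client)
print(used_model, "->", reply)
```

Because the tier is picked from the task type rather than by asking a model, the routing decision itself costs nothing.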

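Daily caps prevent runaway spend: $3/day for nano, $5/day for workhorse, $8/day for capable, $3/day for power, $1/day for premium. These are ceilings, not typical spend: the caps sum to $20/day, while the actual bill averages a little over €1/day. If a tier hits its cap, tasks queue until the next day or escalate to a human.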
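
One way to enforce per-tier daily caps is a small guard that is consulted before every call. This is an illustrative sketch: the `BudgetGuard` class and its in-memory storage are assumptions, and a real system would persist the counters.

```python
from collections import defaultdict
from datetime import date

# Daily caps in USD, as listed above.
DAILY_CAPS = {
    "nano": 3.0,
    "workhorse": 5.0,
    "capable": 8.0,
    "power": 3.0,
    "premium": 1.0,
}


class BudgetGuard:
    """Tracks spend per (day, tier) and refuses calls that would exceed the cap."""

    def __init__(self, caps):
        self.caps = caps
        self.spent = defaultdict(float)  # key: (day, tier) -> USD spent

    def try_spend(self, tier, cost_usd, today=None):
        """Return True and record the spend, or False if the cap would be hit."""
        today = today or date.today()
        key = (today, tier)
        if self.spent[key] + cost_usd > self.caps[tier]:
            return False  # caller should queue the task or escalate to a human
        self.spent[key] += cost_usd
        return True


guard = BudgetGuard(DAILY_CAPS)
day = date(2026, 3, 20)
print(guard.try_spend("premium", 0.60, day))  # within the $1 cap
print(guard.try_spend("premium", 0.60, day))  # would reach $1.20, refused
```

Keying the counter by date means the budget resets on its own the next day, which matches the "queue until the next day" behaviour.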

The infrastructure that makes it work

Cost routing alone isn't enough. Autonomous agents will spam themselves into bankruptcy without guardrails:

4-layer deduplication — Task dedup (same title+agent within 24h), channel dedup (same message within 5 min), proposal dedup (same agent+action while pending), discovery dedup (same domain scan within cycle). Without this, agents create the same task 50 times.
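
The first layer, task dedup, can be sketched as a time-windowed lookup keyed on agent and title. The `TaskDeduper` class and the title normalization (lowercasing, collapsing whitespace) are assumptions made for this example.

```python
import time


class TaskDeduper:
    """Rejects a task if the same (agent, title) was accepted within the window."""

    def __init__(self, window_seconds=24 * 3600):
        self.window = window_seconds
        self.last_seen = {}  # (agent, normalized title) -> acceptance timestamp

    def is_duplicate(self, agent, title, now=None):
        now = time.time() if now is None else now
        # Assumption: titles are compared case-insensitively, ignoring extra spaces.
        key = (agent, " ".join(title.lower().split()))
        previous = self.last_seen.get(key)
        if previous is not None and now - previous < self.window:
            return True
        self.last_seen[key] = now  # only accepted tasks refresh the timestamp
        return False


dedup = TaskDeduper()
print(dedup.is_duplicate("finance", "Process invoice 12", now=0))
print(dedup.is_duplicate("finance", "process  invoice 12", now=3600))
```

The same structure with a 300-second window and a message-based key would cover the channel layer.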

Circuit breaker — If an agent fails 3 times in a row, exponential backoff kicks in. 5 minutes, then 15, then 60. Auto-resets after a successful execution. Single source of truth in SQLite — both the executor and orchestrator read the same state.
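
A circuit breaker along these lines could keep its state in a single SQLite table that every component reads. The table layout and function names below are hypothetical, and the sketch assumes that each further failure after the third moves one step along the 5/15/60-minute schedule, with the last value repeating.

```python
import sqlite3

FAILURE_THRESHOLD = 3
BACKOFF_MINUTES = [5, 15, 60]  # schedule from the text; last value repeats (assumption)


def open_state(path=":memory:"):
    """Open the shared state store; both executor and orchestrator would use it."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS breaker ("
        "agent TEXT PRIMARY KEY, failures INTEGER NOT NULL, blocked_until REAL NOT NULL)"
    )
    return db


def record_result(db, agent, success, now):
    """Update the breaker after an execution; `now` is a Unix timestamp."""
    row = db.execute("SELECT failures FROM breaker WHERE agent = ?", (agent,)).fetchone()
    # A success resets the counter; a failure increments it.
    failures = 0 if success else (row[0] if row else 0) + 1
    blocked_until = 0.0
    if failures >= FAILURE_THRESHOLD:
        step = min(failures - FAILURE_THRESHOLD, len(BACKOFF_MINUTES) - 1)
        blocked_until = now + BACKOFF_MINUTES[step] * 60
    db.execute(
        "INSERT OR REPLACE INTO breaker (agent, failures, blocked_until) VALUES (?, ?, ?)",
        (agent, failures, blocked_until),
    )
    db.commit()


def can_run(db, agent, now):
    """True if the agent has no active backoff."""
    row = db.execute("SELECT blocked_until FROM breaker WHERE agent = ?", (agent,)).fetchone()
    return row is None or now >= row[0]


db = open_state()
for _ in range(3):
    record_result(db, "finance", success=False, now=0)
print(can_run(db, "finance", now=60))   # still inside the 5-minute backoff
print(can_run(db, "finance", now=300))  # backoff elapsed
```

Keeping the counter and the `blocked_until` timestamp in one row means no component needs its own copy of the breaker state.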

Proposal lifecycle — Risky actions don't execute immediately. They enter a 6-state machine: pending, sentinel review, approved, executing, completed (or rolled back). The safety watchdog enforces 9 business rules. The human gets a 15-minute window to override via a Discord reaction.
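
The lifecycle can be modelled as an explicit transition table. The six state names come from the description above, but the allowed edges are an assumption (the post names the states, not the transitions), as is measuring the 15-minute window from the moment of approval. Rejection by the watchdog is not modelled because it is not one of the listed states.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    SENTINEL_REVIEW = "sentinel_review"
    APPROVED = "approved"
    EXECUTING = "executing"
    COMPLETED = "completed"
    ROLLED_BACK = "rolled_back"


# Assumed transition table: which states may follow each state.
TRANSITIONS = {
    State.PENDING: {State.SENTINEL_REVIEW},
    State.SENTINEL_REVIEW: {State.APPROVED},
    State.APPROVED: {State.EXECUTING, State.ROLLED_BACK},
    State.EXECUTING: {State.COMPLETED, State.ROLLED_BACK},
    State.COMPLETED: {State.ROLLED_BACK},
    State.ROLLED_BACK: set(),
}

ROLLBACK_WINDOW_SECONDS = 15 * 60


class Proposal:
    def __init__(self, action):
        self.action = action
        self.state = State.PENDING
        self.approved_at = None

    def advance(self, new_state, now):
        """Move to new_state if the edge is legal; `now` is a Unix timestamp."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        # Rollback is only reachable after approval, so approved_at is set here.
        if new_state is State.ROLLED_BACK and now - self.approved_at > ROLLBACK_WINDOW_SECONDS:
            raise ValueError("rollback window has closed")
        if new_state is State.APPROVED:
            self.approved_at = now
        self.state = new_state


proposal = Proposal("route invoice for payment")
proposal.advance(State.SENTINEL_REVIEW, now=0)
proposal.advance(State.APPROVED, now=10)
proposal.advance(State.ROLLED_BACK, now=600)  # human override inside the window
print(proposal.state.value)
```

Making illegal transitions raise keeps an agent from skipping the watchdog review by jumping straight from pending to approved.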

What this means for you

This isn't theoretical. It's been running since late 2025. The system processes invoices, triages emails, scores leads, drafts content, monitors health, and coordinates across 8 agents, all for roughly the price of two ChatGPT Plus subscriptions ($20/month each).

This is what Klawty Premium gives you: the runtime, the routing, the proposals, the dedup, the circuit breaker, the memory system. Pre-configured for your industry, or build your own team from scratch.

Get started at ai-agent-builder.ai