Multi-AgentOrchestrationCost OptimizationProduction AI

OpenClaw — AI Operations Platform

How I run 24 autonomous AI agents in production for roughly a third of cloud-only cost. A 4-tier model pyramid with automatic failover — the infrastructure my own businesses run on.

OpenClaw — AI Operations Platform preview

The Problem

I needed an AI infrastructure that could run dozens of autonomous agents reliably, cheaply, and without manual intervention. Most AI platforms assume interactive use — a human sends a message, gets a response. I needed something fundamentally different: agents that run on schedules, make decisions, escalate when needed, and deliver results to Telegram, Slack, or wherever the output needs to go.

Commercial solutions either couldn’t do this, were prohibitively expensive, or created vendor lock-in I wasn’t willing to accept.

The Architecture

OpenClaw is a 4-tier intelligence pyramid:

Tier 3: Claude Opus (~5% of calls)
   ↑ escalation
Tier 2: DeepSeek V3 (~10-15%)
   ↑ escalation
Tier 1: Gemini Flash (~15-20%)
   ↑ escalation
Tier 0: Ollama qwen3:14b (~60-70%)

Every request starts at Tier 0 (local, free) and only escalates when the task demands more capability. The routing is deterministic, not probabilistic — task types are mapped to minimum capability tiers, and the failover chain handles provider outages automatically.

Key infrastructure:

  • Auth-gated gateway on port 18789 with health monitoring and per-model usage tracking
  • LaunchAgent-managed processes (launchd on macOS) for reliability
  • Isolated sessions per agent to prevent context bleed between jobs
  • Telegram delivery pipeline for human-readable output
  • Configurable concurrency (2 agents, 4 subagents)

Key Decisions

Why local inference first: The realization that 60-70% of agent workload doesn’t need a frontier model was the most impactful cost optimization. Simple classification, template filling, reminder formatting — a 14B local model handles these with high reliability and zero cost.

Why deterministic routing over a classifier: A meta-model to classify tasks would add complexity, latency, and its own failure mode. Task-to-tier mappings are simple, stable, and debuggable. When a mapping is wrong, I change it in config. No retraining needed.

Why session isolation: Early experiments with shared sessions caused context bleed — Agent A’s context leaked into Agent B’s responses, producing confused outputs. The fix was isolation per job. More overhead, dramatically better output quality.

What I Learned

  • Token budget matters for reasoning models. qwen3:14b uses tokens for chain-of-thought reasoning. Set max_tokens below 500 and you get empty responses because the model spent its budget thinking. This took days to debug.

  • All custom providers need explicit API type configuration. A missing "api": "openai-completions" field causes a cryptic No API provider registered for api: undefined error. Both DeepSeek and Ollama need this.

  • Cron jobs using systemEvent type can’t be isolated — they must be converted to agentTurn with --message flag. This wasn’t documented and cost hours of debugging.

Results

The system has been running in production since 2024:

  • 24 autonomous agents executing daily
  • ~70% cost reduction vs. cloud-only routing
  • Zero complete downtime events
  • Thousands of daily requests processed across all tiers