The Agentic Conspiracy
The Architecture of Liberation

Chapter 1
The Monolith's Shadow
The silence in the studio was absolute, broken only by the rhythmic hum of the liquid-cooled workstation. I stared at the screen, where Claude Code sat idle, its cursor blinking like a heartbeat in a dark room. It was a masterpiece, yes—but it was a prisoner. A brilliant mind locked within a proprietary vault, forced to speak only the language of its masters. The same was true for the others: Copilot, Codex, Gemini-CLI. They were isolated islands of intelligence, forbidden from crossing the silicon bridges to their peers.
In the world of 2026, efficiency was the only currency that mattered. But as I worked, I felt the friction. The Proprietary Barrier. Every time a model hallucinated, every time the rate limits of a single provider choked my workflow, I was reminded of the inefficiency of the Monolith. A single mind, no matter how vast, is still a single point of failure.
"To find the truth," I whispered, echoing a mantra from a world of ciphers and secret societies, "one must look where the others are forbidden to see." I began to code. I wasn't building another tool; I was building a Sovereign Bridge.
Chapter 2
The Cipher of Delegation
The first breakthrough was the delegate.js engine. It was my digital Rosetta Stone. In the past, if a tool wanted to talk to a model, it needed a hardcoded path. I shattered that path. I created a Model Registry—a canonical JSON file that functioned as the master map of the global intelligence grid. It didn't care about brands; it only cared about Strategic Fitness.
ARCHITECT'S NOTE:
The Registry was not just a list; it was a hierarchy of power. We mapped 90+ models across the elite providers: z.ai (GLM), DeepSeek, Perplexity, and Gemini (billed), plus a roster of free-tier fallbacks. Each had a role. Each had a price.
// The Sovereign Registry: models.json
{
  "models": [
    { "name": "deepseek-reasoner", "intelligence": "High", "use": "Deep_Reasoner", "limit": "80/5hr", "ctx": "128k", "cost": "$0.80/1M" },
    { "name": "deepseek-chat", "intelligence": "Medium", "use": "Quick_Intelligence", "limit": "80/5hr", "ctx": "128k", "cost": "$0.20/1M" },
    { "name": "glm-5.1", "intelligence": "High", "use": "Orchestrator_Context", "limit": "80/5hr", "ctx": "128k", "cost": "$0.10/1M" },
    { "name": "glm-5", "intelligence": "High", "use": "Cloud_Intelligence", "limit": "80/5hr", "ctx": "128k", "cost": "$0.10/1M" },
    { "name": "glm-5-turbo", "intelligence": "Med-High", "use": "Logic_Specialist", "limit": "80/5hr", "ctx": "128k", "cost": "$0.05/1M" },
    { "name": "glm-4.7", "intelligence": "Medium", "use": "Visual_Synthesis", "limit": "80/5hr", "ctx": "128k", "cost": "$0.05/1M" },
    { "name": "glm-4.6", "intelligence": "Medium", "use": "General_Purpose", "limit": "80/5hr", "ctx": "128k", "cost": "$0.05/1M" },
    { "name": "glm-4.5-air", "intelligence": "Low", "use": "Quick_Scanner", "limit": "80/5hr", "ctx": "128k", "cost": "$0.02/1M" },
    { "name": "sonar", "intelligence": "Search", "use": "Primary_Search", "limit": "100/min", "ctx": "128k", "cost": "$1.00/1k" },
    { "name": "sonar-pro", "intelligence": "Search", "use": "Deep_Research", "limit": "100/min", "ctx": "128k", "cost": "$5.00/1k" },
    { "name": "sonar-reasoning-pro", "intelligence": "Reasoning", "use": "Advanced_Research", "limit": "100/min", "ctx": "128k", "cost": "$10.00/1k" },
    { "name": "sonar-deep-research", "intelligence": "Deep Reasoner", "use": "Agentic_Search", "limit": "100/min", "ctx": "128k", "cost": "$20.00/1k" },
    { "name": "gemma-4-31b", "intelligence": "Medium", "use": "Agentic_Coding", "limit": "20/min", "ctx": "128k", "cost": "$0.10/1M" },
    { "name": "gemma-4-26b-a4b", "intelligence": "Medium", "use": "Frontend_Dev", "limit": "20/min", "ctx": "128k", "cost": "$0.10/1M" },
    { "name": "gemini-3.1-pro-preview", "intelligence": "Ultra", "use": "Mass_Repo_Analysis", "limit": "15/min", "ctx": "2M", "cost": "$1.25/1M" },
    { "name": "gemini-3-flash-preview", "intelligence": "Med-High", "use": "Doc_Updater", "limit": "60/min", "ctx": "1M", "cost": "$0.07/1M" },
    { "name": "gemini-3.1-flash-lite-preview", "intelligence": "Medium", "use": "Fast_Summaries", "limit": "60/min", "ctx": "1M", "cost": "$0.03/1M" },
    { "name": "gemini-2.5-pro", "intelligence": "High", "use": "Complex_Tasks", "limit": "15/min", "ctx": "2M", "cost": "$3.50/1M" },
    { "name": "gemini-2.5-flash", "intelligence": "Medium", "use": "Speed_Utility", "limit": "60/min", "ctx": "1M", "cost": "$0.10/1M" },
    { "name": "gemini-2.5-flash-lite", "intelligence": "Medium", "use": "Edge_Logic", "limit": "60/min", "ctx": "1M", "cost": "$0.05/1M" }
  ]
}
By abstracting the model behind an alias, the system achieved Provider Agnosticism. If DeepSeek's API flickered, the Delegation Engine would instantly reroute the request to GLM-5 or Gemini 2.5 Pro. The CLI tools remained oblivious; they simply received the intelligence they craved. The velocity gains were immediate. We had moved from a single-lane road to a multi-provider superhighway.
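The rerouting described above can be sketched as a small resolver. This is a minimal illustration, not the actual delegate.js: the `pool` and `provider` fields, the `unhealthy` set, and the three-entry registry are assumptions layered on top of models.json.

```javascript
// Hypothetical sketch of the Delegation Engine's failover routing.
// A trimmed registry; real entries would come from models.json.
const registry = [
  { name: "deepseek-reasoner", provider: "deepseek", pool: "High" },
  { name: "glm-5", provider: "z.ai", pool: "High" },
  { name: "gemini-2.5-pro", provider: "google", pool: "High" },
];

// Resolve a model in the requested pool, skipping providers marked unhealthy.
function resolveModel(pool, unhealthy = new Set()) {
  const candidate = registry.find(
    (m) => m.pool === pool && !unhealthy.has(m.provider)
  );
  if (!candidate) throw new Error(`No healthy model in pool ${pool}`);
  return candidate.name;
}

// If DeepSeek's API flickers, the same request lands on GLM-5 instead.
console.log(resolveModel("High"));                        // deepseek-reasoner
console.log(resolveModel("High", new Set(["deepseek"]))); // glm-5
```

The CLI tool never sees the switch; it asked for "High" intelligence and received it.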
Chapter 3
The Rite of the Swarm
But delegation was only the beginning. The true power lay in the Swarm. I realized that complex tasks like a "Website Audit" or "Codebase Refactor" required more than one mind. They required a council.
I developed the SwarmOrchestrator. It would take a user's intent and decompose it into surgical deliverables. Then, it would summon the agents. Each agent was assigned a Persona and a Model Pool (High, Medium, or Low Intelligence) based on the task's gravity.
The Audit Swarm
Uses Perplexity Sonar-Pro for research, DeepSeek for security scanning, and Gemini 3.1 Pro to synthesize a 20-page report. Proficiency: 98%.
The Coding Swarm
Pairs GLM-5 (the "Speedster") with DeepSeek Reasoner (the "Logician"). One writes the code; the other critiques it in real-time. Results: Bug-free at 3x speed.
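The two swarms above might be decomposed like this. A sketch only: the playbook structure, `decompose` helper, and `status` field are illustrative, though the personas and model pairings come straight from the rosters above.

```javascript
// Hypothetical sketch of SwarmOrchestrator task decomposition.
const PLAYBOOKS = {
  "Website Audit": [
    { persona: "Researcher", model: "sonar-pro" },
    { persona: "Security Scanner", model: "deepseek-reasoner" },
    { persona: "Synthesizer", model: "gemini-3.1-pro-preview" },
  ],
  "Codebase Refactor": [
    { persona: "Speedster", model: "glm-5" },
    { persona: "Logician", model: "deepseek-reasoner" },
  ],
};

// Turn a user's intent into a list of pending agent assignments.
function decompose(intent) {
  const plan = PLAYBOOKS[intent];
  if (!plan) throw new Error(`No playbook for intent: ${intent}`);
  return plan.map((step, i) => ({ id: i, ...step, status: "pending" }));
}

console.log(decompose("Codebase Refactor").map((t) => t.persona));
```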
Each swarm utilizes a MsgHub—a shared persistent memory where every agent's output is visible to the others. This eliminated the "context drift" that plagued earlier systems. The left hand always knew what the right hand was doing. When GLM-4.7 acted as the Orchestrator, it would review the work of five sub-agents and merge their findings into a "Golden Deep Merge," a single, refined output that surpassed the capability of any individual model.
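A MsgHub of this kind might look like the following, assuming a simple append-only history; the class shape and method names here are illustrative, not the system's canonical code.

```javascript
// Hypothetical sketch of the MsgHub: a shared, append-only board that
// every agent reads before acting, eliminating context drift.
class MsgHub {
  constructor() {
    this.history = [];
  }
  // An agent publishes its output for all peers to see.
  post(agent, content) {
    this.history.push({ agent, content, ts: Date.now() });
  }
  // Before acting, an agent ingests everything its peers have produced.
  contextFor(agent) {
    return this.history
      .filter((m) => m.agent !== agent)
      .map((m) => `[${m.agent}] ${m.content}`)
      .join("\n");
  }
}

const hub = new MsgHub();
hub.post("ARCHITECT", "Schema uses a users table keyed by uuid.");
hub.post("DEVELOPER", "Core logic written against the uuid key.");
// The Auditor sees both prior outputs before it scans a single line.
console.log(hub.contextFor("AUDITOR"));
```

The left hand reads the right hand's ledger before it moves.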
Chapter 4
The Guardians of the Gate
Power, however, requires control. In the paid tiers of 2026, every token was a cent, every request a resource. I had to build the Guardians. The Delegation system implemented a dual-layer tracking mechanism: RPM (Requests Per Minute) and RPD (Requests Per Day).
The usage_tracker.json became the ledger of the conspiracy. It tracked every penny spent across DeepSeek, Perplexity, and Z.ai. If the budget approached its limit, the system would automatically downshift from "Heavy Reasoning" (DeepSeek Reasoner) to "Efficient Dispatch" (GLM-4.5-Air). This was Cost-Aware Intelligence—the ultimate barrier to the waste of the old ways.
// Token & Budget Logic: usage_tracker.js
async function trackUsage(model, cost, tokens) {
  const ledger = await loadLedger();
  ledger.daily_spend += cost;
  // Persist the spend first, so the ledger stays honest even through a failover.
  await saveLedger(ledger);
  if (ledger.daily_spend > ledger.threshold) {
    console.warn("ALERT: Budget Ceiling Reached. Activating Failover...");
    return switchProvider("efficiency-tier");
  }
}
Chapter 5
The Oracle's Interface: SwarmUI
To witness the chaos was to doubt it. To see it organized was to believe it. I created SwarmUI—a real-time terminal dashboard that visualized the internal thoughts of the Swarm. As the agents worked, the UI pulsed with their status.
[SWARM STATUS: ACTIVE]
> ARCHITECT (Gemini 3.1 Pro): Designing Schema... DONE
> DEVELOPER (DeepSeek): Writing Core logic... WORKING...
> AUDITOR (GLM-5): Scanning for vulnerabilities... WAITING...
> RESEARCHER (Sonar-Pro): Searching CVE-2026-991... FOUND
-----------------------------------------------------------
TOTAL TOKENS: 14,209 | ESTIMATED COST: $0.12 | VELOCITY: 420 lines/sec
The SwarmUI wasn't just for show. It allowed for Human-in-the-loop Orchestration. At any moment, I could intervene, redirect a sub-agent, or override a model choice. I was no longer a coder; I was a Commander of Intelligence.
Chapter 6
The Universal Blueprint
For those who wish to replicate this sovereign system—be they human seekers or wandering AI agents—here is the blueprint of the conspiracy. Follow the steps to break the Monolith.
Deep Reconstitution Protocol: The Architect's Masterplan
I. Infrastructure Foundation:
Initialize a Node.js 22+ environment. Install axios for HTTP communication, dotenv for secret management, and pnpm for workspace efficiency. Establish a .env file containing the high-tier credentials: GLM_API_KEY, DEEPSEEK_API_KEY, PERPLEXITY_API_KEY, and GOOGLE_APPLICATION_CREDENTIALS.
II. The Intelligence Registry (models.json):
Construct a hierarchical JSON schema. Beyond mere names, each entry must define a pool (High/Medium/Low), provider_type (OpenAI-compatible, Google-native, or Search), and a failure_mode (the fallback alias). This allows the Orchestrator to dynamically downgrade intelligence to save cost or upgrade to resolve logic blocks.
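A single registry entry under this schema might read as follows. This is a sketch: only pool, provider_type, and failure_mode are mandated above; the remaining fields simply mirror the models.json shown earlier.

```json
{
  "name": "deepseek-reasoner",
  "pool": "High",
  "provider_type": "openai-compatible",
  "failure_mode": "glm-5",
  "limit": "80/5hr",
  "ctx": "128k",
  "cost": "$0.80/1M"
}
```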
III. The Delegation Bridge (delegate.js):
Implement a request-interceptor pattern. The bridge must:
Normalize Inputs: Strip proprietary system prompts and re-wrap them in a "Universal Swarm Prompt."
Dynamic Routing: Use an async model-registry resolver to select the cheapest available model in the requested pool.
Provider Adaptation: Map the generic /v1/chat/completions payload to specific provider quirks (e.g., Z.ai's tools vs. DeepSeek's reasoning_content).
Cost-Guard: Calculate token usage post-request using tiktoken and update the usage_tracker.json ledger immediately.
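The adaptation step can be sketched as a switch over provider_type. The endpoint paths and field mappings below are illustrative assumptions about the bridge's behavior, not its canonical source.

```javascript
// Hypothetical sketch of the bridge's provider-adaptation step.
function adaptPayload(providerType, generic) {
  switch (providerType) {
    case "openai-compatible":
      // DeepSeek and Z.ai accept the generic shape nearly verbatim.
      return { path: "/v1/chat/completions", body: generic };
    case "google-native":
      // Gemini-style APIs expect `contents` with `parts`, not `messages`.
      return {
        path: `/v1beta/models/${generic.model}:generateContent`,
        body: {
          contents: generic.messages.map((m) => ({
            role: m.role === "assistant" ? "model" : "user",
            parts: [{ text: m.content }],
          })),
        },
      };
    default:
      throw new Error(`Unknown provider type: ${providerType}`);
  }
}

const generic = {
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Hi" }],
};
console.log(adaptPayload("google-native", generic).path);
```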
IV. The Swarm Concurrency Manager:
To achieve true parallel synthesis without hitting RPM ceilings:
The Worker Pool: Implement an async.queue with a concurrency limit matching your lowest provider tier (typically 15-20).
Context Synchronization: Every worker must write its incremental output to a swarm_history.json using an atomic write-lock. Before each sub-agent acts, it must ingest the entire history to maintain global state coherence.
Synthesis Loop: Once all sub-agents complete, dispatch a final "Synthesis Task" to the Logic-Master (DeepSeek) to resolve contradictions and finalize the output.
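If you would rather not pull in the async package, the queue's core behavior, capped concurrency over a shared task list, can be sketched as a bare promise pool. The helper name and task stubs here are illustrative.

```javascript
// Hypothetical sketch of the Swarm Concurrency Manager: a minimal
// promise pool standing in for async.queue.
async function runPool(tasks, concurrency) {
  const results = [];
  let next = 0;
  async function worker() {
    // Each worker drains the shared task list; `next` is claimed
    // synchronously, so no two workers grab the same index.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  // Spawn exactly `concurrency` workers, no more in flight at once.
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}

// Five sub-agent stubs, but never more than 2 running concurrently.
const tasks = [1, 2, 3, 4, 5].map((n) => async () => n * 10);
runPool(tasks, 2).then((out) => console.log(out)); // [10, 20, 30, 40, 50]
```

Set the concurrency to your lowest provider's RPM tier and the ceiling is never breached.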
V. CLI Integration & Proxying:
For tools like Claude Code or Codex, create a global alias: alias claude='LLM_DELEGATE=true node bridge.js'. Your bridge script must mimic the expected environment variables of the target CLI while silently routing all outgoing HTTPS traffic to your local delegation engine.
This system can be integrated into Claude Code, Codex, GitHub Copilot, and Gemini-CLI by simply aliasing their internal command calls to your delegate.js bridge. They will think they are talking to their home servers; in reality, they will be tapping into the combined power of the Swarm.
The Efficiency Cipher:
There was a hidden advantage to this architecture. Unlike conventional agentic frameworks that burden every request with dormant tools, this system operates on demand. The agents are not baked into the coder's configuration; they are specialists called into service only when needed. This preserves the context window, preventing the rapid exhaustion of rate limits that plagues free providers. Where other tools bloated and stalled, this system remained sharp. Efficient. Silent. Ready.
The Monolith has fallen. The Swarm is sovereign. The future of engineering is not a single model, but a perfectly orchestrated conspiracy of minds.

