<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Haryn.us]]></title><description><![CDATA[AI engineering, vibe coding journey: Designing workflows, tools and solutions for Agentic Coding]]></description><link>https://blog.haryn.us</link><image><url>https://cdn.hashnode.com/uploads/logos/69e24585fd22b8ad623ceec2/e411fc7f-1032-4fec-9207-1ba9065d71d6.webp</url><title>Haryn.us</title><link>https://blog.haryn.us</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 22:53:29 GMT</lastBuildDate><atom:link href="https://blog.haryn.us/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[The Gemini-CLI Paradox: Route to your own Endpoints - A Digital Thriller]]></title><description><![CDATA[By: Haryn.us

Prologue: The Silent Timeout
11:47 PM. Somewhere in the digital ether.
The cursor blinked. Once. Twice. A metronome counting down patience.
On the screen, a single line of PowerShell awa]]></description><link>https://blog.haryn.us/the-gemini-cli-paradox-route-to-your-own-endpoints-a-digital-thriller</link><guid isPermaLink="true">https://blog.haryn.us/the-gemini-cli-paradox-route-to-your-own-endpoints-a-digital-thriller</guid><category><![CDATA[gemini cli]]></category><category><![CDATA[AI]]></category><category><![CDATA[gemini]]></category><category><![CDATA[Google]]></category><dc:creator><![CDATA[haryn]]></dc:creator><pubDate>Fri, 17 Apr 2026 16:57:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69e24585fd22b8ad623ceec2/6cb6711b-3dfa-473a-80f3-0577d7ba6c39.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>By: <a href="http://Haryn.us">Haryn.us</a></p>
<hr />
<h3>Prologue: The Silent Timeout</h3>
<p><em>11:47 PM. Somewhere in the digital ether.</em></p>
<p>The cursor blinked. Once. Twice. A metronome counting down patience.</p>
<p>On the screen, a single line of PowerShell awaited execution. Behind it, a labyrinth of failed attempts, contradictory documentation, and an AI assistant that had—more than once—apologized for leading its human counterpart down rabbit holes of impossibility.</p>
<p>The goal was simple: Make the CLI work when the servers say no.</p>
<p>The obstacle was elegant in its cruelty: Google's own infrastructure, designed to protect itself, had become the very wall that needed scaling.</p>
<p>What followed was not a hack. Not a workaround. But a revelation—a discovery that the solution had been hiding in plain sight, whispered by the enemy itself.</p>
<p>This is that story. And yes—you can replicate it.</p>
<hr />
<h3>Chapter 1: The Capacity Cipher</h3>
<p>It began, as many digital odysseys do, with an error message.</p>
<p><code>[ERROR] Model over capacity. Please try again later.</code></p>
<p>For our protagonist—a developer whose workflows, pipelines, and professional identity were intertwined with <code>gemini-cli</code>—this was not an inconvenience. It was an existential threat. Requests that once completed in seconds now languished for hours. The free credits of a Google One AI subscription, meant to empower, now taunted from behind a velvet rope of rate limits.</p>
<p>The first instinct: Ask the system itself for help.</p>
<p>Using Google's own Gemini Advanced, the query was posed: <em>"How do I bypass capacity restrictions on gemini-cli?"</em></p>
<p>The response was paradoxical, almost poetic:</p>
<blockquote>
<p>"Consider using a proxy layer like LiteLLM to route requests through alternative endpoints while maintaining the same interface..."</p>
</blockquote>
<p>The enemy had handed us the key. We just didn't know which lock it opened.</p>
<hr />
<h3>Chapter 2: The False Paths</h3>
<p><em>Every great discovery is preceded by a series of elegant failures.</em></p>
<p>Our journey was no exception. The AI—let's call it The Synthetic Assistant—proposed solutions that sounded plausible but crumbled under scrutiny:</p>
<ul>
<li><p><strong>The Web Interface Proxy:</strong> <em>"Automate the browser to use the web UI!"</em><br /><strong>Reality:</strong> Terms of Service violations, fragile selectors, and session token nightmares.</p>
</li>
<li><p><strong>The OAuth Dance:</strong> <em>"Just switch authentication methods!"</em><br /><strong>Reality:</strong> <code>gemini-cli</code> ignored session environment variables, preferring persistent user settings buried in <code>%APPDATA%</code>.</p>
</li>
<li><p><strong>The API Key Illusion:</strong> <em>"Use a free tier API key!"</em><br /><strong>Reality:</strong> The free tier had ended. The $10/day charges loomed.</p>
</li>
</ul>
<p>Each dead end taught a lesson: <em>The system is not broken. It is behaving exactly as designed. To succeed, we must work with its design, not against it.</em></p>
<hr />
<h3>Chapter 3: The LiteLLM Revelation</h3>
<p>The breakthrough came not from fighting the architecture, but from <em>understanding it.</em></p>
<p><code>LiteLLM</code> is not a <em>hack</em>. It is a router. A sophisticated traffic director that sits between your CLI and multiple AI providers, translating requests on the fly. The architecture became clear:</p>
<pre><code class="language-plaintext">[gemini-cli] 
     ↓ (sends request to "gemini-3-flash-preview")
[LiteLLM Proxy @ localhost:8000]
     ↓ (translates &amp; routes)
[Your Choice:] 
  ├─→ [Z.ai API @ api.z.ai] → [glm-5 / glm-4.7]
  └─→ [DeepSeek API] → [deepseek-chat/deepseek-reasoner]
</code></pre>
<p>The magic? <em>The CLI never knows it's not talking to Google</em>. It sends a request to <code>gemini-3-flash-preview</code>. <code>LiteLLM</code> intercepts, translates, and forwards to your chosen backend. The response flows back, indistinguishable from a native Gemini reply.</p>
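<p>To make the interception concrete, here is a minimal Python sketch of the routing idea (an illustration, not <code>LiteLLM</code>'s actual internals): look up the model name the CLI sends, swap in the real backend model, and record which <code>api_base</code> the request should be forwarded to. The table mirrors two entries from the configuration in the next chapter.</p>
<pre><code class="language-python"># Illustrative router table: gemini-cli model names mapped to real backends.
ROUTE_TABLE = {
    "gemini-3-flash-preview": ("https://api.z.ai/api/coding/paas/v4", "glm-5-turbo"),
    "gemini-2.5-flash": ("https://api.deepseek.com", "deepseek-chat"),
}

def route(request):
    """Rewrite an OpenAI-style chat request for the mapped backend."""
    api_base, real_model = ROUTE_TABLE[request["model"]]
    forwarded = dict(request, model=real_model)  # same payload, real model name
    return api_base, forwarded

api_base, payload = route({
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "Hello"}],
})
print(api_base)          # https://api.z.ai/api/coding/paas/v4
print(payload["model"])  # glm-5-turbo
</code></pre>
<p>Everything except the <code>model</code> field passes through untouched, which is why the reply comes back indistinguishable from a native one.</p>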
<hr />
<h3>Chapter 4: The Configuration Codex</h3>
<p>Here is the cipher that makes it all work. Save this as <code>proxy_config.yaml</code> in your project's directory:</p>
<pre><code class="language-yaml"># {{USER_HOME}}/project/proxy_config.yaml
# The routing manifest: gemini-cli model names → your actual providers

model_list: 
  - model_name: gemini-3.1-pro-preview
    litellm_params:
      model: openai/glm-5.1
      api_base: "https://api.z.ai/api/coding/paas/v4"
      api_key: "{{YOUR_Z_AI_API_KEY}}"
      
  - model_name: gemini-3-flash-preview
    litellm_params:
      model: openai/glm-5-turbo
      api_base: "https://api.z.ai/api/coding/paas/v4"
      api_key: "{{YOUR_Z_AI_API_KEY}}"

  - model_name: gemini-3.1-flash-lite-preview
    litellm_params:
      model: openai/glm-4.7
      api_base: "https://api.z.ai/api/coding/paas/v4"
      api_key: "{{YOUR_Z_AI_API_KEY}}"
      
  - model_name: gemini-2.5-pro
    litellm_params:
      model: openai/deepseek-reasoner
      api_base: "https://api.deepseek.com"
      api_key: "{{YOUR_DEEPSEEK_API_KEY}}"

  - model_name: gemini-2.5-flash
    litellm_params:
      model: openai/deepseek-chat
      api_base: "https://api.deepseek.com"
      api_key: "{{YOUR_DEEPSEEK_API_KEY}}"

  - model_name: gemini-2.5-flash-lite
    litellm_params:
      model: openai/glm-4.5-air
      api_base: "https://api.z.ai/api/coding/paas/v4"
      api_key: "{{YOUR_Z_AI_API_KEY}}"

general_settings:
  default_model: gemini-3.1-flash-lite-preview
</code></pre>
<p><strong>Critical Notes:</strong></p>
<ul>
<li><p>Replace <code>{{YOUR_Z_AI_API_KEY}}</code> and <code>{{YOUR_DEEPSEEK_API_KEY}}</code> with your actual keys (or load them from a <code>.env</code> file), and adjust the <code>api_base</code> endpoints under <code>litellm_params</code> if your providers differ</p>
</li>
<li><p>Run <code>/auth</code> in <code>gemini-cli</code> and select the Gemini API key method; routing does not work over OAuth</p>
</li>
<li><p>The <code>openai/</code> prefix tells <code>LiteLLM</code> to treat custom endpoints as OpenAI-compatible</p>
</li>
</ul>
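<p>If you'd rather keep keys out of the file entirely, <code>LiteLLM</code> resolves values carrying the <code>os.environ/</code> prefix from environment variables at startup. The first entry above could instead read (the variable name <code>ZAI_API_KEY</code> is my choice; set it however you manage secrets):</p>
<pre><code class="language-yaml">model_list:
  - model_name: gemini-3.1-pro-preview
    litellm_params:
      model: openai/glm-5.1
      api_base: "https://api.z.ai/api/coding/paas/v4"
      api_key: os.environ/ZAI_API_KEY  # resolved from the environment at startup
</code></pre>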
<hr />
<h3>Chapter 5: The Authentication Enigma</h3>
<p>Even with perfect routing, the CLI refused to cooperate. The error persisted:</p>
<p><code>[API Error: {"error":{"message":"API key not valid...}}]</code></p>
<p>The culprit? <em>Environment variable inheritance.</em></p>
<p><code>gemini-cli</code> does not read session environment variables <code>($env:VAR)</code> the way you might expect. It prioritizes:</p>
<ol>
<li><p>Persistent user environment variables <code>([System.Environment]::SetEnvironmentVariable(..., 'User'))</code></p>
</li>
<li><p>Configuration files <code>(~/.gemini/settings.json)</code></p>
</li>
<li><p>Session variables (last resort)</p>
</li>
</ol>
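<p>The lookup order can be sketched as a resolver. This is a hedged illustration of the behavior described above, not the CLI's source; the <code>settings.json</code> key name (<code>apiKey</code>) is a stand-in and the real file layout may differ.</p>
<pre><code class="language-python"># Hedged illustration of the precedence above. The settings.json key name
# ("apiKey") is a stand-in; the real file layout may differ.
import json
from pathlib import Path

def resolve_api_key(user_env, settings_path, session_env):
    # 1. Persistent User-scope environment variables win
    if "GOOGLE_API_KEY" in user_env:
        return user_env["GOOGLE_API_KEY"]
    # 2. Then the config file (e.g. ~/.gemini/settings.json)
    path = Path(settings_path)
    if path.exists():
        settings = json.loads(path.read_text(encoding="utf-8"))
        if "apiKey" in settings:
            return settings["apiKey"]
    # 3. Session variables are consulted last
    return session_env.get("GOOGLE_API_KEY")

print(resolve_api_key({}, "/nonexistent/settings.json", {"GOOGLE_API_KEY": "session-key"}))
# session-key  (only because nothing persistent is set)
</code></pre>
<p>Walk through this resolver with your own setup and the failures start to make sense: a stale persistent key silently outranks everything you export in the current window.</p>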
<p>The solution was a two-part key:</p>
<p><strong>Part A: The Proxy Environment Bridge</strong></p>
<p>With <code>LiteLLM</code> already running from the YAML config, set these environment variables in the same session before starting <code>gemini-cli</code>:</p>
<pre><code class="language-shell">$env:GOOGLE_API_BASE = "http://localhost:8000"
$env:GOOGLE_API_KEY = "dummy-key"
$env:CURL_CA_BUNDLE = ""
gemini
</code></pre>
<p><strong>Part B: On-demand or Persistent</strong></p>
<pre><code class="language-shell"># Session-only (on-demand)

$env:HTTP_PROXY = "http://localhost:8000"
$env:HTTPS_PROXY = "http://localhost:8000"
$env:GOOGLE_API_KEY = "dummy-key"
$env:CURL_CA_BUNDLE = ""  # Bypass SSL for local proxy

# OR persistent (run once)
[System.Environment]::SetEnvironmentVariable('HTTP_PROXY', 'http://localhost:8000', 'User')
[System.Environment]::SetEnvironmentVariable('HTTPS_PROXY', 'http://localhost:8000', 'User')
[System.Environment]::SetEnvironmentVariable('GOOGLE_API_KEY', 'dummy-key', 'User')
</code></pre>
<hr />
<h3>Chapter 6: The Launch Sequence</h3>
<p>Test your connection:</p>
<pre><code class="language-shell">litellm --config proxy_config.yaml --port 8000

curl.exe http://localhost:8000/v1/chat/completions `
  -H "Content-Type: application/json" `
  -H "Authorization: Bearer dummy-key" `
  -d '{"model":"gemini-2.5-flash-lite","messages":[{"role":"user","content":"TEST: GLM?"}],"max_tokens":20}'
</code></pre>
<p>With configuration and authentication aligned, the final ritual:</p>
<pre><code class="language-shell"># TERMINAL 1: Start the proxy (keep this window open)
cd {{USER_HOME}}/project
litellm --config proxy_config.yaml --port 8000


# TERMINAL 2: Launch gemini-cli with proxy settings
$env:HTTP_PROXY = "http://localhost:8000"
$env:HTTPS_PROXY = "http://localhost:8000"
$env:GOOGLE_API_KEY = "dummy-key"
$env:CURL_CA_BUNDLE = ""
gemini

# Inside gemini-cli:
/model gemini-3-flash-preview   # Routes to your set provider
Hello, this is a test.  # Should receive response via the proxy
</code></pre>
<p>The <code>curl</code> test will work... but <code>gemini-cli</code> will remain stubborn.</p>
<hr />
<h3>Chapter 7: The Final Twist — Patching the SDK Itself</h3>
<p><em>Sometimes, the lock isn't on the door. It's in the key.</em></p>
<p>Despite every configuration tweak, every environment variable, every proxy setting, <code>gemini-cli</code> still refused to honor our custom <code>api_base</code>. The requests still flew straight to <code>https://generativelanguage.googleapis.com/</code>, bypassing our carefully constructed <code>LiteLLM</code> router.</p>
<p>The breakthrough came from an unlikely source: <strong>the AI-CLI itself</strong>.</p>
<p>In a moment of recursive brilliance, the developer asked the very model trapped inside the CLI:</p>
<blockquote>
<p>"How can I modify gemini-cli to respect a custom API base URL?"</p>
</blockquote>
<p>The response was not a workaround. It was a surgical strike:</p>
<blockquote>
<p>"The API base is hardcoded in the <code>@google/genai</code> SDK. To override it, patch the compiled JavaScript files to check for <code>GOOGLE_API_BASE</code> environment variable before falling back to the default."</p>
</blockquote>
<p>The enemy had revealed its own source code vulnerabilities.</p>
<p><strong>The Target Files:</strong></p>
<p>Three files, buried deep in the <code>npm</code> global installation, held the hardcoded URL hostage:</p>
<pre><code class="language-plaintext">{{USER_HOME}}/AppData/Roaming/npm/node_modules/@google/gemini-cli/node_modules/@google/genai/dist/
├── index.cjs                    ← CommonJS entry point
├── node/index.cjs              ← Node-specific CommonJS
└── node/index.mjs              ← Node-specific ES Module
</code></pre>
<p><strong>The Patch: A Three-Line Revolution</strong></p>
<p>In each file, locate the section where <code>apiBase</code> is defined. It looks something like:</p>
<pre><code class="language-javascript">// BEFORE (hardcoded)
= "https://generativelanguage.googleapis.com/";
</code></pre>
<p>Replace it with this environment-aware logic:</p>
<pre><code class="language-javascript">// AFTER (environment-aware)
= process.env.GOOGLE_API_BASE 
  || "https://generativelanguage.googleapis.com/";
</code></pre>
<p><strong>What this does:</strong></p>
<ol>
<li><p>Checks for <code>GOOGLE_API_BASE</code> first (our proxy)</p>
</li>
<li><p>Defaults to Google's endpoint if it is not set</p>
</li>
</ol>
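<p>Patching three minified dist files by hand is error-prone, so here is a hedged Python sketch that automates it. The paths and the exact search string are assumptions taken from the layout above; verify them against your own install, and expect any <code>npm</code> update to silently undo the patch.</p>
<pre><code class="language-python"># Hedged sketch: patch the @google/genai dist files so the SDK consults
# GOOGLE_API_BASE before the hardcoded default. Verify paths and the search
# string against your own install first.
from pathlib import Path

HARDCODED = '"https://generativelanguage.googleapis.com/"'
PATCHED = "process.env.GOOGLE_API_BASE || " + HARDCODED

def patch_file(path):
    text = Path(path).read_text(encoding="utf-8")
    if PATCHED in text:          # already patched -- nothing to do
        return False
    if HARDCODED not in text:    # unexpected layout -- leave untouched
        return False
    Path(str(path) + ".bak").write_text(text, encoding="utf-8")  # backup first
    # Note: this replaces every occurrence of the URL string in the file.
    Path(path).write_text(text.replace(HARDCODED, PATCHED), encoding="utf-8")
    return True

if __name__ == "__main__":
    # Path assumed from the layout above; confirm it on your machine.
    DIST = Path.home() / "AppData/Roaming/npm/node_modules/@google/gemini-cli/node_modules/@google/genai/dist"
    for rel in ("index.cjs", "node/index.cjs", "node/index.mjs"):
        target = DIST / rel
        if target.exists():
            print(rel, "patched" if patch_file(target) else "skipped")
</code></pre>
<p>The guard clauses make the script idempotent: running it twice, or against a file that no longer matches, changes nothing.</p>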
<p><strong>The Moment of Truth</strong></p>
<pre><code class="language-shell"># Start the router with the yaml file
litellm --config proxy_config.yaml --port 8000

# Set the environment variable that the SDK now respects
$env:GOOGLE_API_BASE = "http://localhost:8000"
$env:GOOGLE_API_KEY = "dummy-key"
$env:CURL_CA_BUNDLE = ""

# Launch gemini-cli — it now honors our proxy
gemini
</code></pre>
<p>No more <code>HTTP_PROXY</code> workarounds. No more <code>settings.json</code> gymnastics. The CLI itself now natively supports custom endpoints.</p>
<p>The requests are flowing. The routing is active. The capacity walls have fallen.</p>
<p>Now your workflow is antifragile:</p>
<ul>
<li><p>✅ When Gemini is healthy: Use free credits via OAuth</p>
</li>
<li><p>✅ When capacity hits: Switch <code>/auth</code> to the Gemini API key method and start the <code>LiteLLM</code> proxy to route seamlessly to your own providers</p>
</li>
<li><p>✅ Zero downtime: Your pipelines keep running</p>
</li>
</ul>
<hr />
<h3>Epilogue: The Lesson in the Labyrinth</h3>
<p>What began as a capacity error became a masterclass in system design.</p>
<p>The final revelation was not technical—it was <em>philosophical</em>:</p>
<blockquote>
<p>The most elegant solutions do not break systems. They understand them so deeply that they can redirect their flow without altering their nature.</p>
</blockquote>
<p>Google's infrastructure was not the enemy. It was a puzzle. And puzzles, by design, have solutions.</p>
<p>The ultimate twist? <strong>The AI helped patch itself to bypass its own restrictions</strong>. In asking Gemini how to circumvent Gemini's limits, we discovered that the system contained the seeds of its own flexibility—if only someone knew where to look.</p>
<p>For the reader who wishes to replicate this journey:</p>
<ol>
<li><p><strong>Install</strong> <code>LiteLLM</code><strong>:</strong> <code>pip install 'litellm[proxy]'</code></p>
</li>
<li><p><strong>Configure routing:</strong> Use the <code>proxy_config.yaml</code> template above</p>
</li>
<li><p><strong>Patch the SDK:</strong> Modify the hardcoded endpoint so it checks the <code>GOOGLE_API_BASE</code> environment variable first and falls back to Google's default</p>
</li>
<li><p><strong>Set environment:</strong> <code>$env:GOOGLE_API_BASE = "http://localhost:8000"</code></p>
</li>
<li><p><strong>Test incrementally:</strong> Verify each route with curl before involving the CLI</p>
</li>
<li><p><strong>Switch when needed:</strong> Once rate limits hit, switch <code>/auth</code> to the Gemini API and start the proxy with the environment variables set</p>
</li>
</ol>
<p>The code is open. The path is clear. The only remaining variable is your willingness to see constraints not as walls, but as invitations to innovate.</p>
<p><strong>Troubleshooting Checklist:</strong></p>
<ul>
<li><p>Proxy running: <code>litellm --config proxy_config.yaml --port 8000</code></p>
</li>
<li><p>SDK patched: all three <code>@google/genai</code> dist files contain the <code>GOOGLE_API_BASE</code> check</p>
</li>
<li><p>Env var set: <code>$env:GOOGLE_API_BASE = "http://localhost:8000"</code></p>
</li>
<li><p>Model names match exactly between CLI and config</p>
</li>
<li><p>Each route responds to a direct <code>curl</code> test</p>
</li>
</ul>
<p><em>This story is based on real events. All code samples are functional and tested. Replace</em> <code>{{USER_HOME}}</code> <em>with your actual home directory path (e.g.,</em> <code>C:\Users\YourName</code> <em>on Windows or</em> <code>/home/yourname</code> <em>on Linux/macOS).</em></p>
<p><em>⚠️ Disclaimer: Patching third-party SDKs may void support agreements and could break with future updates. Use at your own risk. This solution is for educational purposes and personal use only.</em></p>
<p><em>May your requests always find their route.</em> 🗝️✨</p>
]]></content:encoded></item></channel></rss>