⟵ back

OpenRouter Is My Primary Subscription Now

I cancelled my frontier-provider subscriptions. Not because the models got worse — they didn't — but because paying a fixed monthly fee to one lab, then a second fee to another lab, then living inside whichever chat app happened to wrap the model I wanted, stopped making sense. I route everything through OpenRouter now. One account, one key, one balance, and a model picker that spans every provider worth using.[1]

The catch is that the moment you leave the first-party apps, you lose everything that wasn't the raw model: web search, vision, image generation, document reading, voice. The chat box stops feeling like a product and starts feeling like a completions endpoint. So I built the missing layer as a Pi extension: pi-openrouter-multimodal. Eight tools that give a coding agent the same multimodal surface the consumer apps have, all proxied through the one OpenRouter key I already pay into.[2]

Why I left the frontier subscriptions

The pitch for a frontier subscription is simple: pay $20–$200 a month, get the best model and the polished app around it. It's a good deal right up until you use more than one lab. Then you're paying two or three subscriptions, none of which talk to each other, each of which silently rate-limits you on the model you actually wanted, and each of which decides for you which model a given feature is allowed to call.

Three things pushed me over the edge:

OpenRouter is the gateway. It normalizes hundreds of models from dozens of providers behind a single OpenAI-compatible API, handles failover when a provider is down, and bills it all to one balance. The model name is a string you change. That's the whole pitch, and it's enough.[1]

The app tax

Here's what nobody tells you when you cancel: the subscription was never just the model. It was the model plus a pile of server-side machinery the app quietly called on your behalf. Ask the chat app a question about a current event and it ran a web search. Paste a screenshot and it ran a vision model. Ask for a picture and it called an image model. Drop in a PDF and something parsed it. Talk to it and a speech-to-text model transcribed you.

Strip that away and a raw completions endpoint can't do any of it. The model has no eyes, no browser, no canvas, no ears. For most of my work — coding inside an agent — that's a real regression. I want my agent to read the docs page I link, look at the failing screenshot, and read the PDF spec without me leaving the terminal. That capability gap is the app tax, and it's the actual reason people feel trapped in the first-party apps.

The fix isn't to go back. It's to rebuild the capabilities as agent tools, pointed at the same gateway. OpenRouter already exposes the pieces — server-side web search, vision-capable chat completions, image output, audio endpoints. They just need to be wired into the agent as callable tools.

pi-openrouter-multimodal

pi-openrouter-multimodal is a Pi extension that adds eight tools to the agent, every one of them proxied through OpenRouter. One install, one key, and the agent gets its multimodal surface back:

pi install npm:@dtmirizzi/pi-openrouter-multimodal
Tool What it does
web_search Server-side web search with real-time results
web_fetch Fetch page content from a URL — web pages, docs, PDFs
image_generate Text-to-image generation via chat completions
image_understand Analyze images with a vision model
video_understand Analyze video (YouTube links work with Gemini)
pdf_read Extract and analyze PDF content
tts_speak Text-to-speech via the /audio/speech endpoint
stt_transcribe Speech-to-text via the /audio/transcriptions endpoint

Every tool is independently toggleable. If you only want web search and PDF reading, turn the other six off and they never enter the agent's tool list — which keeps the context lean and the agent from reaching for capabilities you didn't sanction. This is the same least-privilege instinct from the governance work: the tools that aren't present can't be misused.

One key, every modality Pi agent + extension OpenRouter one key, one balance web search / fetch image generate image / video vision pdf read tts / stt any chat model The app tax, rebuilt as agent tools — billed to the balance you already have.

How it actually works

There's no second account and no second SDK. Every tool resolves the OpenRouter key in priority order, so if your agent already talks to OpenRouter, the extension is configured the moment it installs:

  1. the OPENROUTER_API_KEY environment variable
  2. Pi's model registry, under the openrouter provider
  3. ~/.pi/agent/models.json, at providers.openrouter.apiKey

Under the hood, the tools are thin adapters over endpoints OpenRouter already exposes. Search and fetch use server-side tool execution. Image generation rides on chat completions with image output. Vision and video analysis pass media as content blocks to a vision-capable model. PDFs go through a file-parser. TTS and STT hit the dedicated /audio/speech and /audio/transcriptions endpoints. The agent sees eight clean tools; the gateway does the routing.

Picking a model per modality

The part I care most about: you choose the model for each modality independently. The extension queries OpenRouter for the live model list at startup, so the picker reflects what's actually available right now, not a hardcoded menu that rots. Two interactive overlays drive it:

So I can point image generation at Gemini Flash Image one week and FLUX.2 the next, run vision on whatever the strongest current model is, and parse PDFs with a free Cloudflare model or Mistral OCR depending on the document. Search and fetch can route through the native engine, Exa, Firecrawl, or parallel. None of those choices touch which model writes my code — they're orthogonal, exactly as they should be.

These settings persist. They survive session shutdown, context compaction, and tree navigation, so I configure once and the agent keeps the setup across every future session.[2]

What this buys me

The whole stack is now one mental model: a single balance, a single key, and a model name I treat as a dial. The coding model, the vision model, and the image model can each be the best available option from whichever lab shipped it, with no extra subscription and no second app. When a new model lands on OpenRouter, it shows up in my picker the same day — no waiting for an app to wire it into a feature.

And it composes with everything else I run on Pi. The governance extension still intercepts every one of these tool calls: web_fetch hitting a sketchy URL, image_generate burning budget, an audio transcription leaking PII on output. Multimodal capability and governance are separate extensions over the same harness, which is the point — capabilities are tools, control is the layer underneath them.

The honest tradeoffs

This isn't free of friction. Going metered means you watch a balance instead of forgetting a flat fee; a runaway agent loop can spend real money, which is one more reason to run it behind budgets and governance. You give up the consumer-app polish — the streaming UI, the saved chats, the mobile client — in exchange for control. And a gateway adds a hop: you're trusting OpenRouter's routing and uptime, though in practice their failover has been more reliable than any single provider I'd otherwise depend on.

For agent-driven, multi-model work, those are easy trades. I'd rather own the harness and rent the tokens than rent the whole stack and own none of it.

Try it

If you're on Pi and you route through OpenRouter, it's one command and you keep your existing key:

pi install npm:@dtmirizzi/pi-openrouter-multimodal
# then, inside Pi:
/web-tools     # toggle the eight tools
/web-models    # pick a model per modality

The source, the full tool reference, and the configuration options are on GitHub. Issues and PRs welcome — I'm running this as my daily driver, so I'll be shipping fixes either way.

References

  1. OpenRouter — a unified API and credit balance across hundreds of models from dozens of providers. openrouter.ai
  2. pi-openrouter-multimodal — eight OpenRouter-backed multimodal tools for Pi coding agents (source, tool reference, and configuration). github.com/dtmirizzi/pi-openrouter-multimodal