The AI Gateway: One Interface, Multiple Models

I've been thinking a lot lately about how AI models are becoming commoditized. Not in a bad way—quite the opposite. We're living in an era where you can choose from Claude, GPT-4, Llama, Mistral, and a dozen others depending on your needs, budget, and values.

But here's the problem: most applications lock you into one.

Pick OpenAI and you're committed. Build on Anthropic and you can't easily switch. Want to run local models for privacy? Good luck integrating that into your existing workflow. This fragmentation is frustrating because what you really want is simple: use the best tool for the job without rewriting your entire application.

That's why I'm building an AI gateway into Jottings.

Why This Matters

When I was designing Jottings, I kept coming back to the same question: what if we let you choose your AI provider?

Think about it. Some of you might prefer Claude for long-form content and analysis. Others swear by GPT-4's creativity. A few might want to run Llama locally for sensitive content. Your neighbor might use Mistral because it hits the perfect balance between cost and capability.

Right now, if you want to power your microblog with AI—whether that's generating summaries, creating titles, or brainstorming content—you're stuck with whoever the platform chose for you.

I didn't want that for Jottings users.

The Technical Vision

The way I'm approaching this is inspired by how payment gateways work. Stripe doesn't just support one bank. Adyen doesn't lock you into one processor. They abstract away the complexity and let you use whatever payment method makes sense.

An AI gateway works the same way.

At its core, it's:

  • One unified API that your application calls
  • Model routing logic that knows how to talk to different providers
  • Consistent response formatting regardless of backend
  • Fallback handling so a service interruption doesn't break your site
  • Cost tracking so you know what you're actually paying
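
To make that concrete, here's a rough TypeScript sketch of what the unified surface might look like. Every name in it (GatewayRequest, GatewayResponse, Provider) is hypothetical shorthand for the ideas above, not the actual Jottings API:

```typescript
// A sketch only: these names are hypothetical shorthand, not Jottings' real API.

type Task = "title" | "summary" | "brainstorm";

// What application code sends, regardless of which model ends up answering.
interface GatewayRequest {
  task: Task;
  prompt: string;
  maxCostUsd?: number;     // budget hint for the router
  maxLatencyMs?: number;   // speed hint for the router
  preferredProvider?: "workers-ai" | "openai" | "local";
}

// What application code gets back, normalized across every backend.
interface GatewayResponse {
  text: string;
  provider: string;    // which backend actually answered
  model: string;
  latencyMs: number;
  costUsd: number;     // estimate that feeds the cost dashboard
}

// Each backend (Workers AI, OpenAI, a local model) is wrapped in the same
// tiny adapter; the gateway only ever talks to this interface.
interface Provider {
  name: string;
  complete(req: GatewayRequest): Promise<GatewayResponse>;
}
```

The Provider interface is the important bit: Workers AI, OpenAI, and a local model each get wrapped in the same tiny adapter, so everything above it stays provider-agnostic.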

Technically, this might mean:

  • Workers AI handles lighter tasks and stays within Cloudflare's ecosystem (fast, with no extra network hop)
  • OpenAI comes in for complex reasoning and fine-tuned models
  • Local models for privacy-sensitive work
  • Mixture-of-Experts-style routing, where requests go to different models based on token count, latency requirements, or cost targets
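
Here's a sketch of how that routing decision might weigh those signals, reusing the hypothetical GatewayRequest and Provider shapes from above. The thresholds are invented purely for illustration:

```typescript
// A sketch of the routing decision; thresholds are illustrative, not policy.

function pickProvider(
  req: GatewayRequest,
  providers: Record<string, Provider>
): Provider {
  // An explicit preference always wins.
  if (req.preferredProvider) return providers[req.preferredProvider];

  // Rough token estimate (~4 characters per token) for sizing the request.
  const roughTokens = Math.ceil(req.prompt.length / 4);

  // Short or latency-sensitive work stays on Workers AI at the edge.
  if (roughTokens < 500 || (req.maxLatencyMs ?? Infinity) < 2000) {
    return providers["workers-ai"];
  }

  // Tight budgets also stay on the cheaper backend.
  if ((req.maxCostUsd ?? Infinity) < 0.01) return providers["workers-ai"];

  // Long-form or complex reasoning goes to OpenAI.
  return providers["openai"];
}
```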

What This Looks Like in Practice

Imagine you're writing a jot and you want AI-generated suggestions. Here's how the gateway thinks about it:

User requests title generation
↓
Gateway evaluates:
  - Cost budget? (use cheaper model)
  - Speed required? (use fastest endpoint)
  - Quality needed? (use most capable)
  - Provider preference set? (respect it)
↓
Routes to appropriate provider
↓
Returns normalized response
↓
Falls back to alternate if first fails
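
In code, that flow might look something like this sketch (again building on the hypothetical types from earlier, not real Jottings internals): try providers in order, time each call, return the same normalized shape regardless of who answered, and move down the list if something fails.

```typescript
// A sketch of the request flow: route, call, normalize, fall back on failure.

async function generate(
  req: GatewayRequest,
  ordered: Provider[]          // e.g. [workersAi, openai], preferred first
): Promise<GatewayResponse> {
  let lastError: unknown;
  for (const provider of ordered) {
    const started = Date.now();
    try {
      const res = await provider.complete(req);
      // Same shape no matter which backend answered.
      return { ...res, latencyMs: Date.now() - started };
    } catch (err) {
      lastError = err;         // note the failure and try the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```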

No breaking changes for you. Same interface whether I'm talking to Claude or Llama. Same error handling. Same response structure.

The Freedom Angle

Here's what excites me most: you're not trapped.

If OpenAI raises prices 50%, you might switch to Anthropic for certain tasks. If a new model comes out that's radically better at your use case, we add support and you flip a toggle. Want to run your own Ollama instance? The gateway can talk to that too.
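
To give a feel for how small that switch could be, here's a sketch of a local adapter talking to Ollama's HTTP API (a default install listens on localhost:11434). The model name is a placeholder, and I'm assuming the same hypothetical Provider interface as before:

```typescript
// A sketch of a local-model adapter against Ollama's HTTP API.
// The model name is a placeholder; Provider is the hypothetical shape above.

const ollama: Provider = {
  name: "local",
  async complete(req) {
    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "llama3", prompt: req.prompt, stream: false }),
    });
    const data = (await res.json()) as { response: string };
    return {
      text: data.response,
      provider: "local",
      model: "llama3",
      latencyMs: 0,   // measured by the gateway wrapper, not the adapter
      costUsd: 0,     // local inference has no per-token bill
    };
  },
};
```

Switching providers then means pointing the router at a different adapter; nothing about the calling code changes.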

This is genuinely different from how most SaaS products work. Most platforms say "this is how we've built it, deal with it." I'm saying "we've built the plumbing so you can choose the water source."

The Honest Challenges

I'd be dishonest if I pretended this was simple to build.

Different models have different latency profiles. OpenAI might respond in 2 seconds while a local model takes 8. Different models cost wildly different amounts—GPT-4 is orders of magnitude more expensive than open-source alternatives. Error handling gets complex when you're routing between providers with different failure modes.
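
One way to tame that latency spread is a uniform timeout around every provider call. This sketch uses AbortController, which is available in Workers and modern Node; the 8-second cutoff is purely illustrative:

```typescript
// A uniform timeout wrapper around a provider call; 8s is an illustrative default.

async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms = 8000
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer);       // always clear, whether we resolved or threw
  }
}

// Usage: thread the signal into fetch so a slow provider gets cut off
// instead of hanging the whole request.
// await withTimeout((signal) => fetch(url, { signal }), 8000);
```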

And then there's the question of consistency: does a title generated by Claude read the same as one from GPT-4? Probably not. Is that good or bad? Depends on your taste.

But these are good problems to have because they reflect real choice, not artificial constraints.

What's Coming

Right now, I'm in the architecture phase. I'm designing the routing logic, the model selection UI, and the cost tracking dashboard. I'm also talking to users about what matters most—speed? Cost? Quality? Privacy?

The early implementation will probably focus on:

  1. Workers AI as default (fast, integrated with our infrastructure)
  2. OpenAI as fallback (proven, capable, well-understood)
  3. Cost dashboard so you see exactly what you're spending (see the sketch after this list)
  4. Simple model selector in your site settings
  5. Graceful degradation if a service goes down
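
For the cost dashboard, the core of the tracking is small: estimate each request's cost from token counts and per-provider rates, then sum per site and per day. A sketch, with placeholder prices rather than real rates:

```typescript
// A sketch of the cost-tracking piece; per-1K-token rates are made-up placeholders.

const pricePer1kTokens: Record<string, { input: number; output: number }> = {
  "workers-ai": { input: 0.0001, output: 0.0001 },
  "openai":     { input: 0.0025, output: 0.01 },
  "local":      { input: 0,      output: 0 },
};

function estimateCostUsd(
  provider: string,
  inputTokens: number,
  outputTokens: number
): number {
  const rate = pricePer1kTokens[provider] ?? { input: 0, output: 0 };
  return (inputTokens / 1000) * rate.input + (outputTokens / 1000) * rate.output;
}

// Every response carries its estimated cost, so the dashboard is just a sum
// over stored responses, grouped by provider and by day.
```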

Down the road, I want to add local model support, streaming for long responses, and fine-tuning workflows.

Why I'm Building This

Honestly? Because I'm tired of software that makes decisions for me when I'm perfectly capable of making them myself.

I want Jottings to be the kind of platform where your AI integration reflects your values, not mine. If you care about privacy, you pick local models. If you want the absolute best output, you pick Claude or GPT-4. If you want to minimize costs, you pick an open-source option.

The gateway is just the plumbing that makes that possible.


If you're interested in how I'm building this, or if you have thoughts on which models you'd like to see supported, I'd love to hear from you. Head over to Jottings and start microblogging—when the AI gateway launches, you'll be ready to take full advantage of it.

Choose your interface. Choose your model. Choose your AI story.