What This Covers
- The two-line migration: pointing your existing
openaiPython client at Swarms with no other code changes - Why bearer-token auth works the same as
x-api-key— your existing OpenAI auth pattern transfers - The optional upgrade path: when to graduate from
/v1/chat/completions(single model) to/v1/swarm/completions(multi-agent swarm) - Realistic before/after architecture for an existing OpenAI-backed app
- What you don’t have to throw away — streaming, history, vision, retries
Why This Matters
Most teams already have anopenai client wired into production. Logging, retries, observability, prompt versioning, evaluation — all of it sits behind that one SDK call. The switching cost of “rewrite to a new SDK” is what keeps teams locked into single-model architectures even when their problems demand a swarm. The Swarms API removes that cost: it speaks the OpenAI ChatCompletion protocol on /v1/chat/completions and accepts either x-api-key or Authorization: Bearer <key> — meaning your existing OpenAI SDK code becomes a Swarms client by changing two strings. From there, you can graduate one endpoint at a time to multi-agent without rewriting the layers around it.
This guide is the migration narrative, not a feature tour. For the feature tour, see OpenAI-Compatible Chat Completions.
The Two-Line Migration
Here is the code you already have:- Before — vanilla OpenAI
- After — pointing at Swarms
Authentication: Bearer Token Works Too
The OpenAI SDK injectsAuthorization: Bearer <key> automatically. Swarms accepts both that header and the x-api-key header that the rest of the Swarms docs reference — they’re equivalent, and the platform falls back from one to the other server-side. This means:
- Your OpenAI SDK code keeps using
Authorization: Bearer(transparently — you don’t see it) - Your
requests-style code can keep usingx-api-key(matches every other guide) - Mixed codebases work without translating between auth schemes
x-api-key. Either is fine.
What You Keep For Free
The Swarms/v1/chat/completions endpoint is a faithful OpenAI ChatCompletion. The following continue to work, unmodified, after the base-url swap:
- Streaming (
stream=True— token-by-token deltas) - System / user / assistant messages, including full multi-turn history
- Vision / image input via the multimodal
contentarray - Standard error classes from the SDK (
openai.RateLimitError,openai.APIStatusError, etc.) usageaccounting in the response — prompt/completion/total token countsmax_tokens,temperature,top_p,presence_penalty,frequency_penaltyall forwarded
The First Free Upgrade: max_loops
The Swarms endpoint adds one non-OpenAI parameter you can pass through extra_body: max_loops. It tells the agent to iterate on its own output — think “self-review and refine” without you writing the orchestration. This is the first piece of multi-agent thinking you can adopt without leaving the chat.completions shape.
max_loops defaults to 1 (single pass — identical to OpenAI behavior). Bumping it to 2 or 3 is often the cheapest quality win on hard reasoning tasks. For the full reference see OpenAI-Compatible Chat Completions.
The Strategic Upgrade: /v1/swarm/completions
The chat.completions shape is a single conversation with a single agent. That’s the right tool for many calls, but the reason teams come to Swarms is that some calls need to be a coordinated team of specialised agents — a researcher, an analyst, a critic, a synthesizer. The OpenAI protocol doesn’t have a vocabulary for that. Swarms does, on its native /v1/swarm/completions endpoint.
The decision is per-endpoint: keep using chat.completions for chat-shaped calls, and route the calls that need orchestration to swarm/completions. You don’t migrate everything at once.
When to upgrade an endpoint
Upgrade to/v1/swarm/completions when any of these are true:
- The prompt has more than one role baked into it (e.g. “first research X, then critique it, then summarise”)
- You’re already doing your own agent orchestration in application code (function-calling loops, sub-agent dispatch)
- The output quality is bottlenecked on a single model trying to do too much in one pass
- You need parallel work — concurrent research across multiple angles — combined into one answer
Side-by-side: the same job, two endpoints
- OpenAI-compatible (single agent)
- Swarm-native (multi-agent)
A Migration Plan You Can Actually Run
A pragmatic week-one migration for a team currently on OpenAI:- Day 1. Add a
SWARMS_API_KEYto your secrets. Duplicate your OpenAI client construction into aswarms_clientthat differs only inapi_keyandbase_url. Ship to staging behind a feature flag. - Day 2. Move your least-critical chat.completions endpoint to
swarms_client. Confirm logs, retries, token counting, and streaming all behave. Diff the response on a few hundred prompts against the OpenAI baseline. - Day 3. Identify the one endpoint in your product that does the most prompt engineering — the long, multi-section system prompt with “first do X, then Y, then Z”. This is the one that wants to be a swarm. Cut it over to
/v1/swarm/completionswith 3–5 agents. - Day 4. Add the Cost Optimization Playbook patterns to that swarm — tier the workers, compress the handoffs.
- Day 5. Schedule the batch / overnight pieces of your pipeline against the Night-Mode discount. Audit
discount_activeto confirm.
What You Don’t Have To Throw Away
- Your OpenAI SDK and any wrappers around it
- Your retry / circuit-breaker middleware
- Your token-counting and cost-tracking telemetry
- Your prompt-versioning system
- Your evaluation harness (the response shape is identical)
- Your streaming UI code
- Your
Authorization: Bearerauth flow
Next Steps
- OpenAI-Compatible Chat Completions — full feature reference for the drop-in endpoint (streaming, vision, multi-turn, multi-loop) with TypeScript, Go, and Rust examples
- Cost Optimization Playbook — once you’re on
/v1/swarm/completions, this is the architecture that gets your bill down - Night-Mode Pricing Strategy — 50% off tokens 8 PM – 6 AM Pacific, schedule your batches accordingly