What This Covers
- The non-negotiables before you put the Swarms API on the critical path of a production system
- Rate-limit, retry, idempotency, and error-handling defaults that match how the API actually behaves
- A single Python wrapper that bundles retry-with-backoff, a budget cap, and post-run log verification
- The right endpoints to use for offline / batch workloads
- When to upgrade to the premium tier
Why This Matters
The first 80% of any API integration is happy-path code. The last 20%, the part that decides whether you get paged at 3am, is rate-limit handling, retries on transient failures, budget guardrails, and an audit trail. This guide is the terse checklist for that last 20%, written for SREs and senior engineers who already know what exponential backoff is and just want the production-correct defaults for this specific API.
The Checklist
Rate Limits
- Read the X-RateLimit-Remaining-Minute and X-RateLimit-Remaining-Day headers on every response and log them
- On 429, honor the Retry-After header verbatim; don’t substitute your own value
- Throttle proactively when Remaining-Minute / Limit-Minute < 0.1 rather than waiting for the 429
- If you’re hitting limits regularly, upgrade to premium (2,000/min, 100,000/day) before re-engineering
- Reference: Rate Limit Headers, Rate Limits
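The header checks above can be sketched as three small functions that work on any mapping of response headers. This is a minimal sketch, not the API's official client: the name X-RateLimit-Limit-Minute for the per-minute limit header is an assumption (the checklist only names the Remaining headers), and Retry-After is assumed to carry numeric seconds.

```python
from typing import Mapping, Optional

THROTTLE_RATIO = 0.1  # from the checklist: slow down below 10% of the per-minute quota


def remaining_ratio(headers: Mapping[str, str]) -> Optional[float]:
    """Return remaining/limit for the per-minute window, or None if unavailable."""
    try:
        remaining = float(headers["X-RateLimit-Remaining-Minute"])
        # ASSUMPTION: the limit header follows the same naming scheme.
        limit = float(headers["X-RateLimit-Limit-Minute"])
    except (KeyError, ValueError):
        return None
    return remaining / limit if limit > 0 else None


def should_throttle(headers: Mapping[str, str]) -> bool:
    """True when the per-minute quota is nearly exhausted."""
    ratio = remaining_ratio(headers)
    return ratio is not None and ratio < THROTTLE_RATIO


def wait_after_429(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to sleep after a 429, taken from Retry-After when it is numeric."""
    try:
        return max(0.0, float(headers["Retry-After"]))
    except (KeyError, ValueError):
        return default
```

A plain dict is case-sensitive, so pass a case-insensitive mapping (for example the headers object of your HTTP library) or normalize key casing first.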
Idempotency and Request IDs
- Generate a client-side request ID (UUID v4) for every submit and persist it alongside the payload
- Use the request ID as the join key when you later reconcile against /v1/swarm/logs
- Treat retries as idempotent only if you’ve established the original request never succeeded; a 5xx or a timeout is not proof of non-execution
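One way to persist the request ID alongside the payload is an append-only JSON Lines ledger written before the request leaves the process. The sketch below is illustrative only: record_submission, load_ledger, and the ledger file format are hypothetical helpers, not part of the Swarms API.

```python
import json
import time
import uuid
from pathlib import Path


def record_submission(payload: dict, ledger_path: Path) -> str:
    """Append (request_id, payload) to the ledger and return the new request ID."""
    request_id = str(uuid.uuid4())
    entry = {"request_id": request_id, "submitted_at": time.time(), "payload": payload}
    with Path(ledger_path).open("a", encoding="utf-8") as ledger:
        ledger.write(json.dumps(entry) + "\n")
    return request_id


def load_ledger(ledger_path: Path) -> dict:
    """Return {request_id: payload} for later reconciliation against exported logs."""
    entries = {}
    with Path(ledger_path).open(encoding="utf-8") as ledger:
        for line in ledger:
            if line.strip():
                entry = json.loads(line)
                entries[entry["request_id"]] = entry["payload"]
    return entries
```

Writing the ledger entry before submitting means a crash mid-request still leaves a record to reconcile, which matters because a timeout does not prove the request never ran.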
Retry Policy
- Retry on 5xx, connection errors, and read timeouts. Never retry on 4xx other than 429
- Use exponential backoff with full jitter: sleep = random.uniform(0, base * 2**attempt)
- Cap at 4–5 attempts. Beyond that, the underlying issue isn’t transient
- On 429, use Retry-After instead of your computed backoff
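The retry rules above reduce to two decisions: whether to retry at all, and how long to wait. A minimal sketch, assuming a status of None stands for a connection error or read timeout (a convention of this example, not of any library):

```python
import random
from typing import Optional

MAX_ATTEMPTS = 4      # the checklist caps retries at 4-5 attempts
BASE_SECONDS = 1.5    # base delay from the operational defaults


def is_retryable(status: Optional[int]) -> bool:
    """Retry on network failures (None), 429, and 5xx; never on other 4xx."""
    return status is None or status == 429 or 500 <= status < 600


def backoff_seconds(attempt: int, base: float = BASE_SECONDS,
                    retry_after: Optional[float] = None) -> float:
    """Full-jitter exponential backoff; a server-provided Retry-After wins."""
    if retry_after is not None:
        return retry_after
    return random.uniform(0, base * 2 ** attempt)
```

With the default base, attempt 0 sleeps somewhere in [0, 1.5] seconds and attempt 3 somewhere in [0, 12] seconds. Retrying after a 5xx or timeout can still duplicate work if the original request actually ran, so pair this with the request-ID reconciliation described earlier.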
Structured Error Handling
- Always check response.status_code and raise on non-2xx; do not blindly call response.json()
- Capture the request ID, X-RateLimit-* headers, status code, response body, and elapsed time on every error
- Distinguish “API rejected the request” (4xx) from “API never saw it” (network); they require different remediation
- Log the failed payload’s swarm_type and per-agent model_name for debugging
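An exception type that carries all of this context keeps error logs uniform. The sketch below is one possible shape, not the API's own error model: SwarmsAPIError and classify are hypothetical names, a status of None is this example's convention for a network failure, and the payload is assumed to hold its agents under an "agents" key.

```python
import json
from typing import Any, Mapping, Optional


def classify(status: Optional[int]) -> str:
    """Separate 'API never saw it' from 'API rejected it' and server faults."""
    if status is None:
        return "network"
    if status == 429:
        return "rate_limited"
    return "rejected" if 400 <= status < 500 else "server_error"


class SwarmsAPIError(Exception):
    """Carries the fields the checklist asks to capture on every failure."""

    def __init__(self, *, request_id: str, status: Optional[int],
                 headers: Mapping[str, str], body: Any,
                 elapsed_s: float, payload: Mapping[str, Any]):
        self.context = {
            "request_id": request_id,
            "status": status,
            "kind": classify(status),
            "rate_limit": {k: v for k, v in headers.items()
                           if k.lower().startswith("x-ratelimit-")},
            "body": body,
            "elapsed_s": round(elapsed_s, 3),
            "swarm_type": payload.get("swarm_type"),
            # ASSUMPTION: agents live under payload["agents"], each with model_name.
            "models": [a.get("model_name") for a in payload.get("agents", [])],
        }
        super().__init__(json.dumps(self.context, default=str))
```

Because the exception message is a JSON string, a plain logger.exception call emits a structured record without extra formatting code.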
Cost Monitoring
- Before any batch run, call /v1/account/credits and refuse to submit if total_credits < estimated_cost * 1.5
- After every run, persist result["usage"]["billing_info"]["total_cost"] to your warehouse keyed by request ID
- Run a daily cron against /v1/usage/report and alert on day-over-day cost > 2x the trailing 7-day mean
- Set hard budget caps in your wrapper; a runaway loop should fail fast, not bleed credits
- Reference: Production Observability, Usage Report, Get Credit Balance
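The pre-flight credit check and the daily spike alert are both simple arithmetic once the numbers are fetched. The sketch below covers only that arithmetic; fetching total_credits from /v1/account/credits and the daily totals from /v1/usage/report is left to the caller, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def can_submit(total_credits: float, estimated_cost: float,
               safety_margin: float = 1.5) -> bool:
    """Refuse to submit when credits fall below estimated cost times the margin."""
    return total_credits >= estimated_cost * safety_margin


def cost_spike(daily_costs: Sequence[float], factor: float = 2.0,
               window: int = 7) -> bool:
    """True when the latest day exceeds `factor` times the trailing-window mean.

    daily_costs is ordered oldest first; the last element is the day under test.
    """
    if len(daily_costs) < window + 1:
        return False  # not enough history to form a baseline
    *history, today = daily_costs[-(window + 1):]
    baseline = mean(history)
    return baseline > 0 and today > factor * baseline
```

For example, seven days at $1.00 followed by a $2.50 day triggers the alert, while a $2.00 day does not because the rule is strictly greater than 2x.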
Batch Endpoints for Offline Workloads
- If you’re submitting more than ~20 requests in a tight loop, switch to the batch endpoints
- Use batch for backfills, evaluation sweeps, nightly report generation — anything where latency-per-row is not the goal
- References: Batch Swarm Completions, Batch Processing
Premium Tier Thresholds
Upgrade to premium when any of these are true. The premium tier is $100/month and gives you 20x the per-minute, 200x the per-hour, and 83x the per-day quota, plus 10x the per-agent token budget.
- You hit a 429 more than once per day in production
- You need to run more than 1,200 requests per day
- Any single agent needs max_tokens > 200,000
- Reference: Premium Endpoints
Audit and Compliance
- Daily export of /v1/swarm/logs to your own log lake (S3, BigQuery, etc.)
- Pin model_name per agent. Do not let your code “pick a model” at runtime in regulated workloads
- For regulated workloads, set temperature=0 (or omit it on Opus 4.8) and max_loops=1 for reproducibility
The Wrapper
One file. Drop it in. It does retry-with-backoff that honors Retry-After, a hard budget cap, request-ID tagging, and post-run log verification.
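A wrapper with those four behaviors might look like the following. Treat it as a minimal sketch under stated assumptions rather than a reference implementation: GuardedSwarmsClient, BudgetExceeded, and urllib_transport are hypothetical names, the submit path /v1/swarm/completions and the x-api-key header are assumptions, Retry-After is assumed to be numeric seconds, and log verification does a shape-agnostic substring match because the log schema is not specified here. Cost is only known after a run, so the caps stop further spending rather than preventing the call that crosses them.

```python
import json
import random
import time
import urllib.error
import urllib.request
import uuid


class BudgetExceeded(RuntimeError):
    """Raised when a per-call or cumulative spend cap is crossed."""


def urllib_transport(api_key, base_url, timeout=600):
    """Build send(method, path, payload) -> (status, headers, body) on the stdlib."""
    def send(method, path, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        request = urllib.request.Request(
            base_url + path, data=data, method=method,
            # ASSUMPTION: the API key travels in an x-api-key header.
            headers={"x-api-key": api_key, "Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                body = json.loads(response.read() or b"{}")
                return response.status, dict(response.headers), body
        except urllib.error.HTTPError as err:  # non-2xx still has status and headers
            return err.code, dict(err.headers), {"error": err.read().decode(errors="replace")}
    return send


class GuardedSwarmsClient:
    def __init__(self, transport, per_call_budget=5.0, cumulative_budget=50.0,
                 max_attempts=4, base_delay=1.5, sleep=time.sleep):
        self.transport = transport
        self.per_call_budget = per_call_budget
        self.cumulative_budget = cumulative_budget
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep
        self.spent = 0.0

    def _post(self, path, payload):
        for attempt in range(self.max_attempts):
            last_try = attempt == self.max_attempts - 1
            try:
                status, headers, body = self.transport("POST", path, payload)
            except OSError:  # connection errors and timeouts
                if last_try:
                    raise
                self.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
                continue
            if 200 <= status < 300:
                return body
            headers = {key.lower(): value for key, value in headers.items()}
            if status == 429 and not last_try:
                # Assumes Retry-After is numeric seconds, not an HTTP date.
                self.sleep(float(headers.get("retry-after", self.base_delay)))
            elif status >= 500 and not last_try:
                self.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
            else:
                raise RuntimeError(f"HTTP {status}: {body}")

    def run(self, payload):
        if self.spent >= self.cumulative_budget:
            raise BudgetExceeded(f"cumulative cap reached: ${self.spent:.2f}")
        request_id = str(uuid.uuid4())
        tag = request_id[:8]
        tagged = {**payload, "name": f"{payload.get('name', 'swarm')}-{tag}"}
        # ASSUMPTION: /v1/swarm/completions is the submit endpoint.
        result = self._post("/v1/swarm/completions", tagged)
        cost = float(result["usage"]["billing_info"]["total_cost"])
        self.spent += cost
        if cost > self.per_call_budget or self.spent > self.cumulative_budget:
            raise BudgetExceeded(f"call ${cost:.2f}, total ${self.spent:.2f}")
        return {"request_id": request_id, "tag": tag, "cost": cost, "result": result}

    def verify_in_logs(self, tag):
        """After-the-fact audit check; keep it off the hot path."""
        status, _, body = self.transport("GET", "/v1/swarm/logs", None)
        return 200 <= status < 300 and tag in json.dumps(body)
```

The transport is injected so the retry and budget logic can be exercised in tests with a stub; in production, pass urllib_transport(api_key, base_url) with your own key and the API's base URL.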
Using the Wrapper
The wrapper tags every request’s name with the first 8 characters of the UUID so the log-verification step has something to match on. The Swarms API does not (today) accept arbitrary client-side IDs, so name-tagging is the pragmatic correlation strategy.
Operational Defaults Cheat Sheet
| Setting | Production default | Why |
|---|---|---|
| Retries on 5xx / timeout | 4, full-jitter exponential backoff, base 1.5s | Catches transient infra without DOS’ing yourself |
| Retries on 429 | Honor Retry-After literally | The server knows when its window resets; you don’t |
| Retries on 4xx (not 429) | Zero | Your request is malformed; retrying won’t fix it |
| Per-call budget | $2–$5 | High enough for most swarms, low enough to catch runaway loops |
| Cumulative budget | Scale to job size, hard cap | Prevents a bad config from emptying your account |
| Credit safety margin | 1.5x estimated cost | Covers retries and minor cost variance |
| Request timeout | 600s | Long swarms run for minutes; 60s is too aggressive |
| temperature (regulated) | 0 (or omit on Opus 4.8) | Reproducibility for audit |
| max_loops per agent | 1 unless you have a reason | Reduces blast radius of misbehavior |
Common Pitfalls
Retrying on 401 / 422
Don’t. The API returned a deterministic rejection — your auth header is wrong, or your payload schema is wrong. Retries just burn time and rate-limit budget. Fix the request.
Backoff without jitter
If every client retries on the same exponential schedule, you get a thundering herd the moment the rate-limit window resets. Always use full jitter: sleep = random.uniform(0, base * 2**attempt).
No budget cap in code
A misconfigured loop or a runaway hierarchical swarm can spend hundreds of dollars in minutes. The per-call and cumulative budget checks in the wrapper above are the cheapest insurance you’ll buy this quarter.
Treating /v1/swarm/logs as real-time
Logs are durable, not instantaneous. The verify-in-logs check above is for after-the-fact audit, not for inline correlation. Don’t block your hot path on it.
Next Steps
- Read Production Observability for the dashboard and audit-trail layer that sits on top of this wrapper
- Read Rate Limit Headers for the exact header schema and tier thresholds
- Read Batch Swarm Completions before sending more than ~20 requests in a tight loop