The Swarms API exposes an OpenAI-compatible POST /v1/chat/completions endpoint. If your application already uses the OpenAI SDK, you can switch to Swarms by changing two lines — the base_url and api_key — and everything else works unchanged. Under the hood, every request is routed through the full Swarms agent infrastructure: model routing, token counting, billing, and logging all apply exactly as they do for the native /v1/agent/completions endpoint.

Endpoint Information

  • URL: /v1/chat/completions
  • Method: POST
  • Authentication: Required (x-api-key header or Authorization: Bearer <key>)
  • Rate Limiting: Subject to tier-based rate limits

Authentication

Two authentication methods are supported. Both work on all Swarms API endpoints.
| Method | Header | Example |
| --- | --- | --- |
| API key header | `x-api-key: <key>` | `x-api-key: sk-abc123` |
| Bearer token | `Authorization: Bearer <key>` | `Authorization: Bearer sk-abc123` |
The Bearer token method is what the OpenAI SDK sends by default, so it works out of the box.
Get your API key at swarms.world/platform/api-keys.
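Both header styles can also be exercised without the OpenAI SDK. Below is a minimal sketch using only the Python standard library; it builds (but does not send) the request, and the URL follows the base_url used in the code examples later in this page:

```python
import json
import urllib.request


def chat_request(api_key: str, use_bearer: bool = True) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request with either auth header."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"  # what the OpenAI SDK sends
    else:
        headers["x-api-key"] = api_key  # Swarms-native header
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.swarms.world/v1/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )


# To actually send it:
# with urllib.request.urlopen(chat_request("your-swarms-api-key")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If you use the OpenAI SDK, none of this is necessary; the sketch only makes the wire format explicit.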

Request Schema

ChatCompletionRequest Object

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | string | Yes | | Model to use for completion (e.g. `gpt-4o`, `claude-sonnet-4-20250514`, `gpt-4o-mini`). Any model supported by the Swarms API is accepted |
| `messages` | List[ChatMessage] | Yes | | A list of messages comprising the conversation (see ChatMessage Object) |
| `temperature` | float | No | 0.5 | Sampling temperature (0.0 – 2.0). Lower values produce more deterministic output |
| `max_tokens` | integer | No | 8192 | Maximum number of tokens to generate in the response |
| `max_completion_tokens` | integer | No | | Alternative to `max_tokens`. Takes precedence if both are set |
| `stream` | boolean | No | false | If true, returns Server-Sent Events (SSE) in the OpenAI chunk format |
| `top_p` | float | No | | Nucleus sampling parameter. An alternative to temperature sampling |
| `presence_penalty` | float | No | | Penalizes tokens based on whether they have appeared in the text so far |
| `frequency_penalty` | float | No | | Penalizes tokens based on how frequently they appear in the text so far |
| `n` | integer | No | 1 | Number of completions to generate. Only 1 is supported; requests with n > 1 are rejected |
| `user` | string | No | | A unique identifier for the end user, used for tracking |

ChatMessage Object

Each message in the messages array:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `role` | string | Yes | One of `system`, `user`, or `assistant` |
| `content` | string or List[ContentPart] | Yes | Text content, or an array of content parts for multimodal input |
| `name` | string | No | An optional name for the participant |

ContentPart (Multimodal)

When content is an array, each element is a content part.

Text part:
{"type": "text", "text": "Describe this image."}
Image part:
{"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
The url field accepts both HTTPS URLs and base64-encoded data URIs (data:image/png;base64,...).
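For illustration, here is a small helper that assembles such an array; the image URL is a placeholder, and the result is passed as the messages argument to client.chat.completions.create:

```python
def vision_messages(prompt: str, image_url: str) -> list:
    """Build a messages array pairing a text part with an image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # url may be an HTTPS URL or a data:image/...;base64,... URI
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


messages = vision_messages("Describe this image.", "https://example.com/photo.jpg")
# client.chat.completions.create(model="gpt-4o", messages=messages)
```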

Validation Rules

  • At least one message with role: "user" is required
  • n must be 1 — multiple completions per request are not supported (send separate requests instead)
  • Requests with zero messages or only system messages are rejected
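Because n > 1 is rejected, multiple completions have to come from separate requests. A sketch of that fan-out, where create is any callable with the chat.completions.create signature (for instance, the SDK client's method):

```python
def n_completions(create, n: int, **kwargs) -> list:
    """Issue n separate single-choice requests instead of one n>1 request."""
    return [create(**kwargs) for _ in range(n)]


# responses = n_completions(
#     client.chat.completions.create, 3,
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Name a color."}],
# )
```

Each response is billed independently, since each request runs a full completion.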

Example Request Body

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.5,
  "max_tokens": 1024,
  "stream": false
}

Response Schema

ChatCompletionResponse Object (Non-Streaming)

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique completion identifier, prefixed with `chatcmpl-` |
| `object` | string | Always `"chat.completion"` |
| `created` | integer | Unix timestamp of when the completion was generated |
| `model` | string | The model that was used (echoes back the requested model name) |
| `choices` | List[Choice] | Array containing the completion result (always one element) |
| `usage` | CompletionUsage | Token usage counts for billing |

Choice Object

| Field | Type | Description |
| --- | --- | --- |
| `index` | integer | Always 0 (single-choice responses) |
| `message` | ChatMessage | The assistant's response with `role: "assistant"` |
| `finish_reason` | string | Why the model stopped generating; `"stop"` for normal completion |

CompletionUsage Object

| Field | Type | Description |
| --- | --- | --- |
| `prompt_tokens` | integer | Number of tokens in the input (system prompt + history + task) |
| `completion_tokens` | integer | Number of tokens in the generated response |
| `total_tokens` | integer | Sum of `prompt_tokens` and `completion_tokens` |

Example Response

{
  "id": "chatcmpl-a1b2c3d4e5f6789012345678901",
  "object": "chat.completion",
  "created": 1711300000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}

Streaming Response Schema

When stream: true is set, the response is returned as Server-Sent Events (SSE). Each event is a data: line containing a JSON chunk.

StreamChunk Object

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Same `chatcmpl-` ID shared across all chunks in the stream |
| `object` | string | Always `"chat.completion.chunk"` |
| `created` | integer | Unix timestamp (same across all chunks) |
| `model` | string | The requested model name |
| `choices` | List[StreamChoice] | Array with one element containing the delta |

StreamChoice Object

| Field | Type | Description |
| --- | --- | --- |
| `index` | integer | Always 0 |
| `delta` | object | Incremental content; see stream sequence below |
| `finish_reason` | string or null | `null` during streaming, `"stop"` on the final chunk |

Stream Sequence

| Order | delta | finish_reason | Purpose |
| --- | --- | --- | --- |
| First chunk | `{"role": "assistant"}` | `null` | Role declaration |
| Content chunks | `{"content": "..."}` | `null` | Incremental text content |
| Final chunk | `{}` | `"stop"` | Signals completion |
| Terminator | `data: [DONE]` | | SSE stream end marker |

Example Stream

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
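If you consume the stream without the OpenAI SDK, the data: lines can be parsed directly. A minimal sketch that extracts the content deltas from an iterable of raw SSE lines:

```python
import json


def iter_sse_content(lines):
    """Yield content deltas from a chat.completion.chunk SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # stream terminator carries no JSON
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```

With the requests library, for example, the same function can consume `resp.iter_lines(decode_unicode=True)` from a streaming response.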

Error Response Schema

Errors are returned in the standard OpenAI error format so the OpenAI SDK’s built-in error classes work correctly:

Error Object

| Field | Type | Description |
| --- | --- | --- |
| `error.message` | string | Human-readable error description |
| `error.type` | string | Error category (see table below) |
| `error.code` | string or null | Machine-readable error code |
| `error.param` | string or null | The parameter that caused the error |

Error Types

| HTTP Status | type | When |
| --- | --- | --- |
| 400 | `invalid_request_error` | Malformed request, validation failure, missing required fields |
| 401 | `authentication_error` | Missing or invalid API key |
| 403 | `permission_error` | Insufficient permissions or subscription tier |
| 429 | `rate_limit_error` | Rate limit exceeded |
| 500 | `server_error` | Internal error during agent execution |

Example Error Response

{
  "error": {
    "message": "At least one message with role 'user' is required.",
    "type": "invalid_request_error",
    "code": "invalid_request",
    "param": null
  }
}

Code Examples

Non-Streaming Completion

from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the key trends in renewable energy?"},
    ],
    max_tokens=1024,
    temperature=0.5,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Streaming Completion

from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about AI agents."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()

Multi-Turn Conversation

from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a math tutor."},
        {"role": "user", "content": "What is the derivative of x^2?"},
        {"role": "assistant", "content": "The derivative of x^2 is 2x."},
        {"role": "user", "content": "What about x^3?"},
    ],
)

print(response.choices[0].message.content)

Error Handling

from openai import (
    OpenAI,
    APIError,
    AuthenticationError,
    BadRequestError,
    PermissionDeniedError,
    RateLimitError,
)

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
except AuthenticationError:
    print("Missing or invalid API key (401)")
except PermissionDeniedError:
    print("Insufficient permissions or subscription tier (403)")
except BadRequestError as e:
    print(f"Validation error (400): {e.message}")
except RateLimitError:
    print("Rate limited — back off and retry")
except APIError as e:
    print(f"API error ({e.status_code}): {e.message}")

How It Maps to Swarms Internals

For users already familiar with the native Swarms API, here is how the OpenAI request fields map to AgentCompletion and AgentSpec:
| OpenAI Field | Swarms Equivalent | Notes |
| --- | --- | --- |
| `model` | `AgentSpec.model_name` | Passed through as-is |
| `messages` (system) | `AgentSpec.system_prompt` | Defaults to "You are a helpful assistant." if absent |
| `messages` (last user) | `AgentCompletion.task` | The actual prompt the agent runs on |
| `messages` (prior turns) | `AgentCompletion.history` | User and assistant messages before the final user message |
| `messages` (image_url parts) | `AgentCompletion.img` / `imgs` | Extracted from multimodal content parts |
| `temperature` | `AgentSpec.temperature` | Defaults to 0.5 |
| `max_tokens` / `max_completion_tokens` | `AgentSpec.max_tokens` | `max_completion_tokens` takes precedence; defaults to 8192 |
| `top_p` | `AgentSpec.llm_args.top_p` | Passed through to the underlying LLM |
| `presence_penalty` | `AgentSpec.llm_args.presence_penalty` | Passed through to the underlying LLM |
| `frequency_penalty` | `AgentSpec.llm_args.frequency_penalty` | Passed through to the underlying LLM |
| `stream` | Route dispatch | `true` returns StreamingResponse with SSE; `false` returns JSON |
The agent is created with max_loops=1 (single turn, no autonomous looping) and streaming_on=False (the agent itself runs to completion; streaming is simulated at the HTTP layer by chunking the result).
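The message-splitting rows of the table can be illustrated directly. A sketch of that split, using the field names from the table (the surrounding request envelope is omitted, and the default system prompt is the one the table documents):

```python
def split_messages(messages: list) -> dict:
    """Split an OpenAI-style messages list into system_prompt / history / task."""
    system = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return {
        "system_prompt": system[0] if system else "You are a helpful assistant.",
        "task": turns[-1]["content"],  # the final user message becomes the task
        "history": turns[:-1],         # prior user/assistant turns
    }


mapped = split_messages([
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is the derivative of x^2?"},
    {"role": "assistant", "content": "The derivative of x^2 is 2x."},
    {"role": "user", "content": "What about x^3?"},
])
# mapped["task"] holds the last user message; mapped["history"] the two prior turns
```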

Supported Models

The model field accepts any model supported by the Swarms API. Common options:
| Provider | Models |
| --- | --- |
| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3-mini` |
| Anthropic | `claude-sonnet-4-20250514`, `claude-3-7-sonnet-latest` |
| Groq | `groq/llama3-70b-8192`, `groq/deepseek-r1-distill-llama-70b` |
For the full list, call GET /v1/models/available with your API key.

Differences from the OpenAI API

| Behavior | OpenAI API | Swarms API |
| --- | --- | --- |
| n > 1 | Returns multiple choices | Rejected with an error; send separate requests |
| Tool calling / function calling | Supported | Not supported on this endpoint. Use /v1/agent/completions with tools_list_dictionary |
| logprobs | Supported | Not supported |
| Response format (json_object) | Supported | Not supported on this endpoint. Use /v1/agent/completions with structured output |
| Streaming | True token-by-token streaming | Simulated; the agent runs to completion, then the result is delivered in chunks |

Billing

Usage is metered and billed identically to the native /v1/agent/completions endpoint:
  • Input tokens are counted from the combined system prompt, conversation history, and task
  • Output tokens are counted from the agent’s response
  • Credits are deducted automatically after each completion
  • The usage field in the response shows the exact token counts
Check your balance anytime with GET /v1/users/me/credits.
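A standard-library sketch of that call; it builds (but does not send) the request, and since the response body's shape is not documented here, it would simply be printed as-is:

```python
import json
import urllib.request


def credits_request(api_key: str) -> urllib.request.Request:
    """Build a GET /v1/users/me/credits request using x-api-key auth."""
    return urllib.request.Request(
        "https://api.swarms.world/v1/users/me/credits",
        headers={"x-api-key": api_key},
    )


# To send it:
# with urllib.request.urlopen(credits_request("your-swarms-api-key")) as resp:
#     print(json.load(resp))
```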