The Swarms API exposes an OpenAI-compatible POST /v1/chat/completions endpoint. If your application already uses the OpenAI SDK, you can switch to Swarms by changing two lines — the base_url and api_key — and everything else works unchanged.
Under the hood, every request is routed through the full Swarms agent infrastructure: model routing, token counting, billing, and logging all apply exactly as they do for the native /v1/agent/completions endpoint.
Endpoint Information
- URL: /v1/chat/completions
- Method: POST
- Authentication: Required (x-api-key header or Authorization: Bearer <key>)
- Rate Limiting: Subject to tier-based rate limits
Authentication
Two authentication methods are supported. Both work on all Swarms API endpoints.

| Method | Header | Example |
|---|---|---|
| API key header | x-api-key: <key> | x-api-key: sk-abc123 |
| Bearer token | Authorization: Bearer <key> | Authorization: Bearer sk-abc123 |
Get your API key at swarms.world/platform/api-keys.
Request Schema
ChatCompletionRequest Object
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model to use for completion (e.g. gpt-4o, claude-sonnet-4-20250514, gpt-4o-mini). Any model supported by the Swarms API is accepted |
| messages | List[ChatMessage] | Yes | — | A list of messages comprising the conversation (see ChatMessage Object) |
| temperature | float | No | 0.5 | Sampling temperature (0.0 – 2.0). Lower values produce more deterministic output |
| max_tokens | integer | No | 8192 | Maximum number of tokens to generate in the response |
| max_completion_tokens | integer | No | — | Alternative to max_tokens. Takes precedence if both are set |
| stream | boolean | No | false | If true, returns Server-Sent Events (SSE) in the OpenAI chunk format |
| top_p | float | No | — | Nucleus sampling parameter. An alternative to temperature sampling |
| presence_penalty | float | No | — | Penalize tokens based on whether they have appeared in the text so far |
| frequency_penalty | float | No | — | Penalize tokens based on how frequently they appear in the text so far |
| n | integer | No | 1 | Number of completions to generate. Only 1 is supported — requests with n > 1 are rejected |
| user | string | No | — | A unique identifier for the end-user, used for tracking |
ChatMessage Object
Each message in the messages array:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, or assistant |
| content | string or List[ContentPart] | Yes | Text content, or an array of content parts for multimodal input |
| name | string | No | An optional name for the participant |
ContentPart (Multimodal)
When content is an array, each element is a content part. A text part carries type: "text" and a text field. An image part carries type: "image_url" with an image_url object whose url field accepts both HTTPS URLs and base64-encoded data URIs (data:image/png;base64,...).
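For illustration, a user message mixing a text part and an image part in the OpenAI content-part format (the URL is a placeholder):

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
  ]
}
```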
Validation Rules
- At least one message with role: "user" is required
- n must be 1 — multiple completions per request are not supported (send separate requests instead)
- Requests with zero messages or only system messages are rejected
Example Request Body
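A representative request body (model and prompt are illustrative):

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
  ],
  "temperature": 0.5,
  "max_tokens": 8192,
  "stream": false
}
```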
Response Schema
ChatCompletionResponse Object (Non-Streaming)
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion identifier, prefixed with chatcmpl- |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp of when the completion was generated |
| model | string | The model that was used (echoes back the requested model name) |
| choices | List[Choice] | Array containing the completion result (always one element) |
| usage | CompletionUsage | Token usage counts for billing |
Choice Object
| Field | Type | Description |
|---|---|---|
| index | integer | Always 0 (single-choice responses) |
| message | ChatMessage | The assistant’s response with role: "assistant" |
| finish_reason | string | Why the model stopped generating — "stop" for normal completion |
CompletionUsage Object
| Field | Type | Description |
|---|---|---|
| prompt_tokens | integer | Number of tokens in the input (system prompt + history + task) |
| completion_tokens | integer | Number of tokens in the generated response |
| total_tokens | integer | Sum of prompt_tokens and completion_tokens |
Example Response
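An illustrative non-streaming response with the fields described above (IDs, timestamps, and token counts are placeholders):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prince Hamlet feigns madness while seeking proof that his uncle murdered his father. His quest for revenge ends in a duel that leaves the Danish court dead, Hamlet included."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 42,
    "total_tokens": 69
  }
}
```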
Streaming Response Schema
When stream: true is set, the response is returned as Server-Sent Events (SSE). Each event is a data: line containing a JSON chunk.
StreamChunk Object
| Field | Type | Description |
|---|---|---|
| id | string | Same chatcmpl- ID shared across all chunks in the stream |
| object | string | Always "chat.completion.chunk" |
| created | integer | Unix timestamp (same across all chunks) |
| model | string | The requested model name |
| choices | List[StreamChoice] | Array with one element containing the delta |
StreamChoice Object
| Field | Type | Description |
|---|---|---|
| index | integer | Always 0 |
| delta | object | Incremental content — see stream sequence below |
| finish_reason | string or null | null during streaming, "stop" on the final chunk |
Stream Sequence
| Order | delta | finish_reason | Purpose |
|---|---|---|---|
| First chunk | {"role": "assistant"} | null | Role declaration |
| Content chunks | {"content": "..."} | null | Incremental text content |
| Final chunk | {} | "stop" | Signals completion |
| Terminator | data: [DONE] | — | SSE stream end marker |
Example Stream
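An illustrative SSE stream following the sequence above (IDs and timestamps are placeholders):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":" there."},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```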
Error Response Schema
Errors are returned in the standard OpenAI error format so the OpenAI SDK’s built-in error classes work correctly.

Error Object
| Field | Type | Description |
|---|---|---|
| error.message | string | Human-readable error description |
| error.type | string | Error category (see table below) |
| error.code | string or null | Machine-readable error code |
| error.param | string or null | The parameter that caused the error |
Error Types
| HTTP Status | type | When |
|---|---|---|
| 400 | invalid_request_error | Malformed request, validation failure, missing required fields |
| 401 | authentication_error | Missing or invalid API key |
| 403 | permission_error | Insufficient permissions or subscription tier |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | server_error | Internal error during agent execution |
Example Error Response
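An illustrative 400 error body in the format above:

```json
{
  "error": {
    "message": "At least one message with role 'user' is required",
    "type": "invalid_request_error",
    "code": null,
    "param": "messages"
  }
}
```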
Code Examples
Non-Streaming Completion
- Python
- TypeScript
- Go
- Rust
- cURL
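As a minimal stdlib-only Python sketch of the non-streaming call (no SDK required; the base URL https://api.swarms.world and the helper names are assumptions for illustration — with the OpenAI SDK you would pass the same fields to client.chat.completions.create after setting base_url and api_key):

```python
import json
import urllib.request

BASE_URL = "https://api.swarms.world"  # assumed base URL; substitute your own
API_KEY = "sk-your-key"                # from swarms.world/platform/api-keys

def build_payload(messages, model="gpt-4o-mini", **params):
    """Assemble an OpenAI-style chat completion payload."""
    return {"model": model, "messages": messages, **params}

def chat_completion(messages, **params):
    """POST the payload to /v1/chat/completions and return the parsed JSON."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_payload(messages, **params)).encode("utf-8"),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    out = chat_completion([{"role": "user", "content": "Say hello in one word."}])
    print(out["choices"][0]["message"]["content"])
```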
Streaming Completion
- Python
- TypeScript
- Go
- Rust
- cURL
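A Python sketch of consuming the SSE stream with the stdlib (the base URL and helper names are assumptions; the parser follows the chunk format documented above):

```python
import json
import urllib.request

BASE_URL = "https://api.swarms.world"  # assumed base URL; substitute your own
API_KEY = "sk-your-key"

def iter_sse_content(lines):
    """Yield content fragments from OpenAI-format SSE lines (bytes or str)."""
    for raw in lines:
        line = (raw.decode("utf-8") if isinstance(raw, bytes) else raw).strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # SSE terminator
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:  # role-declaration and final chunks carry no text
            yield delta["content"]

def stream_chat(messages, model="gpt-4o-mini"):
    """Stream a completion, printing tokens as they arrive."""
    payload = {"model": model, "messages": messages, "stream": True}
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for piece in iter_sse_content(resp):
            print(piece, end="", flush=True)
    print()
```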
Multi-Turn Conversation
- Python
- TypeScript
- Go
- Rust
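Because the endpoint is stateless, multi-turn conversation means resending the full message history on every request. A Python sketch (base URL and class name are illustrative assumptions):

```python
import json
import urllib.request

BASE_URL = "https://api.swarms.world"  # assumed base URL; substitute your own
API_KEY = "sk-your-key"

class Conversation:
    """Accumulates history so each request carries the full message list."""

    def __init__(self, system_prompt="You are a helpful assistant."):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

    def send(self, text, model="gpt-4o-mini"):
        """Append the user turn, call the API, record and return the reply."""
        self.add_user(text)
        req = urllib.request.Request(
            f"{BASE_URL}/v1/chat/completions",
            data=json.dumps({"model": model, "messages": self.messages}).encode("utf-8"),
            headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            reply = json.load(resp)["choices"][0]["message"]["content"]
        self.add_assistant(reply)
        return reply
```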
Error Handling
- Python
- TypeScript
- Go
- Rust
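A Python sketch of handling the error schema above with the stdlib (base URL and helper names are illustrative assumptions; note that rate-limit and server errors are the usual retry candidates):

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.swarms.world"  # assumed base URL; substitute your own
API_KEY = "sk-your-key"

def parse_error(status, body):
    """Map an OpenAI-format error body to a short diagnostic string."""
    try:
        err = json.loads(body)["error"]
    except (ValueError, KeyError):
        return f"HTTP {status}: {body[:200]}"  # non-JSON or unexpected shape
    retryable = err.get("type") == "rate_limit_error" or status >= 500
    note = " (retryable)" if retryable else ""
    return f"HTTP {status} {err.get('type')}: {err.get('message')}{note}"

def chat_or_report(messages, model="gpt-4o-mini"):
    """Return the completion text, or print a diagnostic and return None."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except urllib.error.HTTPError as exc:
        print(parse_error(exc.code, exc.read().decode("utf-8", "replace")))
        return None
```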
How It Maps to Swarms Internals
For users already familiar with the native Swarms API, here is how the OpenAI request fields map to AgentCompletion and AgentSpec:
| OpenAI Field | Swarms Equivalent | Notes |
|---|---|---|
| model | AgentSpec.model_name | Passed through as-is |
| messages (system) | AgentSpec.system_prompt | Defaults to “You are a helpful assistant.” if absent |
| messages (last user) | AgentCompletion.task | The actual prompt the agent runs on |
| messages (prior turns) | AgentCompletion.history | User and assistant messages before the final user message |
| messages (image_url parts) | AgentCompletion.img / imgs | Extracted from multimodal content parts |
| temperature | AgentSpec.temperature | Defaults to 0.5 |
| max_tokens / max_completion_tokens | AgentSpec.max_tokens | max_completion_tokens takes precedence; defaults to 8192 |
| top_p | AgentSpec.llm_args.top_p | Passed through to the underlying LLM |
| presence_penalty | AgentSpec.llm_args.presence_penalty | Passed through to the underlying LLM |
| frequency_penalty | AgentSpec.llm_args.frequency_penalty | Passed through to the underlying LLM |
| stream | Route dispatch | true returns StreamingResponse with SSE; false returns JSON |
Internally, the agent always runs with max_loops=1 (single turn, no autonomous looping) and streaming_on=False (the agent itself runs to completion; streaming is simulated at the HTTP layer by chunking the result).
Supported Models
The model field accepts any model supported by the Swarms API. Common options:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini |
| Anthropic | claude-sonnet-4-20250514, claude-3-7-sonnet-latest |
| Groq | groq/llama3-70b-8192, groq/deepseek-r1-distill-llama-70b |
For the full, current list, call GET /v1/models/available with your API key.
Differences from the OpenAI API
| Behavior | OpenAI API | Swarms API |
|---|---|---|
| n > 1 | Returns multiple choices | Rejected with error — send separate requests |
| Tool calling / function calling | Supported | Not supported on this endpoint. Use /v1/agent/completions with tools_list_dictionary |
| logprobs | Supported | Not supported |
| Response format (json_object) | Supported | Not supported on this endpoint. Use /v1/agent/completions with structured output |
| Streaming | True token-by-token streaming | Simulated — the agent runs to completion, then the result is delivered in chunks |
Billing
Usage is metered and billed identically to the native /v1/agent/completions endpoint:
- Input tokens are counted from the combined system prompt, conversation history, and task
- Output tokens are counted from the agent’s response
- Credits are deducted automatically after each completion
- The usage field in the response shows the exact token counts
Check your remaining credit balance with GET /v1/users/me/credits.