What This Example Shows
- How to enable swarm-level streaming with stream: true on /v1/swarm/completions
- The full SSE event taxonomy for swarms: metadata, start, agent_start, chunk, agent_end, usage, end
- How to parse the SSE stream into typed events with agent attribution on every chunk
- How to detect parallel-phase interleaving in AgentRearrange (when two agents stream concurrently)
- Why this differs from single-agent streaming on /v1/agent/completions
Single-agent streaming (covered in Streaming Responses) streams one model’s tokens. Swarm streaming is harder: tokens come from multiple agents, sometimes overlapping in time. Every chunk carries an agent field so you know which worker produced it.
Why This Matters
Multi-agent swarms are powerful but feel slow without streaming: users stare at a spinner while three agents take turns thinking. Per-token swarm streaming fixes that, because you can render output the moment any agent in the pipeline starts producing it. For SequentialWorkflow, this gives a “live whiteboard” view of one agent finishing before the next picks up. For AgentRearrange with a parallel phase (A, B -> C), you watch two agents type simultaneously into the same UI, then a third synthesize their output once both finish. This enables chat-style UX for swarm products instead of the batch-job UX of non-streaming calls.
Step 1: Setup
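A minimal setup sketch, assuming Python with the requests package installed (pip install requests) and an API key exported as an environment variable. The base URL, the SWARMS_API_KEY variable name, the x-api-key header, and the build_headers helper are assumptions made for illustration; check them against your account's documentation before relying on them.

```python
import os

# Assumed values: confirm the base URL and header name for your deployment.
BASE_URL = "https://api.swarms.world"
SWARM_ENDPOINT = f"{BASE_URL}/v1/swarm/completions"


def build_headers(api_key=None):
    """Return request headers, reading the key from SWARMS_API_KEY if not passed."""
    key = api_key or os.environ.get("SWARMS_API_KEY")
    if not key:
        raise RuntimeError("Set SWARMS_API_KEY or pass api_key explicitly")
    # The header name is an assumption, not confirmed by this page.
    return {"x-api-key": key, "Content-Type": "application/json"}
```

Keeping the key in the environment rather than in source avoids committing it by accident.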
Step 2: Understand the SSE Event Types
When stream=true is set on a SequentialWorkflow or AgentRearrange swarm, the API responds with a Server-Sent Events stream. Each event has an event: line and a data: JSON payload.
| Event | Emitted | Payload (key fields) |
|---|---|---|
| metadata | Once, at the start | job_id, swarm_name, swarm_type, number_of_agents |
| start | Once, before agents run | Swarm-level status |
| agent_start | Once per agent, when it begins | agent (name) |
| chunk | Many per agent, one per token (or small group) | agent, content (the token text), output (running aggregate) |
| agent_end | Once per agent, when it finishes | agent, final per-agent content |
| usage | Once, near the end | input_tokens, output_tokens, total_tokens, billing_info |
| end | Once, terminal | execution_time, final aggregated output |
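One way to read the table is as a small state machine. The sketch below folds each event type into a plain dictionary that tracks per-agent status and the swarm-level finish flag. The new_state and apply_event helpers and the state keys are hypothetical names; only the event names and payload fields come from the table, and events are assumed to arrive as dicts with event and data keys.

```python
from typing import Any, Dict


def new_state() -> Dict[str, Any]:
    """Empty tracking state for one swarm run (hypothetical shape)."""
    return {"job_id": None, "agents": {}, "chunks": {}, "total_tokens": None, "finished": False}


def apply_event(state: Dict[str, Any], event: Dict[str, Any]) -> Dict[str, Any]:
    """Update state in place for one parsed SSE event and return it."""
    kind = event["event"]
    data = event.get("data")
    if not isinstance(data, dict):
        data = {}  # tolerate payloads that were not valid JSON objects
    agent = data.get("agent")
    if kind == "metadata":
        state["job_id"] = data.get("job_id")
    elif kind == "agent_start":
        state["agents"][agent] = "running"
    elif kind == "chunk":
        state["chunks"][agent] = state["chunks"].get(agent, 0) + 1
    elif kind == "agent_end":
        state["agents"][agent] = "done"  # per-agent completion
    elif kind == "usage":
        state["total_tokens"] = data.get("total_tokens")
    elif kind == "end":
        state["finished"] = True  # whole-swarm completion
    return state
```

Per-agent UI state can key off the agents map, while a global spinner can key off finished.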
The defining feature of swarm streaming is that every chunk event carries an agent field. That’s how you attribute a token to the correct worker when multiple agents are running. Single-agent streaming on /v1/agent/completions does not need this because there is only one agent.
Step 3: A Reusable SSE Parser
This helper turns the raw stream into a list of {event, data, received_at} dicts. The received_at timestamp is what lets you detect interleaving in the parallel case. (This mirrors the parser in the test suite; see tests/test_sequential_streaming.py and tests/test_rearrange_streaming.py.)
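A minimal sketch of such a parser follows. It is not the test suite's implementation. The function name parse_sse is hypothetical, and it accepts any iterable of lines (for example the output of response.iter_lines()) so it can be exercised without a network call. It assumes each event carries an event: line followed by one or more data: lines, and falls back to the raw string if the payload is not valid JSON.

```python
import json
import time
from typing import Any, Dict, Iterable, List, Optional, Union


def parse_sse(lines: Iterable[Union[str, bytes]]) -> List[Dict[str, Any]]:
    """Parse SSE lines into a list of {event, data, received_at} dicts."""
    events: List[Dict[str, Any]] = []
    event_name: Optional[str] = None
    data_lines: List[str] = []

    def flush() -> None:
        nonlocal event_name, data_lines
        if data_lines:
            raw = "\n".join(data_lines)
            try:
                data: Any = json.loads(raw)
            except json.JSONDecodeError:
                data = raw  # keep non-JSON payloads as plain text
            events.append({
                "event": event_name or "message",
                "data": data,
                "received_at": time.monotonic(),  # arrival order and timing
            })
        event_name, data_lines = None, []

    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.rstrip("\r\n")
        if line == "":
            flush()  # a blank line terminates an event
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            if data_lines:
                flush()  # tolerate clients that drop blank separator lines
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))
    flush()  # emit a trailing event with no final blank line
    return events
```

Because this version returns a list, it only returns after the stream ends. For live rendering, the same loop could yield each event instead of appending it.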
Step 4: Stream a SequentialWorkflow
In a sequential swarm, agents run one after another. Tokens from agent A all arrive before any tokens from agent B. You’ll see this clearly in the chunk stream — the agent field on each chunk stays constant for a long run, then flips to the next agent and stays there.
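The sketch below shows one way to request a streamed SequentialWorkflow and confirm the contiguous-run pattern described above. The endpoint host, the x-api-key header, the SWARMS_API_KEY variable, the payload fields other than swarm_type and stream, the agent names, and the model name are all assumptions for illustration. The helpers build_sequential_payload, stream_lines, and agent_runs are hypothetical names.

```python
import os
from typing import Any, Dict, Iterable, Iterator, List, Tuple

API_URL = "https://api.swarms.world/v1/swarm/completions"  # assumed host


def build_sequential_payload(task: str) -> Dict[str, Any]:
    """Request body for a two-agent sequential swarm (field names assumed)."""
    return {
        "name": "research-pipeline",
        "swarm_type": "SequentialWorkflow",
        "task": task,
        "stream": True,
        "agents": [
            {"agent_name": "Researcher", "system_prompt": "Gather key facts.",
             "model_name": "gpt-4o-mini", "max_loops": 1},
            {"agent_name": "Writer", "system_prompt": "Write a short summary.",
             "model_name": "gpt-4o-mini", "max_loops": 1},
        ],
    }


def stream_lines(payload: Dict[str, Any]) -> Iterator[bytes]:
    """Yield raw SSE lines as they arrive; requires the requests package."""
    import requests  # imported lazily so the pure helpers work without it

    headers = {"x-api-key": os.environ["SWARMS_API_KEY"],  # assumed header name
               "Content-Type": "application/json"}
    # stream=True stops the client from buffering the whole response.
    with requests.post(API_URL, headers=headers, json=payload,
                       stream=True, timeout=300) as resp:
        resp.raise_for_status()
        yield from resp.iter_lines()


def agent_runs(chunk_agents: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse the per-chunk agent sequence into (agent, run_length) pairs."""
    runs: List[Tuple[str, int]] = []
    for agent in chunk_agents:
        if runs and runs[-1][0] == agent:
            runs[-1] = (agent, runs[-1][1] + 1)
        else:
            runs.append((agent, 1))
    return runs
```

For a sequential swarm, passing the agent field of every chunk event to agent_runs should yield one run per agent, in pipeline order.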
Step 5: Stream AgentRearrange and Detect Parallel Interleaving
AgentRearrange lets you express a flow with parallel branches using the syntax "A, B -> C" — A and B run concurrently, then C runs after both finish. With streaming on, you’ll see A’s and B’s tokens interleaved in the chunk stream. Detecting that interleaving is how you confirm parallel execution.
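A sketch of the detection idea, assuming events have already been parsed into dicts with event and data keys. A flip is counted each time consecutive chunks from the parallel agents come from different agents. The payload field rearrange_flow, the other payload fields apart from swarm_type and stream, the agent names Analyst and Critic, and the helper names are assumptions for illustration; the canonical logic lives in the test suite and may differ in detail.

```python
from typing import Any, Dict, Iterable, Optional, Set


def build_rearrange_payload(task: str) -> Dict[str, Any]:
    """Request body with a parallel phase followed by a synthesizer (fields assumed)."""
    names = ["Analyst", "Critic", "Summary"]
    return {
        "name": "parallel-review",
        "swarm_type": "AgentRearrange",
        "rearrange_flow": "Analyst, Critic -> Summary",  # assumed field name
        "task": task,
        "stream": True,
        "agents": [
            {"agent_name": n, "system_prompt": f"You are the {n} agent.",
             "model_name": "gpt-4o-mini", "max_loops": 1}
            for n in names
        ],
    }


def count_flips(events: Iterable[Dict[str, Any]], parallel_agents: Set[str]) -> int:
    """Count agent changes between consecutive chunks of the parallel agents."""
    flips = 0
    previous: Optional[str] = None
    for ev in events:
        if ev.get("event") != "chunk":
            continue
        data = ev.get("data")
        if not isinstance(data, dict):
            continue
        agent = data.get("agent")
        if agent not in parallel_agents:
            continue  # ignore the downstream synthesizer
        if previous is not None and agent != previous:
            flips += 1
        previous = agent
    return flips
```

Two agents that ran strictly one after the other would produce exactly one flip, so any count well above one indicates overlapping output.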
A high flips count is your signal that the parallel branch is actually parallel. After both finish, the Summary agent’s chunks will follow in a single contiguous run.
This is the same detection logic used in the test suite. See tests/test_rearrange_streaming.py for the canonical reference; it asserts flips >= 3 to verify true parallel streaming.
Step 6: Rendering Chunks Per-Agent in a UI
In a real UI you want each agent’s tokens to land in its own panel even when they interleave. The pattern is to bucket chunks by agent and append:
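A minimal sketch of that bucketing, assuming parsed events are dicts with event and data keys and that each chunk payload carries agent and content as in the event table. The function name bucket_chunks is hypothetical, and a real UI would append to a widget or DOM node rather than a string.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def bucket_chunks(events: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Append each chunk's content to the panel of the agent that produced it."""
    panels: Dict[str, str] = defaultdict(str)
    for ev in events:
        if ev.get("event") != "chunk":
            continue  # only chunk events carry token text
        data = ev["data"]
        panels[data["agent"]] += data.get("content", "")
    return dict(panels)
```

Because each chunk is routed by its agent field, interleaved arrival order does not scramble the text within any one panel.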
Which swarm types support `stream=true`?
Per-token streaming is currently supported for SequentialWorkflow and AgentRearrange. Other swarm types either don’t expose per-token output or stream at the agent level only; check /v1/swarms/available for the current list.
Why isn't my stream actually streaming?
Two common causes: (1) you forgot stream=True on the requests.post call (the API streams but your client buffers the whole response), or (2) a reverse proxy is buffering. Set X-Accel-Buffering: no in your request headers and make sure your client reads with response.iter_lines(), not response.text.
How do I know when an individual agent is finished vs. the whole swarm?
agent_end fires once per agent. end fires once for the entire swarm after the last agent finishes and usage is reported. If you want to update per-agent UI state, key off agent_end; if you want to dismiss a global spinner, key off end.
Next Steps
- Streaming Responses: the single-agent counterpart on /v1/agent/completions
- Sequential Workflow: the non-streaming version of the swarm type used above
- Multi-Turn Conversations with Agent History: pair streaming with history threading for live chat UX