What This Example Shows
- How to score or triage tens of thousands of records in one batch submission
- A real lead-scoring workload built on `/v1/agent/batch/completions`
- Per-record cost math you can take to your CFO
- How to chunk a large input list to stay inside request limits
- Where this beats hand-rolled `asyncio.gather` against the single-agent endpoint
Why This Matters
Every revenue team has a backlog of records that need a human-quality judgment call: leads to qualify, tickets to triage, resumes to screen, transcripts to tag. Hiring a person to do this work costs $30-$60 per hour and produces 20-40 decisions per hour. Sending each row to a single-agent endpoint one at a time gets you the right answer but burns wall-clock time and connection overhead. The batch endpoint compresses that same workload into one request, parallelized server-side, with a single bill at the end. This tutorial shows the concrete shape of that job.
Step 1: Setup
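A minimal setup sketch, assuming credentials and the base URL come from environment variables. The variable names `BATCH_API_BASE_URL` and `BATCH_API_KEY`, the placeholder host, and the `x-api-key` header name are all assumptions made for illustration; check your provider's documentation for the real values. Only the endpoint path comes from this tutorial.

```python
import os

# Hypothetical environment variable names; substitute whatever your deployment uses.
BASE_URL = os.environ.get("BATCH_API_BASE_URL", "https://api.example.com")
API_KEY = os.environ.get("BATCH_API_KEY", "")

# The endpoint path is the one this tutorial is built on.
BATCH_ENDPOINT = BASE_URL.rstrip("/") + "/v1/agent/batch/completions"


def build_headers(api_key: str) -> dict:
    """Return request headers; the `x-api-key` header name is an assumption."""
    if not api_key:
        raise ValueError("API key is empty; set BATCH_API_KEY before submitting")
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

Failing fast on an empty key is a deliberate choice here: it is cheaper to stop before the first chunk than to discover an authentication error halfway through a 10,000-record job.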
Step 2: Define the Lead Scoring Agent
We will use one agent definition and reuse it across every record. The agent reads a lead profile and returns a score and a one-line reason.
Step 3: Load Your Records
In a real workload these come from a CRM export, a database query, or an S3 file. For this tutorial we generate a synthetic list of 10,000 leads.
Step 4: Convert Records into Batch Requests
Each item in the batch body is one `AgentCompletion`: the same `agent_config` plus a per-record task.
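Steps 2 through 4 can be sketched together as follows. The `agent_config` and `task` keys follow the description above; everything else is an assumption for illustration: the fields inside `AGENT_CONFIG` (`agent_name`, `system_prompt`, `model_name`), the placeholder model name, the synthetic lead fields, and the helper names `make_synthetic_leads` and `build_batch_requests`.

```python
import json
import random

# Hypothetical agent definition; the field names inside it are assumptions.
AGENT_CONFIG = {
    "agent_name": "lead-scorer",
    "system_prompt": (
        "You score B2B sales leads. Reply with JSON only: "
        '{"score": <integer 0-100>, "reason": "<one line>"}'
    ),
    "model_name": "your-model-name",  # placeholder, not a real model id
}


def make_synthetic_leads(count: int, seed: int = 0) -> list:
    """Generate a reproducible list of fake leads for the tutorial."""
    rng = random.Random(seed)
    titles = ["VP Sales", "Engineer", "Founder", "Intern", "CFO"]
    industries = ["SaaS", "Retail", "Healthcare", "Logistics"]
    return [
        {
            "lead_id": i,
            "title": rng.choice(titles),
            "industry": rng.choice(industries),
            "employees": rng.choice([10, 50, 250, 1000, 5000]),
        }
        for i in range(count)
    ]


def build_batch_requests(leads: list) -> list:
    """Pair the shared agent definition with one task string per lead."""
    return [
        {"agent_config": AGENT_CONFIG, "task": "Score this lead:\n" + json.dumps(lead)}
        for lead in leads
    ]


leads = make_synthetic_leads(10_000)
batch = build_batch_requests(leads)
```

Every item references the same `AGENT_CONFIG` object, so the agent definition is written once and only the task text varies per record.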
Step 5: Submit in Chunks
Send the full list in chunks of 500 to 1,000 records to keep request bodies reasonable and to let you checkpoint progress. Each chunk is one POST.
The server parallelizes inside each chunk. You are not pinging a token-per-second loop; you are submitting one job that fans out behind the gateway. Chunking exists to keep your local memory and request bodies sane, not to throttle the server.
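A sketch of the chunk-and-submit loop using only the standard library. It assumes the request body is a bare JSON array of batch items and that the response is JSON; both are assumptions, as are the helper names and the 600-second timeout. The `post` parameter is injectable so the loop can be exercised without a network call.

```python
import json
import urllib.request
from typing import Callable, Iterator

BATCH_PATH = "/v1/agent/batch/completions"  # endpoint path named in this tutorial


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_json(url: str, headers: dict, body: list) -> object:
    """POST `body` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_in_chunks(
    batch: list,
    base_url: str,
    headers: dict,
    chunk_size: int = 500,
    post: Callable[[str, dict, list], object] = post_json,
) -> list:
    """Send one POST per chunk and collect the responses in submission order."""
    url = base_url.rstrip("/") + BATCH_PATH
    responses = []
    for chunk in chunked(batch, chunk_size):
        responses.append(post(url, headers, chunk))
        # Checkpoint here (for example, write `responses` to disk) so a failed
        # run can resume from the next chunk instead of starting over.
    return responses
```

With 10,000 items and `chunk_size=500`, this issues 20 POSTs, one after another. The loop is sequential on purpose: the parallelism happens server-side within each chunk.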
Step 6: Aggregate and Route
Parse each agent response, slot leads into A/B/C/D tiers, and forward only A-tier leads to your SDRs.
The exact response shape depends on the model and whether the agent returns structured outputs. Wrap the JSON parse in a try/except and dump unparseable rows to a review queue; never let one bad row halt a 10k-lead pipeline.
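A sketch of the parse-and-route step. It assumes you have already extracted each agent's text output and paired it with a lead id, and that the agent was prompted to reply with JSON containing `score` and `reason`. The tier cutoffs (80/60/40) and the function names are illustrative assumptions, not values from any API.

```python
import json

# Assumed cutoffs: A >= 80, B >= 60, C >= 40, everything else D.
TIER_THRESHOLDS = [("A", 80), ("B", 60), ("C", 40)]


def tier_for(score: float) -> str:
    """Map a numeric score to a tier letter using the assumed cutoffs."""
    for tier, minimum in TIER_THRESHOLDS:
        if score >= minimum:
            return tier
    return "D"


def route(outputs: list) -> tuple:
    """Split (lead_id, raw_text) pairs into tiers plus a review queue.

    Rows that fail to parse go to the review queue instead of raising,
    so one bad row cannot halt the whole run.
    """
    tiers = {"A": [], "B": [], "C": [], "D": []}
    review_queue = []
    for lead_id, raw_text in outputs:
        try:
            parsed = json.loads(raw_text)
            score = float(parsed["score"])
        except (TypeError, ValueError, KeyError):
            review_queue.append((lead_id, raw_text))
            continue
        tiers[tier_for(score)].append(
            {"lead_id": lead_id, "score": score, "reason": parsed.get("reason", "")}
        )
    return tiers, review_queue
```

After routing, `tiers["A"]` is the list to forward to SDRs, and `review_queue` holds the raw text of anything that could not be parsed.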
The Cost Math
Pricing varies by model and current token rates, so these numbers are illustrative, not a quote.
| Approach | Wall time | Direct cost | Burdened cost |
|---|---|---|---|
| Human SDR scoring 10,000 leads | ~400 hours | $20,000 at $50/hr | $30,000+ with overhead |
| Single-agent endpoint, sequential | ~6 hours | ~$30 | $30 + 6 hours of your time |
| Batch endpoint, 20 chunks of 500 | ~25 minutes | ~$30 | $30 + 25 minutes |
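The per-record figures behind the table can be reproduced with a few lines of arithmetic. The inputs are the table's own illustrative numbers; the 25 decisions per hour is simply what ~400 hours for 10,000 leads implies, and it falls inside the 20-40 range quoted earlier.

```python
LEADS = 10_000

# Human SDR row: ~400 hours at $50/hr.
human_hours = 400
human_rate = 50.0
human_cost = human_hours * human_rate            # 20000.0
implied_decisions_per_hour = LEADS / human_hours  # 25.0
human_cost_per_lead = human_cost / LEADS          # 2.0

# Batch row: ~$30 total across 20 chunks of 500.
batch_cost = 30.0
chunks = 20
chunk_size = 500
batch_cost_per_lead = batch_cost / LEADS          # 0.003

print(f"Human: ${human_cost_per_lead:.2f} per lead")
print(f"Batch: ${batch_cost_per_lead:.4f} per lead")
```

Under these illustrative numbers, the direct cost per lead drops from about $2.00 to about $0.003.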
Adapting the Pattern
Swap the `agent_config` system prompt and the per-record task shape to retarget:
| Workload | System prompt focus | Per-record task |
|---|---|---|
| Support ticket triage | severity + category + suggested team | ticket body + customer tier |
| Resume screening | match-to-role score + flags | resume text + JD summary |
| Transcript tagging | topic labels + sentiment | transcript window |
| Compliance review | policy violations + risk | document chunk + policy list |
| Product review summarization | sentiment + key claims | review text + product SKU |
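One way to make that swap mechanical is to pass the system prompt and a task-builder function as parameters. This is a sketch under assumptions: the `agent_name` and `model_name` fields, the placeholder model name, the ticket field names, and the helpers `make_batch` and `ticket_task` are all hypothetical.

```python
import json
from typing import Callable


def make_batch(
    system_prompt: str,
    records: list,
    build_task: Callable[[dict], str],
    model_name: str = "your-model-name",  # placeholder, not a real model id
) -> list:
    """Build batch items for any workload from a prompt and a task builder."""
    agent_config = {
        "agent_name": "batch-worker",
        "system_prompt": system_prompt,
        "model_name": model_name,
    }
    return [{"agent_config": agent_config, "task": build_task(r)} for r in records]


def ticket_task(ticket: dict) -> str:
    """Per-record task for support ticket triage: ticket body + customer tier."""
    return json.dumps(
        {"body": ticket["body"], "customer_tier": ticket["customer_tier"]}
    )


tickets = [
    {"body": "Checkout page returns a 500 error", "customer_tier": "enterprise"},
    {"body": "How do I change my avatar?", "customer_tier": "free"},
]
triage_batch = make_batch(
    "Classify each ticket by severity, category, and suggested team.",
    tickets,
    ticket_task,
)
```

Retargeting to resume screening or transcript tagging then means writing a new prompt string and a new task builder; the submission and routing code stays the same.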
Next Steps
- Batch Swarm Completions for Overnight Reports when one agent isn’t enough per record
- Batch Agent Completions (Single Agent) for the request-shape mechanics
- Streaming when you need real-time token output instead of batch throughput