
What This Example Shows

  • How to score or triage tens of thousands of records in one batch submission
  • A real lead-scoring workload built on /v1/agent/batch/completions
  • Per-record cost math you can take to your CFO
  • How to chunk a large input list to stay inside request limits
  • Where this beats hand-rolled asyncio.gather against the single-agent endpoint
Premium-only endpoint. /v1/agent/batch/completions is restricted to Pro, Ultra, and Premium subscribers. Free-tier keys will get a 403. Upgrade your account to unlock high-throughput batch processing.

Why This Matters

Every revenue team has a backlog of records that need a human-quality judgment call: leads to qualify, tickets to triage, resumes to screen, transcripts to tag. Hiring a person to do this work costs $30-$60 per hour and produces 20-40 decisions per hour. Sending each row to a single-agent endpoint one at a time gets you the right answer but burns wall-clock time and connection overhead. The batch endpoint compresses that same workload into a handful of requests, each parallelized server-side, with a single bill at the end. This tutorial shows the concrete shape of that job.

Step 1: Setup

pip install swarms-client python-dotenv
export SWARMS_API_KEY="your-api-key-here"
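
Before submitting anything, it can help to fail fast when the key is missing or still set to the placeholder above. The helper below is a minimal sketch; `require_api_key` is a hypothetical name, not part of any SDK.

```python
import os


def require_api_key(env_var: str = "SWARMS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    # Treat the tutorial's placeholder value as "not configured".
    if not key or key == "your-api-key-here":
        raise RuntimeError(f"{env_var} is not set; export it or add it to .env")
    return key
```

Calling `require_api_key()` right after `load_dotenv()` surfaces a configuration problem before the first chunk is built.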

Step 2: Define the Lead Scoring Agent

We will use one agent definition and reuse it across every record. The agent reads a lead profile and returns a score and a one-line reason.
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("SWARMS_API_KEY")
BASE_URL = "https://api.swarms.world"

headers = {"x-api-key": API_KEY, "Content-Type": "application/json"}

LEAD_SCORER_CONFIG = {
    "agent_name": "Lead Scoring Specialist",
    "description": "Scores inbound B2B sales leads on a 0-100 fit scale.",
    "system_prompt": (
        "You are a senior B2B sales operations analyst. Given a lead profile, "
        "score the lead on a 0-100 scale based on ICP fit, buying intent, and "
        "budget signals. Respond as strict JSON: "
        '{"score": int, "tier": "A"|"B"|"C"|"D", "reason": "one sentence"}'
    ),
    "model_name": "gpt-4.1",
    "max_loops": 1,
    "max_tokens": 200,
    "temperature": 0.2,
}

Step 3: Load Your Records

In a real workload these come from a CRM export, a database query, or an S3 file. For this tutorial we generate a synthetic list of 10,000 leads.
def build_lead_record(i: int) -> dict:
    return {
        "lead_id": f"L{i:05d}",
        "company": f"Acme Subsidiary {i}",
        "industry": ["fintech", "healthtech", "ecommerce", "logistics"][i % 4],
        "headcount": 50 + (i % 500),
        "title": ["VP Eng", "CTO", "Director of Data", "Head of Ops"][i % 4],
        "intent_signal": "downloaded whitepaper" if i % 3 == 0 else "visited pricing",
    }


leads = [build_lead_record(i) for i in range(10_000)]
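
For a real CRM export, a CSV loader along these lines could stand in for the synthetic generator. This is a sketch that assumes the export uses the same column names as `build_lead_record`; `load_leads_from_csv` is a hypothetical helper, and the in-memory sample replaces an actual file.

```python
import csv
import io


def load_leads_from_csv(handle) -> list[dict]:
    """Read lead rows from an open CSV file, coercing headcount to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["headcount"] = int(row["headcount"])
        rows.append(row)
    return rows


# In-memory sample standing in for a CRM export file.
sample = io.StringIO(
    "lead_id,company,industry,headcount,title,intent_signal\n"
    "L00001,Acme Subsidiary 1,healthtech,51,CTO,visited pricing\n"
)
csv_leads = load_leads_from_csv(sample)
print(csv_leads[0]["lead_id"], csv_leads[0]["headcount"])
```

With a file on disk, pass `open("leads.csv", newline="")` instead of the `StringIO` object.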

Step 4: Convert Records into Batch Requests

Each item in the batch body is one AgentCompletion: the same agent_config plus a per-record task.
def lead_to_batch_item(lead: dict) -> dict:
    task = (
        f"Score this lead and return strict JSON.\n"
        f"Lead ID: {lead['lead_id']}\n"
        f"Company: {lead['company']}\n"
        f"Industry: {lead['industry']}\n"
        f"Headcount: {lead['headcount']}\n"
        f"Contact title: {lead['title']}\n"
        f"Intent signal: {lead['intent_signal']}"
    )
    return {"agent_config": LEAD_SCORER_CONFIG, "task": task}


batch_items = [lead_to_batch_item(lead) for lead in leads]
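
Because every item repeats the full agent_config, the request body grows linearly with the number of records. A quick size check like the sketch below can inform the chunk size you pick in the next step. `payload_bytes` is a hypothetical helper, and the stand-in items are far smaller than real ones.

```python
import json


def payload_bytes(items: list[dict]) -> int:
    """Size in bytes of the JSON body that would be sent for these items."""
    return len(json.dumps(items).encode("utf-8"))


# Stand-in items; in the tutorial you would pass batch_items[:500] instead.
example_items = [
    {"agent_config": {"agent_name": "demo"}, "task": f"record {i}"}
    for i in range(3)
]
size = payload_bytes(example_items)
print(f"{size} bytes total, ~{size // len(example_items)} bytes per item")
```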

Step 5: Submit in Chunks

Send the full list in chunks of 500-1000 to keep request bodies reasonable and to let you checkpoint progress. Each chunk is one POST.
def run_chunk(chunk: list[dict]) -> list[dict]:
    response = requests.post(
        f"{BASE_URL}/v1/agent/batch/completions",
        headers=headers,
        json=chunk,
        timeout=900,
    )
    response.raise_for_status()
    return response.json()


CHUNK_SIZE = 500
all_results: list[dict] = []

for start in range(0, len(batch_items), CHUNK_SIZE):
    chunk = batch_items[start : start + CHUNK_SIZE]
    print(f"Submitting leads {start} - {start + len(chunk) - 1}")
    chunk_results = run_chunk(chunk)
    all_results.extend(chunk_results)

print(f"Scored {len(all_results)} leads.")
The server parallelizes inside each chunk. You are not driving a request-per-record loop from the client; each POST is one job that fans out behind the gateway. Chunking exists to keep your local memory and request bodies sane, not to throttle the server.
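
The loop above holds results only in memory, so a failure on a late chunk loses everything before it. One way to checkpoint progress is to append each chunk's results to a JSONL file and skip chunks already recorded on restart. This is a sketch under stated assumptions: `run_with_checkpoint` is a hypothetical helper, `submit` stands for a function like `run_chunk`, and the checkpoint is only valid if the input list and chunk size are unchanged between runs.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_with_checkpoint(
    items: list[dict],
    submit: Callable[[list[dict]], list[dict]],
    checkpoint_path: Path,
    chunk_size: int = 500,
) -> list[dict]:
    """Submit items chunk by chunk, appending each chunk's results to a JSONL file.

    On restart, chunks already present in the file are loaded, not resubmitted.
    """
    done: dict[int, list[dict]] = {}
    if checkpoint_path.exists():
        with checkpoint_path.open() as fh:
            for line in fh:
                record = json.loads(line)
                done[record["start"]] = record["results"]

    results: list[dict] = []
    with checkpoint_path.open("a") as fh:
        for start in range(0, len(items), chunk_size):
            if start not in done:
                done[start] = submit(items[start : start + chunk_size])
                fh.write(json.dumps({"start": start, "results": done[start]}) + "\n")
                fh.flush()  # persist before moving to the next chunk
            results.extend(done[start])
    return results


# Demo with a stub submitter so the sketch runs without network access.
calls: list[int] = []


def stub_submit(chunk: list[dict]) -> list[dict]:
    calls.append(len(chunk))
    return [{"echo": item["task"]} for item in chunk]


demo_items = [{"task": f"lead {i}"} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoint.jsonl"
    first = run_with_checkpoint(demo_items, stub_submit, path, chunk_size=2)
    # Second run finds all three chunks in the file and submits nothing.
    second = run_with_checkpoint(demo_items, stub_submit, path, chunk_size=2)
print(calls)
```

In the tutorial's terms you would call `run_with_checkpoint(batch_items, run_chunk, Path("lead_scores.jsonl"))`. The sketch does not handle a line that was only partially written during a crash; a production version would need to tolerate or truncate it.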

Step 6: Aggregate and Route

Parse each agent response, slot leads into A/B/C/D tiers, and forward only A-tier leads to your SDRs.
tiers: dict[str, list[dict]] = {"A": [], "B": [], "C": [], "D": []}

# Assumes results come back in the same order the items were submitted.
for lead, result in zip(leads, all_results):
    try:
        # The agent output is a JSON string inside the response envelope.
        outputs = result.get("outputs")
        if isinstance(outputs, list):
            output_text = outputs[-1]["content"]
        else:
            output_text = result.get("output", "")
        parsed = json.loads(output_text)
        tier = parsed.get("tier", "D")
        tiers.setdefault(tier, []).append({**lead, **parsed})
    except (json.JSONDecodeError, KeyError, IndexError, TypeError, AttributeError):
        # Empty outputs, non-dict payloads, and invalid JSON all land in tier D.
        tiers["D"].append({**lead, "score": 0, "tier": "D", "reason": "parse_error"})

for tier, items in tiers.items():
    print(f"Tier {tier}: {len(items)} leads")

print("Top 5 A-tier leads:")
for item in tiers["A"][:5]:
    print(f"  {item['lead_id']} | {item['company']} | {item['reason']}")
The exact response shape depends on the model and whether the agent returns structured outputs. Wrap the JSON parse in a try/except and dump unparseable rows to a review queue — never let one bad row halt a 10k-lead pipeline.
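
A stricter parser can validate the fields the system prompt asks for and send anything unusable to a review queue. The sketch below is one possible shape; `parse_score`, `VALID_TIERS`, and `review_queue` are hypothetical names, and the Markdown-fence stripping is an assumption about how models sometimes wrap JSON, not documented endpoint behavior.

```python
import json
from typing import Optional

VALID_TIERS = {"A", "B", "C", "D"}
FENCE = "`" * 3  # Markdown code-fence marker


def parse_score(output_text: str) -> Optional[dict]:
    """Return a validated score dict, or None if the output is unusable."""
    text = output_text.strip()
    # Strip a surrounding Markdown fence (with optional "json" tag) if present.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    score, tier = parsed.get("score"), parsed.get("tier")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        return None
    if not isinstance(tier, str) or tier not in VALID_TIERS:
        return None
    return {"score": score, "tier": tier, "reason": str(parsed.get("reason", ""))}


# Unparseable rows go to a review queue; the loop never stops on a bad row.
raw_outputs = [
    '{"score": 82, "tier": "A", "reason": "Strong ICP match"}',
    "Sorry, I cannot score this lead.",
]
review_queue = [raw for raw in raw_outputs if parse_score(raw) is None]
print(f"{len(review_queue)} row(s) need manual review")
```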

The Cost Math

Pricing varies by model and current token rates — these numbers are illustrative, not a quote.

| Approach | Wall time | Direct cost | Burdened cost |
| --- | --- | --- | --- |
| Human SDR scoring 10,000 leads | ~400 hours | $20,000 at $50/hr | $30,000+ with overhead |
| Single-agent endpoint, sequential | ~6 hours | ~$30 | $30 + 6 hours of your time |
| Batch endpoint, 20 chunks of 500 | ~25 minutes | ~$30 | $30 + 25 minutes |

This batch costs roughly $25-$35 to run on gpt-4.1 with short outputs. A human SDR team would take ~400 hours at $50/hour to produce the same triage — about $20,000 in direct labor. Run the same job nightly and you have replaced a full-time research desk with a recurring cron and an API key.
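
The per-record figures follow directly from the illustrative totals above. The sketch below reworks that arithmetic, taking 25 decisions per hour as the human rate (within the 20-40 range quoted earlier, and the rate implied by ~400 hours for 10,000 leads). All inputs are the tutorial's illustrative numbers, not a quote.

```python
def per_record_cost(total_cost: float, n_records: int) -> float:
    """Average cost per record for a given total spend."""
    return total_cost / n_records


N_LEADS = 10_000
HUMAN_RATE_PER_HOUR = 50   # dollars, illustrative
DECISIONS_PER_HOUR = 25    # assumed; implied by ~400 hours for 10,000 leads
API_SPEND = 30             # dollars, midpoint of the $25-$35 estimate

human_hours = N_LEADS / DECISIONS_PER_HOUR
human = per_record_cost(human_hours * HUMAN_RATE_PER_HOUR, N_LEADS)
batch = per_record_cost(API_SPEND, N_LEADS)

print(f"Human: {human_hours:.0f} hours, ${human:.2f} per lead")
print(f"Batch: ${batch:.4f} per lead")
print(f"Ratio: ~{human / batch:.0f}x")
```

On these inputs the human path works out to $2.00 per lead and the batch path to $0.003 per lead, a ratio of roughly 667 to 1 on direct cost.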

Adapting the Pattern

Swap the agent_config system prompt and the per-record task shape to retarget:

| Workload | System prompt focus | Per-record task |
| --- | --- | --- |
| Support ticket triage | severity + category + suggested team | ticket body + customer tier |
| Resume screening | match-to-role score + flags | resume text + JD summary |
| Transcript tagging | topic labels + sentiment | transcript window |
| Compliance review | policy violations + risk | document chunk + policy list |
| Product review summarization | sentiment + key claims | review text + product SKU |

Nothing else in this tutorial changes — same endpoint, same chunking, same cost-tracking story.
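
As a sketch of what retargeting could look like, the snippet below factors the item-building step into a generic helper and applies it to ticket triage. `build_batch_items`, `TICKET_TRIAGE_CONFIG`, and `ticket_to_task` are hypothetical names; the config fields mirror the lead-scoring config from Step 2, and the ticket fields are invented for illustration.

```python
from typing import Callable


def build_batch_items(
    records: list[dict],
    agent_config: dict,
    render_task: Callable[[dict], str],
) -> list[dict]:
    """Pair one shared agent_config with a rendered task string per record."""
    return [{"agent_config": agent_config, "task": render_task(r)} for r in records]


# Hypothetical triage config; field names mirror LEAD_SCORER_CONFIG.
TICKET_TRIAGE_CONFIG = {
    "agent_name": "Ticket Triage Specialist",
    "description": "Assigns severity, category, and owning team to support tickets.",
    "system_prompt": (
        "You are a support operations lead. Given a ticket, respond as strict JSON: "
        '{"severity": "low"|"medium"|"high", "category": "string", "team": "string"}'
    ),
    "model_name": "gpt-4.1",
    "max_loops": 1,
    "max_tokens": 200,
    "temperature": 0.2,
}


def ticket_to_task(ticket: dict) -> str:
    """Render one ticket as the per-record task prompt."""
    return (
        "Triage this ticket and return strict JSON.\n"
        f"Ticket ID: {ticket['ticket_id']}\n"
        f"Customer tier: {ticket['customer_tier']}\n"
        f"Body: {ticket['body']}"
    )


tickets = [
    {"ticket_id": "T-1", "customer_tier": "enterprise", "body": "Login fails with 500."},
    {"ticket_id": "T-2", "customer_tier": "free", "body": "How do I export data?"},
]
triage_items = build_batch_items(tickets, TICKET_TRIAGE_CONFIG, ticket_to_task)
print(len(triage_items), "items ready to chunk and submit")
```

The resulting list has the same shape as `batch_items`, so the chunked submission loop from Step 5 applies unchanged.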

Next Steps