What This Example Shows
- A HierarchicalSwarm with a Research Director coordinating three specialist workers (Rating Tracker, Price Target Tracker, and Key Driver Extractor) running in parallel against a single PDF note
- OpenAI-format function tools for PDF parsing, ratings/target regex extraction, consensus lookup, and Slack delivery
- A mixed-provider agent team: Claude Sonnet 4.5 directing, GPT-4.1 / GPT-4.1-mini for surgical extraction, Gemini 2.5 Pro for long-document grounding
- A consensus-check step that flags every new target as above, in line with, or below the street
- End-of-day batch processing across 150+ notes via /v1/swarm/batch/completions, scheduled at 4:30pm ET, with a ranked delta sheet as output
- The structured signal payload your OMS, EMS, or research database actually wants, not free-form text
This pipeline processes 150+ PDF notes per trading day. The batch endpoint and parallel function-tool execution sit on the Premium tier; Pro will rate-limit you once daily note volume crosses ~50. Read the Cost Optimization Playbook before turning the cron on, and upgrade at https://swarms.world/platform/account.
Why This Matters
Every PM in the firm wakes up to a ranked sheet of rating + target deltas, already cross-referenced against consensus, for the price of one Bloomberg lunch. A multi-strategy PM gets roughly 150 sell-side notes hitting their inbox between 5am and 9am ET on any given trading day. Reading every page is physically impossible and almost entirely a waste: 95% of any sell-side note is restated boilerplate and an unchanged thesis. The actionable content is in the deltas: a Goldman analyst going from Buy to Hold, JPM lifting their target by 18%, Morgan Stanley swapping in a new bear case on China data center demand. Those three things move books. This pipeline does nothing except find those deltas, score them against street consensus, and put a structured signal in front of the PM before the bell.
The Architecture
Step 1: Setup
Install dependencies and configure credentials. The pipeline needs your Swarms API key plus either IMAP credentials for an inbox sweep or an S3 bucket where your prime broker drops PDFs.
Step 2: Define the Function Tools
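To make the tool layer concrete, here is a minimal sketch of one OpenAI-format function schema paired with a local regex handler for rating and target changes. The tool name `extract_rating_and_target`, the rating vocabulary, and both regex patterns are illustrative assumptions, not the tools this pipeline actually ships; real sell-side phrasing varies far more than these patterns cover.

```python
import re

# Hypothetical tool schema in the OpenAI function-calling format.
EXTRACT_RATING_TARGET_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_rating_and_target",  # assumed name, for illustration only
        "description": "Find a rating change and a price-target change in note text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Plain text parsed from the PDF note."}
            },
            "required": ["text"],
        },
    },
}

# Assumed rating vocabulary; extend it for the brokers you actually receive.
RATINGS = r"Buy|Hold|Sell|Overweight|Neutral|Underweight|Outperform|Underperform"
RATING_CHANGE = re.compile(rf"\bfrom\s+({RATINGS})\s+to\s+({RATINGS})\b", re.IGNORECASE)
# Matches phrasing such as "price target to $150 from $180".
TARGET_CHANGE = re.compile(
    r"target\b[^.$]*?to\s+\$(\d[\d,]*(?:\.\d+)?)\s+from\s+\$(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_rating_and_target(text: str) -> dict:
    """Local handler for the schema above: return the first rating and target delta found."""
    result = {"rating_change": None, "target_change": None}
    match = RATING_CHANGE.search(text)
    if match:
        result["rating_change"] = {"old": match.group(1).title(), "new": match.group(2).title()}
    match = TARGET_CHANGE.search(text)
    if match:
        new, old = (float(g.replace(",", "")) for g in match.groups())
        result["target_change"] = {
            "old": old,
            "new": new,
            "pct_change": round((new - old) / old * 100, 1),
        }
    return result
```

Keeping the schema and the handler side by side makes it easy to check that the function name the model calls matches the function your tool host actually runs.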
Every worker agent gets the tools it actually needs, nothing more. Tools are OpenAI-format function schemas; your runtime resolves the calls server-side or replays them locally after the swarm finishes, depending on your tool host.
Step 3: Define the Four Agents
The Research Director is the only agent with the coordinator role: it owns synthesis, consensus reasoning, and the final signal payload. The three workers each get exactly the tools their job requires.
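The team described above could be laid out roughly as follows. This is a sketch under assumptions: the field names (`agent_name`, `model_name`, `role`, `tools`), the model ID strings, the tool names, and the tool-to-agent assignment are all illustrative and should be checked against the Swarms API reference before use.

```python
def make_agent(name: str, model: str, role: str, tools: list[str]) -> dict:
    """Build one agent spec. Field names are assumed, not confirmed by the source."""
    return {"agent_name": name, "model_name": model, "role": role, "tools": tools}


# Hypothetical tool names, one per capability named in the overview.
AGENTS = [
    make_agent("Research Director", "claude-sonnet-4-5", "coordinator",
               ["lookup_consensus", "post_to_slack"]),
    make_agent("Rating Tracker", "gpt-4.1-mini", "worker",
               ["parse_pdf", "extract_ratings"]),
    make_agent("Price Target Tracker", "gpt-4.1", "worker",
               ["parse_pdf", "extract_targets"]),
    make_agent("Key Driver Extractor", "gemini-2.5-pro", "worker",
               ["parse_pdf"]),
]


def validate_team(agents: list[dict]) -> str:
    """Enforce the one-coordinator rule and return the coordinator's name."""
    coordinators = [a for a in agents if a["role"] == "coordinator"]
    if len(coordinators) != 1:
        raise ValueError(f"expected exactly one coordinator, found {len(coordinators)}")
    return coordinators[0]["agent_name"]
```

Running `validate_team(AGENTS)` before submitting a request catches a mis-assigned role locally instead of after a paid swarm run.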
The model mix is deliberate. Rating extraction is a small classification problem: gpt-4.1-mini is plenty and roughly 5x cheaper than gpt-4.1. Target extraction needs careful number handling, so gpt-4.1 earns its keep. Driver extraction is the only step that reads the full multi-page document end to end, and Gemini 2.5 Pro is empirically the strongest on long-document grounding. The Director uses Claude Sonnet 4.5 because synthesis + tool-call orchestration is what it was built for.
Step 4: Process One Note End-to-End
Start with a single note. This is the loop you scale.
Step 5: End-of-Day Batch Across All Notes
The real value shows up when you sweep the entire inbox in one shot. Build a payload list keyed by the day’s PDF drops and hand the whole thing to /v1/swarm/batch/completions.
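A minimal sketch of that payload build and submission, using only the standard library. The endpoint path comes from this guide; the base URL, the `x-api-key` header, and the per-swarm field names (`name`, `swarm_type`, `agents`, `task`, `max_loops`) are assumptions to verify against the Swarms API reference.

```python
import json
import urllib.request
from pathlib import Path

BASE_URL = "https://api.swarms.world"       # assumed base URL
BATCH_PATH = "/v1/swarm/batch/completions"  # endpoint named in this guide


def build_batch_payload(pdf_paths: list[str], agents: list[dict]) -> list[dict]:
    """Return one swarm spec per PDF, sorted so reruns produce the same order."""
    return [
        {
            "name": f"note-signal-{Path(p).stem}",
            "swarm_type": "HierarchicalSwarm",
            "agents": agents,
            "task": f"Extract rating, price-target, and key-driver deltas from the note at {p}.",
            "max_loops": 1,
        }
        for p in sorted(pdf_paths)
    ]


def submit_batch(payload: list[dict], api_key: str) -> object:
    """POST the whole list in one request and return the decoded JSON response."""
    request = urllib.request.Request(
        BASE_URL + BATCH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},  # header name assumed
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

Naming each swarm after its PDF stem gives you a stable key for joining batch results back to the source note.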
Schedule this script as a cron job for 4:30pm ET on weekdays. The post-close window means every note that hit the inbox during regular trading hours is captured, the consensus service has finished its end-of-day refresh, and the PM sees the ranked delta sheet on Slack before they leave the desk — not in their pre-market inbox the next morning.
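If you drive the schedule from Python rather than from cron, the 4:30pm ET weekday rule can be sketched as below. The function name `next_batch_run` is hypothetical, and the sketch ignores market holidays and early closes, which a production scheduler would need to handle.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")
RUN_AT = time(16, 30)  # 4:30pm ET, after the regular session close

# Rough cron equivalent, on cron implementations that support CRON_TZ:
#   CRON_TZ=America/New_York
#   30 16 * * 1-5  <command that runs the batch script>


def next_batch_run(now: datetime) -> datetime:
    """Return the next weekday 16:30 ET strictly after `now` (must be timezone-aware)."""
    now_et = now.astimezone(ET)
    candidate = datetime.combine(now_et.date(), RUN_AT, tzinfo=ET)
    # Step forward a day at a time until the slot is in the future and not Sat/Sun.
    while candidate <= now_et or candidate.weekday() >= 5:
        candidate = datetime.combine(candidate.date() + timedelta(days=1), RUN_AT, tzinfo=ET)
    return candidate
```

Anchoring the calculation to `America/New_York` rather than a fixed UTC offset keeps the run at 4:30pm local time across daylight-saving changes.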
Step 6: The Output Schema
The Research Director is constrained to emit signals in this exact shape. This is the contract your OMS, EMS, research database, and PM Slack channel all consume.
The audit_trail block is what makes this defensible in a compliance review: every field in the signal traces back to a specific worker output, which traces back to a specific line in the PDF.
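One way to picture a signal with an audit trail and a consensus flag is the sketch below. Every field name here is hypothetical (the actual schema is not reproduced in this text), and the 2% "in line" band is an assumed tolerance you would tune to your own desk's convention.

```python
from dataclasses import dataclass, field
from typing import Optional


def classify_vs_consensus(new_target: float, consensus_target: float, band_pct: float = 2.0) -> str:
    """Flag a target as above, in line with, or below the street. band_pct is an assumption."""
    diff_pct = (new_target - consensus_target) / consensus_target * 100
    if diff_pct > band_pct:
        return "above"
    if diff_pct < -band_pct:
        return "below"
    return "in_line"


@dataclass
class AuditEntry:
    field_name: str  # which signal field this entry supports
    worker: str      # which worker agent produced the value
    pdf_page: int
    pdf_line: int


@dataclass
class Signal:
    ticker: str
    broker: str
    new_rating: Optional[str] = None
    new_target: Optional[float] = None
    vs_consensus: Optional[str] = None
    key_drivers: list[str] = field(default_factory=list)
    audit_trail: list[AuditEntry] = field(default_factory=list)

    def untraced_fields(self) -> list[str]:
        """List populated fields that have no supporting audit_trail entry."""
        traced = {entry.field_name for entry in self.audit_trail}
        populated = [n for n in ("new_rating", "new_target", "key_drivers") if getattr(self, n)]
        return [n for n in populated if n not in traced]
```

Rejecting any signal where `untraced_fields()` is non-empty is a cheap way to enforce the traceability property before the payload reaches downstream systems.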
Real Cost vs. Junior Analyst Reading Notes
| Scenario | Per note | Per trading day (150 notes) | Annualized (~250 days) |
|---|---|---|---|
| HierarchicalSwarm pipeline (mixed-provider, batched) | ~$0.25 | ~$38 | ~$9,500 |
| Junior analyst (fully loaded $150k) reading notes | — | Maxes out at ~20 notes/day | $150,000 — and they skip dinner |
| Two-analyst rotation just for note triage | — | ~40 notes/day | $300,000 — and they still miss 110 |
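The arithmetic behind the table can be checked in a few lines. The inputs are the table's own figures; the per-note analyst cost is derived here and does not appear in the original. Note that the unrounded annual pipeline cost is $9,375, so the table's ~$9,500 appears to come from multiplying the rounded ~$38 daily figure by 250.

```python
NOTES_PER_DAY = 150
COST_PER_NOTE = 0.25           # USD, pipeline cost per note from the table
TRADING_DAYS = 250
ANALYST_COST = 150_000         # USD, fully loaded annual cost from the table
ANALYST_NOTES_PER_DAY = 20

pipeline_daily = NOTES_PER_DAY * COST_PER_NOTE     # 37.5 USD per trading day
pipeline_annual = pipeline_daily * TRADING_DAYS    # 9375.0 USD per year

# Derived figure: what each note costs when a junior analyst reads it.
analyst_per_note = ANALYST_COST / (TRADING_DAYS * ANALYST_NOTES_PER_DAY)  # 30.0 USD

# Share of the daily note flow one analyst can cover.
analyst_coverage = ANALYST_NOTES_PER_DAY / NOTES_PER_DAY

print(f"pipeline: ${pipeline_daily:.2f}/day, ${pipeline_annual:,.0f}/year")
print(f"analyst: ${analyst_per_note:.2f}/note, {analyst_coverage:.0%} coverage")
```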
Next Steps
- Build an AI Hedge Fund Research Pipeline — the upstream watchlist research note that the signal sheet feeds into
- SEC Filing Triage Pipeline — same pattern applied to 10-K/10-Q/8-K dumps from EDGAR
- Earnings Call Analysis Swarm — extend the pipeline to live transcript ingestion during earnings season