Skip to main content

What This Example Shows

  • One real, hard analytical problem solved end-to-end with POST /v1/reasoning-agent/completions
  • Why a normal agent call hallucinates on this and how swarm_type: "self-consistency" fixes it
  • How to tune num_samples and max_loops for the depth-vs-cost tradeoff on a single high-stakes question
  • How to read the structured output (output_type: "dict-final") to extract the answer programmatically
  • A direct side-by-side: same task, plain agent vs. reasoning agent, real failure modes
The Reasoning Agent endpoint (POST /v1/reasoning-agent/completions) is a premium-only feature available on Pro, Ultra, and Premium plans. Free tier keys receive a 403. Upgrade your account to unlock self-consistency, iterative refinement, and multi-sample deliberation.

Why This Matters

Some questions look like they should be a one-shot LLM call but aren’t. A multi-step accrual reconciliation. A cross-jurisdiction tax allocation. A logic puzzle with five interlocking constraints. An algorithm correctness proof. These are problems where a wrong-looking-but-confident answer is worse than no answer, because nobody downstream has the time to re-derive the calculation by hand. A plain gpt-4 call will produce a single deterministic chain of thought and confidently miscount the deferred revenue. A reasoning agent samples multiple chains, votes on the convergent answer, and surfaces the disagreement when the chains don’t converge — which is exactly the signal you need to escalate a question to a human. The job done isn’t “smarter LLM,” it’s calibrated confidence on hard analytical work.

Step 1: Setup

pip install requests python-dotenv
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("SWARMS_API_KEY")
BASE_URL = "https://api.swarms.world"

headers = {
    "x-api-key": API_KEY,
    "Content-Type": "application/json",
}

Step 2: The Problem

This is the kind of problem a controller would hand to a senior associate. It is solvable in closed form, but every step compounds the next, so a single dropped assumption produces a confidently wrong number.
problem = """
A SaaS company sells a 24-month enterprise subscription on October 1, 2024 for
a total contract value of $480,000, billed annually in advance.

Contract terms:
- The first $240,000 is billed and collected on October 1, 2024, covering
  October 1, 2024 through September 30, 2025.
- The second $240,000 is billed and collected on October 1, 2025, covering
  October 1, 2025 through September 30, 2026.
- The customer is granted a one-time 5% discount that applies to the SECOND
  year only, reducing the second-year invoice (and cash collected) to
  $228,000. The discount is contractual at inception and is not a separate
  performance obligation.
- Revenue is recognized ratably on a straight-line basis under ASC 606.
- The company's fiscal year ends December 31.

Required:
1. What is the total transaction price under ASC 606 at contract inception?
2. What is the monthly revenue recognition amount?
3. As of December 31, 2024, what is (a) revenue recognized year-to-date,
   (b) deferred revenue on the balance sheet, and (c) contract asset or
   contract liability if any?
4. As of December 31, 2025, what is (a) revenue recognized in fiscal 2025,
   (b) deferred revenue on the balance sheet, and (c) accounts receivable
   if any?

Show every step of the calculation. Return a final JSON object with keys:
transaction_price, monthly_revenue, dec_2024 (with sub-keys revenue_ytd,
deferred_revenue, contract_asset_or_liability), and dec_2025 (with sub-keys
revenue_fy2025, deferred_revenue, accounts_receivable).
""".strip()
The trick step is question 3. The $240k cash collected in Oct 2024 covers months 1-12 of the contract, but the transaction price is allocated over the full 24 months at $19,500/month (not $20,000/month), so YTD recognition through Dec 31, 2024 is 3 months × $19,500 = $58,500, not $60,000. The remainder of the $240k cash sits as deferred revenue. A plain LLM call will frequently compute $20,000/month and miss the discount allocation.

Step 3: First — Try It With a Plain Agent (for Comparison)

Run the exact same problem through the standard agent completion endpoint. Keep this number — you’ll compare against the reasoning agent’s answer.
plain_payload = {
    "agent_config": {
        "agent_name": "Accounting-Plain",
        "description": "Standard accounting agent (single chain of thought).",
        "system_prompt": (
            "You are a CPA. Solve accounting problems carefully under ASC 606. "
            "Return the final answer as JSON."
        ),
        "model_name": "gpt-4.1",
        "max_loops": 1,
        "max_tokens": 4000,
    },
    "task": problem,
}

plain_response = requests.post(
    f"{BASE_URL}/v1/agent/completions",
    headers=headers,
    json=plain_payload,
    timeout=300,
)
plain_response.raise_for_status()
print("=" * 60)
print("PLAIN AGENT — single chain of thought")
print("=" * 60)
print(json.dumps(plain_response.json(), indent=2)[:1500])
In practice on this problem the plain agent will produce a confident but wrong allocation roughly 40-60% of the time — most often by computing $20,000/month for the first year (ignoring that the $12,000 discount must be allocated across all 24 months under ASC 606).

Step 4: Now Solve It With a Reasoning Agent

Switch to /v1/reasoning-agent/completions with swarm_type: "self-consistency". The agent runs num_samples independent reasoning chains and votes on the convergent answer.
reasoning_payload = {
    "agent_name": "Accrual-Reasoner",
    "description": (
        "ASC 606 revenue recognition specialist with self-consistency "
        "deliberation across multiple independent reasoning chains."
    ),
    "model_name": "anthropic/claude-opus-4-8",
    "system_prompt": (
        "You are a CPA specializing in ASC 606 revenue recognition. "
        "When solving a problem, derive the total transaction price first, "
        "then allocate ratably across the full performance period. Show every "
        "step. The final line of your response MUST be a single JSON object "
        "containing the requested fields."
    ),
    "swarm_type": "self-consistency",
    "num_samples": 5,
    "max_loops": 1,
    "output_type": "dict-final",
    "task": problem,
}

reasoning_response = requests.post(
    f"{BASE_URL}/v1/reasoning-agent/completions",
    headers=headers,
    json=reasoning_payload,
    timeout=600,
)
reasoning_response.raise_for_status()
result = reasoning_response.json()

print("=" * 60)
print("REASONING AGENT — self-consistency, 5 samples")
print("=" * 60)
print(json.dumps(result, indent=2)[:3000])

Step 5: Extract the Answer

output_type: "dict-final" returns the convergent final answer as a dictionary alongside the per-sample chains. The convergent answer is the one to trust; the per-sample chains are the audit trail.
import re

outputs = result.get("outputs", {})
final = outputs.get("final_answer") if isinstance(outputs, dict) else None

if not final:
    # Some response shapes nest the JSON in a string — extract it
    blob = json.dumps(outputs)
    match = re.search(r"\{[^{}]*transaction_price[^{}]*\}", blob, re.DOTALL)
    if match:
        final = json.loads(match.group(0))

print("\nExtracted answer:")
print(json.dumps(final, indent=2))

# Sanity-check the structure
expected = {
    "transaction_price": 468_000,
    "monthly_revenue": 19_500,
    "dec_2024": {
        "revenue_ytd": 58_500,           # 3 months
        "deferred_revenue": 181_500,     # 240,000 cash - 58,500 recognized
        "contract_asset_or_liability": "contract liability (deferred revenue)",
    },
    "dec_2025": {
        "revenue_fy2025": 234_000,       # 12 months
        "deferred_revenue": 175_500,     # carry-forward math
    },
}
print("\nExpected key figures:")
print(json.dumps(expected, indent=2))
$468,000 transaction price = $480,000 contract value − $12,000 discount, allocated ratably over 24 months = $19,500/month. Three months in fiscal 2024 = $58,500 recognized; the rest of the first $240,000 cash collected sits as a contract liability.

Step 6: Tuning the Depth Knob

num_samples is the deliberation budget. Use this table to pick a setting:
num_samplesWhen to useCost multiplier
1Smoke testing, prototyping1x
3Default for hard analytical work~3x
5High-stakes single-answer questions (this example, legal opinions, regulatory filings)~5x
7-10Convergence diagnostics — if 7 samples still disagree, escalate to a human~7-10x
The right read on convergence: if all 5 samples produce the same final JSON, ship the answer. If 4-of-5 agree, ship with a flag. If 3-of-5 or worse, the problem is genuinely ambiguous and you need a human in the loop — that signal is what you’re paying for, not just the answer.

Picking the Right swarm_type

swarm_typeWhat it doesBest for this kind of work
reasoning-agentSingle chain with structured thinkingQuick checks, drafts
self-consistencyN independent chains, vote on convergenceThis example, calibrated answers
ireIterative refinement — each loop critiques the priorAlgorithmic / proof-style problems
reasoning-duoTwo agents debate two approachesContrarian framing, strategy reviews
consistency-agentLogical-consistency focusedPure math and logic puzzles
For accrual accounting, legal interpretation, and financial calculation work, self-consistency is the right default — the failure mode is “confidently wrong on one chain,” and voting across chains is the specific antidote.

Cost vs. Hiring a Senior Associate

Concrete numbers for a high-stakes analytical question (the accrual problem above):
ApproachTime to answerCostCalibrated confidence?
Senior accounting associate25-40 min~$80-130 fully-loaded laborYes (but variable)
Plain gpt-4 call~6 sec~$0.05No — single confident chain, fails ~40-60%
Reasoning agent (self-consistency, 5 samples, Opus 4.8)~30-60 sec~$0.20-0.50Yes — convergence is the signal
Use the reasoning agent for the questions where being wrong is expensive — accruals that hit a 10-K, tax allocations on a large deal, contract interpretations, complex underwriting. Use a plain agent for everything else. The reasoning agent is ~250x cheaper than the associate and gives you a convergence signal the associate can’t, because a single human only has one chain of thought.
A reasoning agent producing 5-of-5 convergence is strong evidence the answer is correct under the given assumptions. It is not evidence the assumptions are correct. For anything that hits a regulated filing, the calculated answer goes to a licensed human reviewer who validates the inputs.

Next Steps