Tollgate - Real-time Margin Observability for AI Agents

Getting Started

Product Overview

Tollgate is a real-time gross-margin observability platform purpose-built for AI agent companies. Standard billing systems answer "how much should I charge?" — Tollgate answers "am I actually making money on this customer right now?"

One-liner

Orb / Stripe tells you what to charge. Tollgate tells you if you're making money.

AI agents trigger fan-out loops: LLM queries, retrieval, tool calls, sandboxes. Costs compound with every step and stay invisible until the monthly provider invoice lands. Tollgate computes accurate per-run margin, flags losing customers, and gives you time to act.

Per-run margin

Exact cost and P&L for every agent execution.

Leak detection

Auto-flags customers whose cost exceeds their plan revenue.

Dual-DB ingest

DynamoDB hot path + Aurora for finance-grade history.

Quick Start

Step 1 — Get an API key

Generate a key in Settings → API Keys (prefix: tg_live_…). Add it to your environment:

bash

export TOLLGATE_API_KEY=tg_live_your_key_here

Step 2 — Install the SDK

bash

# TypeScript / Node.js
npm install @tollgateai/sdk
# or: pnpm add @tollgateai/sdk  |  yarn add @tollgateai/sdk

# Python
pip install tollgateai

Step 3 — Wrap your provider client

Pick the snippet below for your LLM provider. The wrapper intercepts each response, extracts its token counts, and sends POST /api/track in the background without blocking. Your code receives the original provider response, unchanged.

Anthropic quickstart

import Anthropic from '@anthropic-ai/sdk';
import { createTollgateClient, wrapAnthropic } from '@tollgateai/sdk';

const tollgate = createTollgateClient(); // reads TOLLGATE_API_KEY
const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
  customerId: 'cust_acme',    // your external customer id — required
  runId: 'ticket_8842',       // your run/session id — recommended
  agentId: 'support-agent',   // optional: which agent within the run
});

// Use anthropic exactly as before — no other changes needed.
const msg = await anthropic.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Resolve this ticket.' }],
});

Tollgate AI Skill

The Tollgate AI Skill is a set of pre-written rules and instructions for AI coding tools (Claude Code, Codex, Cursor, GitHub Copilot, Windsurf, etc.) that lets your AI assistant instrument your AI application with Tollgate automatically.

No more hours of manual code scanning or copying/pasting setup logic. The skill teaches your AI coding assistant how to wrap clients, manage streaming, group runs, set idempotency keys, and exclude sensitive prompt payloads.

Install the Skill

You can install the skill instantly using npx (recommended for Node.js developers) or a direct curl script:

Option A — via npx (npm registry):

bash

# Install for Claude Code
npx tollgate-skill --tool claude

# Install for Codex / Custom Agents
npx tollgate-skill --tool codex

# Install for Cursor
npx tollgate-skill --tool cursor

# Install for GitHub Copilot
npx tollgate-skill --tool copilot

# Install for Windsurf
npx tollgate-skill --tool windsurf

# Install for all tools at once
npx tollgate-skill --tool all

Option B — via curl one-liner:

bash

# Install for Claude Code
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool claude

# Install for Codex / Custom Agents
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool codex

# Install for Cursor
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool cursor

# Install for GitHub Copilot
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool copilot

# Install for Windsurf
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool windsurf

# Install for all tools at once
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool all

How to Use

Claude Code: Restart Claude Code and type /tollgate. Claude Code will read the skill and auto-instrument your project.
Codex: Installs to AGENTS.md in your project root. Codex reads this file automatically when running tasks in your repo.
Cursor: Places a rule at .cursor/rules/tollgate.mdc. Cursor picks it up automatically when you are working on LLM call logic.
GitHub Copilot: Adds rules to .github/copilot-instructions.md, guiding Copilot on how to write Tollgate-instrumented completions.
Windsurf: Installs a rule in .windsurf/rules/tollgate.md to guide the Cascade assistant.

Provider Integrations

Anthropic

wrapAnthropic wraps the official @anthropic-ai/sdk client. It captures input, output, reasoning, and cached token counts automatically — including extended thinking models (Claude 3.7+).

Anthropic

import Anthropic from '@anthropic-ai/sdk';
import { createTollgateClient, wrapAnthropic } from '@tollgateai/sdk';

const tollgate = createTollgateClient(); // reads TOLLGATE_API_KEY
const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
  customerId: 'cust_acme',
  runId: 'ticket_8842',
  agentId: 'support-agent',  // optional
});

const msg = await anthropic.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});

Extended thinking

For Claude 3.7+ models with extended thinking enabled, reasoningTokens are captured automatically and billed at output rates, the most expensive token type, so each run's margin reflects its full thinking cost.

OpenAI

wrapOpenAI wraps the openai Node.js client. Works with chat completions, responses API, and streaming (with stream_options — see Streaming section).

OpenAI

import OpenAI from 'openai';
import { createTollgateClient, wrapOpenAI } from '@tollgateai/sdk';

const tollgate = createTollgateClient();
const openai = wrapOpenAI(new OpenAI(), tollgate, {
  customerId: 'cust_acme',
  runId: 'ticket_8842',
});

const res = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
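The wrapper reads token counts from the provider's usage object. As a rough illustration of what that involves for Chat Completions, the sketch below maps OpenAI's usage fields onto Tollgate's token fields. toTollgateUsage is a hypothetical helper, not part of @tollgateai/sdk. It assumes tokensIn excludes cached tokens (as the field reference states) and that tokensOut and reasoningTokens are counted separately; because OpenAI's prompt_tokens already includes cached tokens and completion_tokens already includes reasoning tokens, both are subtracted to avoid double-counting.

```typescript
// Subset of the `usage` object on an OpenAI Chat Completions response.
// The detail objects can be absent, depending on model and API version.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Subset of the POST /api/track token fields.
interface TollgateTokenFields {
  tokensIn: number;
  tokensOut: number;
  reasoningTokens: number;
  cachedTokens: number;
}

// Hypothetical helper: split OpenAI's totals into Tollgate's categories.
// Assumes Tollgate bills these four fields additively.
function toTollgateUsage(usage: OpenAIUsage): TollgateTokenFields {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const reasoning = usage.completion_tokens_details?.reasoning_tokens ?? 0;
  return {
    tokensIn: usage.prompt_tokens - cached,        // non-cached input only
    tokensOut: usage.completion_tokens - reasoning, // visible output only
    reasoningTokens: reasoning,
    cachedTokens: cached,
  };
}
```

For example, a response with prompt_tokens 1200 (1000 cached) and completion_tokens 900 (600 reasoning) maps to tokensIn 200, tokensOut 300, reasoningTokens 600, cachedTokens 1000.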

OpenAI-Compatible Gateways

Use wrapOpenAI with provider: 'openai_compatible' for any OpenAI-compatible gateway: Groq, OpenRouter, Vercel AI Gateway, vLLM, Together AI, Fireworks.

Groq / OpenRouter / OpenAI-compatible

import OpenAI from 'openai';
import { createTollgateClient, wrapOpenAI } from '@tollgateai/sdk';

const tollgate = createTollgateClient();

// Groq
const groq = wrapOpenAI(
  new OpenAI({ baseURL: 'https://api.groq.com/openai/v1', apiKey: process.env.GROQ_API_KEY }),
  tollgate,
  { customerId: 'cust_acme', runId: 'ticket_8842', provider: 'openai_compatible' },
);

// OpenRouter
const openrouter = wrapOpenAI(
  new OpenAI({ baseURL: 'https://openrouter.ai/api/v1', apiKey: process.env.OPENROUTER_API_KEY }),
  tollgate,
  { customerId: 'cust_acme', runId: 'ticket_8842', provider: 'openai_compatible' },
);

Stream options required

For OpenAI-compatible streaming, add stream_options: { include_usage: true }; otherwise token counts won't be captured. See the Streaming section for a full example.

Google Gemini

wrapGemini wraps the @google/genai client and captures token counts, thinking tokens, video tokens, and web search grounding costs.

Google Gemini

import { GoogleGenAI } from '@google/genai';
import { createTollgateClient, wrapGemini } from '@tollgateai/sdk';

const tollgate = createTollgateClient();
const gemini = wrapGemini(
  new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY }),
  tollgate,
  { customerId: 'cust_acme', runId: 'ticket_8842' },
);

const res = await gemini.models.generateContent({
  model: 'gemini-2.5-pro',
  contents: [{ role: 'user', parts: [{ text: 'Hello' }] }],
});
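Gemini reports usage differently from OpenAI. A minimal sketch of the mapping, assuming the usageMetadata fields exposed by @google/genai: promptTokenCount already includes the cached part of the prompt, while thoughtsTokenCount is reported separately from candidatesTokenCount. geminiToTollgate is a hypothetical helper, not the wrapGemini implementation, and it leaves out video tokens and grounding requests.

```typescript
// Subset of `usageMetadata` on a generateContent response.
// Every count is optional in the SDK's types.
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  cachedContentTokenCount?: number;
}

interface TollgateTokenFields {
  tokensIn: number;
  tokensOut: number;
  reasoningTokens: number;
  cachedTokens: number;
}

// Hypothetical helper. Cached tokens are subtracted from the prompt count
// because promptTokenCount includes them; thinking tokens are passed through
// unchanged because they are not part of candidatesTokenCount.
function geminiToTollgate(meta: GeminiUsageMetadata): TollgateTokenFields {
  const cached = meta.cachedContentTokenCount ?? 0;
  return {
    tokensIn: (meta.promptTokenCount ?? 0) - cached,
    tokensOut: meta.candidatesTokenCount ?? 0,
    reasoningTokens: meta.thoughtsTokenCount ?? 0,
    cachedTokens: cached,
  };
}
```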

AWS Bedrock

wrapBedrock wraps the AWS SDK's BedrockRuntimeClient. Works with InvokeModelCommand and InvokeModelWithResponseStreamCommand.

AWS Bedrock

import { BedrockRuntimeClient } from '@aws-sdk/client-bedrock-runtime';
import { createTollgateClient, wrapBedrock } from '@tollgateai/sdk';

const tollgate = createTollgateClient();
const bedrock = wrapBedrock(
  new BedrockRuntimeClient({ region: 'us-east-1' }),
  tollgate,
  { customerId: 'cust_acme', runId: 'ticket_8842' },
);

// Use bedrock exactly as before — InvokeModel / stream.

Streaming

Streaming is handled transparently — the wrapper accumulates token counts as the stream completes and fires a single tracking event at the end without blocking your code.

OpenAI / OpenAI-compatible: required option

For OpenAI and OpenAI-compatible providers you must add stream_options: { include_usage: true }. Without it, the provider does not emit token counts and Tollgate cannot track the event.

OpenAI streaming with usage

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  stream_options: { include_usage: true }, // required for Tollgate
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Tollgate fires the tracking event after the stream closes.

Anthropic streaming does not require any extra options — token counts are always included in the stream's final event.
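The accumulate-then-report behavior described above can be sketched as a pass-through async generator. This is not the SDK's code: trackStream and onUsage are hypothetical names, and the chunk shape assumes the OpenAI convention where, with include_usage set, the last chunk carries usage and earlier chunks do not.

```typescript
// Minimal chunk shape: only the optional usage field matters here.
interface StreamChunk {
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

type Usage = NonNullable<StreamChunk['usage']>;

// Hypothetical pass-through wrapper: yields every chunk unchanged, keeps the
// last usage seen, and reports it exactly once after the stream ends.
async function* trackStream<T extends StreamChunk>(
  source: AsyncIterable<T>,
  onUsage: (usage: Usage) => Promise<void>,
): AsyncGenerator<T> {
  let usage: Usage | undefined;
  for await (const chunk of source) {
    if (chunk.usage) usage = chunk.usage;
    yield chunk; // the caller sees the original chunk
  }
  if (usage) {
    // Not awaited, and failures are swallowed: tracking never blocks the caller.
    onUsage(usage).catch(() => {});
  }
  // No usage chunk usually means stream_options.include_usage was not set,
  // so there is nothing to track.
}
```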

Core Mechanics

Multi-Step Agents

A run (runId) groups all LLM calls for one end-to-end task. Call tollgate.resolve() once on the final step to close the run and gate outcome-priced revenue.

Closing a run with outcome

await tollgate.resolve({
  runId: 'ticket_8842',
  customerId: 'cust_acme',
  outcome: 'resolved',       // 'resolved' | 'escalated' | 'failed'
  revenueUnitCents: 50,      // $0.50 per resolved ticket — overrides plan default
});

Outcome	Revenue recognized?	Costs tracked?
`resolved`	Yes — outcome-priced revenue recognized	Yes
`escalated`	No — run closed without revenue	Yes
`failed`	No — run closed without revenue	Yes

If outcome is omitted on a single-call run, the run is treated as resolved. For multi-step agents, always call resolve(); otherwise the run stays open and its margin is never computed.
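The rules in the table can be written as a small function, which makes the difference between an escalated run and an open run explicit. This is a hypothetical model of the documented behavior (RunSummary and runMarginCents are illustrative names), not how Tollgate computes margin internally.

```typescript
type Outcome = 'resolved' | 'escalated' | 'failed';

interface RunSummary {
  costCents: number;        // sum of tracked costs for every call in the run
  revenueUnitCents: number; // per-run revenue (plan default or override)
  outcome?: Outcome;        // set by resolve() on the final step
  callCount: number;        // number of LLM calls tracked for this run
}

// Hypothetical model of the documented rules:
// - only 'resolved' recognizes outcome-priced revenue;
// - costs count for every outcome;
// - a single-call run without an outcome counts as resolved;
// - a multi-step run without an outcome is still open (margin unknown).
function runMarginCents(run: RunSummary): number | null {
  const outcome = run.outcome ?? (run.callCount === 1 ? 'resolved' : undefined);
  if (outcome === undefined) return null; // run still open
  const revenue = outcome === 'resolved' ? run.revenueUnitCents : 0;
  return revenue - run.costCents;
}
```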

Manual Tracking

If you can't wrap the provider client, call tollgate.track() directly after each LLM response. All the same fields apply.

Manual track() call

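import Anthropic from '@anthropic-ai/sdk';
import { createTollgateClient } from '@tollgateai/sdk';

const tollgate = createTollgateClient();    // reads TOLLGATE_API_KEY
const rawAnthropicClient = new Anthropic(); // not wrapped

const startTime = Date.now();
const response = await rawAnthropicClient.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Resolve this ticket.' }],
});

await tollgate.track({
  customerId: 'cust_acme',
  runId: 'ticket_8842',
  provider: 'anthropic',
  model: response.model,                                               // model name as returned by the provider
  tokensIn: response.usage.input_tokens,
  tokensOut: response.usage.output_tokens,                             // Anthropic counts extended-thinking tokens here
  cachedTokens: response.usage.cache_read_input_tokens ?? 0,           // cache reads
  cacheWrite5mTokens: response.usage.cache_creation_input_tokens ?? 0, // cache writes (default 5-minute TTL)
  idempotencyKey: 'ticket_8842#step_1',                                // unique per event, prevents double-counting
  latencyMs: Date.now() - startTime,
});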

Idempotency key pattern

Use a stable pattern like runId#stepN — e.g. ticket_8842#step_1. On retries, the server returns 200 duplicate (safe to ignore) instead of double-counting the cost.
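A minimal sketch of why the key must be built once per event and reused on every retry. idempotencyKey and ToyLedger are hypothetical; the ledger is an in-memory stand-in for the server's conditional write and returns the same 201 and 200 statuses documented under Error Codes.

```typescript
// Hypothetical key builder following the runId#stepN pattern.
function idempotencyKey(runId: string, step: number): string {
  return `${runId}#step_${step}`;
}

// Toy stand-in for the server's once-only ledger: the first write of a key
// is stored (201); any later write with the same key is ignored (200).
class ToyLedger {
  private readonly seen = new Set<string>();
  private total = 0; // running cost in cents

  ingest(key: string, costCents: number): 201 | 200 {
    if (this.seen.has(key)) return 200; // duplicate: cost not counted again
    this.seen.add(key);
    this.total += costCents;
    return 201;
  }

  totalCents(): number {
    return this.total;
  }
}

const ledger = new ToyLedger();
const key = idempotencyKey('ticket_8842', 1); // build once, reuse on every retry
const first = ledger.ingest(key, 4);          // 201
const retry = ledger.ingest(key, 4);          // 200, cost stays at 4 cents
```

If the key were rebuilt per attempt (for example with a timestamp), every retry would look like a new event and the cost would be counted twice.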

Customer Registration

Call upsertCustomer() before sending usage events so the customer's revenue plan is in place from the first event. This is especially important for usage_based pricing, where revenue is computed at ingest time.

await tollgate.upsertCustomer({
  externalId: 'cust_acme',    // must match customerId used in track/wrap
  name: 'Acme Corp',
  plan: {
    name: 'Growth',
    pricingModel: 'per_unit',   // 'per_unit' | 'per_resolution' | 'usage_based' | 'per_seat' | 'flat' | 'hybrid'
    unitRevenueCents: 50,       // $0.50 per resolved ticket
    baseRevenueCents: 0,
  },
});
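In services that see many customers, one way to guarantee this ordering is to register each customer lazily, just before its first event. The sketch assumes a client exposing upsertCustomer() and track() with the minimal shapes declared in CustomerClient, a hypothetical interface rather than the SDK's published types; withRegistration is likewise an illustrative name.

```typescript
type TrackFields = Record<string, unknown>;

// Assumed minimal shape of the two client methods used here.
interface CustomerClient {
  upsertCustomer(customer: { externalId: string; name: string }): Promise<void>;
  track(event: { customerId: string } & TrackFields): Promise<void>;
}

// Hypothetical helper: register each customer at most once per process and
// make every track() call wait until that registration has finished.
function withRegistration(client: CustomerClient) {
  const pending = new Map<string, Promise<void>>();

  return async function trackRegistered(
    customer: { externalId: string; name: string },
    event: TrackFields,
  ): Promise<void> {
    let ready = pending.get(customer.externalId);
    if (!ready) {
      ready = client.upsertCustomer(customer);
      pending.set(customer.externalId, ready);
      // If registration fails, forget it so the next call tries again.
      ready.catch(() => pending.delete(customer.externalId));
    }
    await ready; // the plan exists before the first usage event is sent
    await client.track({ ...event, customerId: customer.externalId });
  };
}
```

Caching the promise rather than a boolean means concurrent first calls for the same customer share one upsert instead of racing.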

Pricing models

per_unit: Fixed revenue per run (e.g. $0.50 per solved ticket).

per_resolution: Revenue only when outcome = 'resolved'.

usage_based: Revenue proportional to token usage, computed per event.

per_seat: Monthly flat fee per workspace seat.

flat: Fixed monthly subscription fee.

hybrid: Flat base + per-unit overage.

Dual-Database Architecture

Tollgate separates data processing into two purpose-built layers — the deliberate architectural choice that makes low-latency ingest coexist with finance-grade analytical querying.

Hot Path — DynamoDB

›Raw usage events at sub-10 ms latency
›Conditional writes on idempotency_key — once-only ledgering
›TTL-managed raw event retention
›DynamoDB Streams feeds background workers
›margin_rollups table for live dashboard reads

Cold Path — Aurora PostgreSQL

›Finance-grade immutable history
›Provider rate-card joins and cost computation
›Window queries for trend and anomaly detection
›Alert rule evaluation against daily rollups
›CSV export and audit trail

Why two databases?

Collapsing both into one would force a trade-off: a transactional DB fast enough for ingest can't do efficient window queries; a columnar warehouse fast enough for analytics can't handle sub-10 ms writes with conditional deduplication. The dual split gives you both without compromise.

Token Cost Economics

Reasoning / thinking models bill thinking tokens at output token rates, typically 3–10× the input rate. A single complex run can cost $0.53 against a $0.50 revenue plan. Standard monitoring won't surface this until the invoice lands.

Case Study — Customer Support Agent (Plan: $0.50 per solved ticket)

Run A — FAQ Bot: +96.7% Margin

3,000 in / 500 out tokens · Cost: $0.0165 · Revenue: $0.50 · Profit: +$0.4835

Run B — Billing Dispute Bot: −6.2% Margin

40,000 in / 25,000 reasoning / 2,000 out tokens · Cost: $0.5310 · Revenue: $0.50 · Loss: −$0.031

Blended across both customers, the two runs appear about 45% profitable. Tollgate breaks the blend apart and flags Customer B, giving you time to reprice, add a reasoning cap, or move them to a usage-based tier.
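The case-study margins follow directly from the stated costs and revenue. This sketch reuses only those figures (it does not derive costs from provider rate cards) and works in hundredths of a cent so the arithmetic is exact; marginBps is a hypothetical helper.

```typescript
// Amounts in hundredths of a cent (1 unit = $0.0001), so no floating-point drift.
const REVENUE_PER_RUN = 5000; // $0.50 per solved ticket

// Hypothetical helper: margin in basis points (100 bps = 1%).
function marginBps(revenue: number, cost: number): number {
  return Math.round(((revenue - cost) * 10000) / revenue);
}

const runACost = 165;  // Run A, FAQ bot: $0.0165
const runBCost = 5310; // Run B, billing dispute bot: $0.5310

const marginA = marginBps(REVENUE_PER_RUN, runACost);                // 9670 bps = +96.7%
const marginB = marginBps(REVENUE_PER_RUN, runBCost);                // -620 bps = -6.2%
const blended = marginBps(2 * REVENUE_PER_RUN, runACost + runBCost); // 4525 bps = +45.25%
```

The blended figure stays comfortably positive even though Run B loses money on every execution, which is the signal a per-run view exposes.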

Token categories Tollgate tracks

tokensIn (standard input) · tokensOut (output) · reasoningTokens (billed at output rate) · cachedTokens (cache-read, reduced rate) · cacheWrite5mTokens / cacheWrite1hTokens · audioTokensIn / audioTokensOut · imageTokensIn / imageTokensOut · videoTokensIn · webSearchRequests

API Reference

POST /api/track — Field Reference

All fields sent to POST https://www.tollgateai.dev/api/track. Header: Authorization: Bearer tg_live_… · Content-Type: application/json

Field	Type	Description
customerId*	string	Your external customer ID. Attributes all costs and revenue to this tenant.
runId*	string	Groups all LLM calls for one end-to-end task or session.
provider*	string	anthropic · openai · openai_compatible · bedrock · google
model*	string	Model name as returned by the provider — used for rate-card lookup.
idempotencyKey*	string	Unique per event. Prevents double-counting on retries. Pattern: runId#stepN.
agentId	string	Which agent within the run — for per-agent cost breakdown.
type	string	llm (default) · tool · retrieval
tokensIn	int	Standard (non-cached) input tokens.
tokensOut	int	Output tokens.
reasoningTokens	int	Thinking/reasoning tokens — billed at output rate. Required for o1, o3, Claude 3.7+.
cachedTokens	int	Cache-read input tokens — billed at reduced rate.
cacheWrite5mTokens	int	Cache-write tokens with 5-minute TTL (Anthropic prompt caching).
cacheWrite1hTokens	int	Cache-write tokens with 1-hour TTL.
toolCalls	int	Number of tool calls in this LLM response.
toolName	string	Tool name for per-tool cost breakdown.
audioTokensIn	int	Audio input tokens (OpenAI Realtime API).
audioTokensOut	int	Audio output tokens.
imageTokensIn	int	Image/vision input tokens.
imageTokensOut	int	Image generation output tokens.
videoTokensIn	int	Video input tokens (Gemini).
webSearchRequests	int	Grounded web search calls (Anthropic / Gemini). Priced per request.
latencyMs	int	End-to-end request latency in milliseconds.
externalCostCents	float	Cost of external tools (image gen APIs, sandboxes) — added directly to run cost.
providerCostCents	float	Exact cost from provider/gateway — skips Tollgate's rate-card lookup.
outcome	string	resolved · escalated · failed — set only on the final closing event.
revenueUnitCents	int	Per-run revenue in cents. Overrides the customer's plan default.
ts	ISO string	Event timestamp. Defaults to server receive time if omitted.
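Without the SDK, an event is an ordinary JSON POST with the two headers listed above. A minimal sketch that builds the request: TrackEvent covers only the starred fields and a few optional ones from the table, and buildTrackRequest is a hypothetical helper whose result can be passed to fetch (available globally in Node 18+ and browsers).

```typescript
// Starred fields from the table plus a few optional ones.
interface TrackEvent {
  customerId: string;
  runId: string;
  provider: 'anthropic' | 'openai' | 'openai_compatible' | 'bedrock' | 'google';
  model: string;
  idempotencyKey: string;
  agentId?: string;
  tokensIn?: number;
  tokensOut?: number;
  reasoningTokens?: number;
  cachedTokens?: number;
  latencyMs?: number;
  outcome?: 'resolved' | 'escalated' | 'failed';
  ts?: string; // ISO timestamp; the server uses receive time if omitted
}

const TRACK_URL = 'https://www.tollgateai.dev/api/track';

// Hypothetical helper: returns the URL and fetch() options for one event.
// Send it with: await fetch(req.url, req.init)
function buildTrackRequest(apiKey: string, event: TrackEvent) {
  return {
    url: TRACK_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(event),
    },
  };
}
```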

Privacy enforcement

Fields named prompt, messages, content, input, or output are rejected with HTTP 400. Never send prompt content to Tollgate.
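Because the server rejects these fields with a 400, a client-side check can catch the mistake before any network call. A hypothetical guard, assuming the rule applies to top-level field names only (the source does not say whether nested keys are checked):

```typescript
// Field names the endpoint rejects with HTTP 400.
const FORBIDDEN_FIELDS = new Set(['prompt', 'messages', 'content', 'input', 'output']);

// Hypothetical client-side guard: throw before sending if an event carries a
// field the server would treat as prompt content.
function assertNoPromptFields(event: Record<string, unknown>): void {
  const offending = Object.keys(event).filter((key) => FORBIDDEN_FIELDS.has(key));
  if (offending.length > 0) {
    throw new Error(`Refusing to send prompt content fields: ${offending.join(', ')}`);
  }
}
```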

Error Codes

Status	Code	Meaning	Action
201	created	Event ingested successfully.	None.
200	duplicate	Same idempotencyKey already stored.	Safe to ignore — no double-count.
400	bad_request	Validation error or prompt content detected.	Fix the payload — don't retry.
401	unauthorized	Invalid or missing API key.	Check TOLLGATE_API_KEY.
402	quota_exceeded	Monthly event quota reached.	Upgrade plan in Settings.
429	rate_limited	Rate limit exceeded.	Respect Retry-After header.
500	server_error	Internal error — event may not have been stored.	Retry with exponential back-off.
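The Action column maps onto a small retry policy. A sketch under stated assumptions: nextAction is a hypothetical helper, Retry-After is parsed only in its seconds form (it may also be an HTTP date), and the 500 ms base delay, 30 s cap, and 1 s fallback are illustrative values, not Tollgate recommendations. Retries should resend the same payload with the same idempotencyKey, so a 500 that did store the event comes back as 200 duplicate.

```typescript
// What to do with a POST /api/track response.
type TrackAction =
  | { kind: 'done' }                    // 201 created or 200 duplicate
  | { kind: 'fail'; reason: string }    // retrying cannot help
  | { kind: 'retry'; delayMs: number }; // resend the same payload and key

// Hypothetical policy. `attempt` counts from 0 for the first retry.
function nextAction(status: number, attempt: number, retryAfter?: string | null): TrackAction {
  if (status === 200 || status === 201) return { kind: 'done' };
  if (status === 429) {
    const seconds = retryAfter ? Number(retryAfter) : NaN;
    const delayMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 1000;
    return { kind: 'retry', delayMs };
  }
  if (status >= 500) {
    // Exponential back-off: 500 ms, 1 s, 2 s, ... capped at 30 s.
    return { kind: 'retry', delayMs: Math.min(500 * 2 ** attempt, 30000) };
  }
  // 400 bad payload, 401 bad key, 402 quota: fix the cause instead of retrying.
  return { kind: 'fail', reason: `HTTP ${status}` };
}
```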

Integration Checklist

TOLLGATE_API_KEY set in environment — never commit it.

Provider client wrapped (or tollgate.track() called after each LLM call).

customerId matches real customer IDs in your system.

runId consistently identifies one end-to-end task.

idempotencyKey is stable and unique per event (pattern: runId#stepN).

reasoningTokens included for extended thinking models (Claude 3.7+, o1, o3).

cachedTokens included if using prompt caching.

outcome set on the closing event for per_unit / hybrid / per_resolution plans.

stream_options: { include_usage: true } added for OpenAI / OpenAI-compatible streaming.

upsertCustomer() called before first usage event for usage_based pricing.

No prompt content in any field.

Billing & Security

Plans & Pricing

All plans are self-serve and processed via Razorpay subscription cards inside Settings.

Tier	Price (INR)	Events / mo	Customers	Seats	Features
Starter	₹0 / mo	10,000	3	1	Overview & Customers dashboards
Growth	₹4,999 / mo	500,000	50	5	Email & webhook alerts, CSV exports
Scale	₹14,999 / mo	5,000,000	Unlimited	Unlimited	SSO/OIDC, Priority support

Events beyond your plan limit are rejected with 402 until you upgrade. Upgrading is instant — no re-integration required.

Security & Privacy

Zero Payload Retention

The SDKs read token counts only. We never inspect, record, or route agent prompt strings, document contents, or any message bodies. Fields named prompt, messages, content, input, or output are rejected at ingestion (HTTP 400).

AES-256-GCM Encryption at Rest

All client credentials and API keys are encrypted at rest using AES-256-GCM via KMS-managed keys. Strict per-tenant row isolation is enforced in both DynamoDB (partition key scoping) and PostgreSQL (RLS policies).

Idempotent Ingest

Every write to DynamoDB uses a conditional expression on the idempotency_key — meaning events are guaranteed to land exactly once even on network retries or duplicated webhook calls.

SDK Trust Model

The SDK never modifies the response your code receives, never blocks on Tollgate errors, and always passes through to the original provider client. Tollgate errors are logged locally but do not surface to your end users.
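The trust model amounts to a fire-and-forget wrapper. A minimal sketch of that pattern, not the SDK's implementation (callAndTrack, extractUsage, and track are hypothetical names): provider errors propagate as usual, while tracking runs without being awaited and any failure is only logged.

```typescript
// Hypothetical wrapper: call the provider, hand usage to the tracker without
// awaiting it, and return the provider's response object untouched.
async function callAndTrack<R>(
  call: () => Promise<R>,
  extractUsage: (response: R) => Record<string, number>,
  track: (usage: Record<string, number>) => Promise<unknown>,
  log: (message: string) => void = console.warn,
): Promise<R> {
  const response = await call(); // provider errors still reach the caller
  try {
    // Not awaited: the caller never waits on Tollgate.
    track(extractUsage(response)).catch((err) => log(`tollgate: ${String(err)}`));
  } catch (err) {
    // extractUsage or track threw synchronously; log it and carry on.
    log(`tollgate: ${String(err)}`);
  }
  return response; // same object the provider returned
}
```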