Product Overview
Tollgate is a real-time gross-margin observability platform purpose-built for AI agent companies. Standard billing systems answer "how much should I charge?" — Tollgate answers "am I actually making money on this customer right now?"
One-liner
AI agents fan out into loops of LLM queries, retrieval, tool calls, and sandboxes. Costs compound with every step and stay invisible until the monthly provider invoice lands. Tollgate computes accurate per-run margin, flags losing customers, and gives you time to act.
Per-run margin
Exact cost and P&L for every agent execution.
Leak detection
Auto-flags customers whose cost exceeds their plan revenue.
Dual-DB ingest
DynamoDB hot path + Aurora for finance-grade history.
Quick Start
Step 1 — Get an API key
Generate a key in Settings → API Keys (prefix: tg_live_…). Add it to your environment:
TOLLGATE_API_KEY=tg_live_your_key_here
Step 2 — Install the SDK
# TypeScript / Node.js
npm install @tollgateai/sdk
# or: pnpm add @tollgateai/sdk | yarn add @tollgateai/sdk
# Python
pip install tollgateai
Step 3 — Wrap your provider client
Pick the snippet below for your LLM provider. The wrapper intercepts each response, extracts token counts, and fires POST /api/track non-blocking in the background. Your code sees the original provider response — unchanged.
import Anthropic from '@anthropic-ai/sdk';
import { createTollgateClient, wrapAnthropic } from '@tollgateai/sdk';
const tollgate = createTollgateClient(); // reads TOLLGATE_API_KEY
const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
customerId: 'cust_acme', // your external customer id — required
runId: 'ticket_8842', // your run/session id — recommended
agentId: 'support-agent', // optional: which agent within the run
});
// Use anthropic exactly as before — no other changes needed.
const msg = await anthropic.messages.create({
model: 'claude-opus-4-8',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Resolve this ticket.' }],
});
Tollgate AI Skill
The Tollgate AI Skill is a set of pre-baked rules and instructions for AI coding tools (Claude Code, Codex, Cursor, GitHub Copilot, Windsurf, etc.) that enables your AI assistant to auto-instrument your AI application with Tollgate instantly and flawlessly.
No more hours of manual code scanning or copying/pasting setup logic. The skill teaches your AI coding assistant how to wrap clients, manage streaming, group runs, set idempotency keys, and exclude sensitive prompt payloads.
Install the Skill
You can install the skill instantly using npx (recommended for Node.js developers) or a direct curl script:
Option A — via npx (npm registry):
# Install for Claude Code
npx tollgate-skill --tool claude
# Install for Codex / Custom Agents
npx tollgate-skill --tool codex
# Install for Cursor
npx tollgate-skill --tool cursor
# Install for GitHub Copilot
npx tollgate-skill --tool copilot
# Install for Windsurf
npx tollgate-skill --tool windsurf
# Install for all tools at once
npx tollgate-skill --tool all
Option B — via curl one-liner:
# Install for Claude Code
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool claude
# Install for Codex / Custom Agents
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool codex
# Install for Cursor
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool cursor
# Install for GitHub Copilot
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool copilot
# Install for Windsurf
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool windsurf
# Install for all tools at once
curl -fsSL https://raw.githubusercontent.com/Tollgateai/tollgate-skill/main/install.sh | bash -s -- --tool all
How to Use
- Claude Code: Restart Claude Code and type /tollgate. Claude Code will read the skill and auto-instrument your project.
- Codex: Installs to AGENTS.md in your project root. Codex reads this file automatically when running tasks in your repo.
- Cursor: Places a rule at .cursor/rules/tollgate.mdc. Cursor picks it up automatically when you are working on LLM call logic.
- GitHub Copilot: Adds rules to .github/copilot-instructions.md, guiding Copilot on how to write Tollgate-instrumented completions.
- Windsurf: Installs a rule in .windsurf/rules/tollgate.md to guide the Cascade assistant.
Anthropic
wrapAnthropic wraps the official @anthropic-ai/sdk client. It captures input, output, reasoning, and cached token counts automatically — including extended thinking models (Claude 3.7+).
import Anthropic from '@anthropic-ai/sdk';
import { createTollgateClient, wrapAnthropic } from '@tollgateai/sdk';
const tollgate = createTollgateClient(); // reads TOLLGATE_API_KEY
const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
customerId: 'cust_acme',
runId: 'ticket_8842',
agentId: 'support-agent', // optional
});
const msg = await anthropic.messages.create({
model: 'claude-opus-4-8',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Hello' }],
});
Extended thinking
reasoningTokens are automatically captured and billed at output rates, the most expensive token type. Tollgate accounts for this correctly.
OpenAI
wrapOpenAI wraps the openai Node.js client. Works with chat completions, responses API, and streaming (with stream_options — see Streaming section).
import OpenAI from 'openai';
import { createTollgateClient, wrapOpenAI } from '@tollgateai/sdk';
const tollgate = createTollgateClient();
const openai = wrapOpenAI(new OpenAI(), tollgate, {
customerId: 'cust_acme',
runId: 'ticket_8842',
});
const res = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello' }],
});
OpenAI-Compatible Gateways
Use wrapOpenAI with provider: 'openai_compatible' for any OpenAI-compatible gateway: Groq, OpenRouter, Vercel AI Gateway, vLLM, Together AI, Fireworks.
import OpenAI from 'openai';
import { createTollgateClient, wrapOpenAI } from '@tollgateai/sdk';
const tollgate = createTollgateClient();
// Groq
const groq = wrapOpenAI(
new OpenAI({ baseURL: 'https://api.groq.com/openai/v1', apiKey: process.env.GROQ_API_KEY }),
tollgate,
{ customerId: 'cust_acme', runId: 'ticket_8842', provider: 'openai_compatible' },
);
// OpenRouter
const openrouter = wrapOpenAI(
new OpenAI({ baseURL: 'https://openrouter.ai/api/v1', apiKey: process.env.OPENROUTER_API_KEY }),
tollgate,
{ customerId: 'cust_acme', runId: 'ticket_8842', provider: 'openai_compatible' },
);
Stream options required
When streaming through a gateway, pass stream_options: { include_usage: true }; otherwise token counts won't be captured. See the Streaming section for a full example.
Google Gemini
wrapGemini wraps the @google/genai client and captures token counts, thinking tokens, video tokens, and web search grounding costs.
import { GoogleGenAI } from '@google/genai';
import { createTollgateClient, wrapGemini } from '@tollgateai/sdk';
const tollgate = createTollgateClient();
const gemini = wrapGemini(
new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY }),
tollgate,
{ customerId: 'cust_acme', runId: 'ticket_8842' },
);
const res = await gemini.models.generateContent({
model: 'gemini-2.5-pro',
contents: [{ role: 'user', parts: [{ text: 'Hello' }] }],
});
AWS Bedrock
wrapBedrock wraps the AWS SDK's BedrockRuntimeClient. Works with InvokeModelCommand and InvokeModelWithResponseStreamCommand.
import { BedrockRuntimeClient } from '@aws-sdk/client-bedrock-runtime';
import { createTollgateClient, wrapBedrock } from '@tollgateai/sdk';
const tollgate = createTollgateClient();
const bedrock = wrapBedrock(
new BedrockRuntimeClient({ region: 'us-east-1' }),
tollgate,
{ customerId: 'cust_acme', runId: 'ticket_8842' },
);
// Use bedrock exactly as before — InvokeModel / stream.
Streaming
Streaming is handled transparently — the wrapper accumulates token counts as the stream completes and fires a single tracking event at the end without blocking your code.
OpenAI / OpenAI-compatible: required option
stream_options: { include_usage: true }. Without it, the provider does not emit token counts and Tollgate cannot track the event.
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello' }],
stream: true,
stream_options: { include_usage: true }, // required for Tollgate
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Tollgate fires the tracking event after the stream closes.
Anthropic streaming does not require any extra options; token counts are always included in the stream's final event.
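To make the mechanism concrete, here is a minimal sketch of how a wrapper could read usage from a stream without changing what the caller sees. It is not Tollgate's actual implementation: fakeStream, tapUsage, and the trimmed Chunk type are hypothetical names. The only provider behavior assumed is that, with include_usage set, the last chunk carries a usage object and an empty choices array.

```typescript
// Trimmed chunk shape: only the fields this sketch reads.
interface Usage { prompt_tokens: number; completion_tokens: number; }
interface Chunk {
  choices: { delta: { content?: string } }[];
  usage?: Usage | null;
}

// Stand-in for a provider stream (hypothetical data, no network call).
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { choices: [{ delta: { content: "Hel" } }], usage: null };
  yield { choices: [{ delta: { content: "lo" } }], usage: null };
  // With include_usage, the last chunk has usage and an empty choices array.
  yield { choices: [], usage: { prompt_tokens: 9, completion_tokens: 2 } };
}

// Passes every chunk through untouched while remembering the latest usage seen.
async function* tapUsage(
  stream: AsyncIterable<Chunk>,
  onDone: (usage: Usage | null) => void,
): AsyncGenerator<Chunk> {
  let usage: Usage | null = null;
  try {
    for await (const chunk of stream) {
      if (chunk.usage) usage = chunk.usage;
      yield chunk;
    }
  } finally {
    onDone(usage); // runs once, when the stream ends or the consumer stops early
  }
}

// Consumes the tapped stream the same way the caller would consume the original.
async function demo(): Promise<{ text: string; usage: Usage | null }> {
  const seen: (Usage | null)[] = [];
  let text = "";
  for await (const chunk of tapUsage(fakeStream(), (u) => seen.push(u))) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return { text, usage: seen[0] ?? null };
}
```

If the option is left out, usage stays null in this sketch, which mirrors why an event cannot be priced without it.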
Multi-Step Agents
A run (runId) groups all LLM calls for one end-to-end task. Call tollgate.resolve() once on the final step to close the run and gate outcome-priced revenue.
await tollgate.resolve({
runId: 'ticket_8842',
customerId: 'cust_acme',
outcome: 'resolved', // 'resolved' | 'escalated' | 'failed'
revenueUnitCents: 50, // $0.50 per resolved ticket — overrides plan default
});
| Outcome | Revenue recognized? | Costs tracked? |
|---|---|---|
| resolved | Yes, outcome-priced revenue recognized | Yes |
| escalated | No, run closed without revenue | Yes |
| failed | No, run closed without revenue | Yes |
Omitting outcome on a single-call run treats it as resolved. For multi-step agents, always call resolve(); otherwise the run stays open and margin stays uncomputed.
Manual Tracking
If you can't wrap the provider client, call tollgate.track() directly after each LLM response. All the same fields apply.
const startTime = Date.now();
const response = await rawAnthropicClient.messages.create({ ... });
await tollgate.track({
customerId: 'cust_acme',
runId: 'ticket_8842',
provider: 'anthropic',
model: 'claude-opus-4-8',
tokensIn: response.usage.input_tokens,
tokensOut: response.usage.output_tokens,
cacheWrite5mTokens: response.usage.cache_creation_input_tokens ?? 0, // cache-write tokens, not reasoning tokens (assumes the default 5-minute TTL)
cachedTokens: response.usage.cache_read_input_tokens ?? 0,
idempotencyKey: `ticket_8842#step_1`, // unique per event — prevents double-counting
latencyMs: Date.now() - startTime,
});
Idempotency key pattern
runId#stepN, e.g. ticket_8842#step_1. On retries, the server returns 200 duplicate (safe to ignore) instead of double-counting the cost.
Customer Registration
Call upsertCustomer() before sending usage events so the revenue plan is ready and recognized from event one. This is especially important for usage_based pricing where revenue is computed at ingest time.
await tollgate.upsertCustomer({
externalId: 'cust_acme', // must match customerId used in track/wrap
name: 'Acme Corp',
plan: {
name: 'Growth',
pricingModel: 'per_unit', // 'per_unit' | 'per_resolution' | 'usage_based' | 'per_seat' | 'flat' | 'hybrid'
unitRevenueCents: 50, // $0.50 per resolved ticket
baseRevenueCents: 0,
},
});
Pricing models
- per_unit: Fixed revenue per run (e.g. $0.50 per solved ticket).
- per_resolution: Revenue only when outcome = 'resolved'.
- usage_based: Revenue ∝ token usage, computed per event.
- per_seat: Monthly flat fee per workspace seat.
- flat: Fixed monthly subscription fee.
- hybrid: Flat base + per-unit overage.
Dual-Database Architecture
Tollgate separates data processing into two purpose-built layers — the deliberate architectural choice that makes low-latency ingest coexist with finance-grade analytical querying.
Hot Path — DynamoDB
- Raw usage events at sub-10 ms latency
- Conditional writes on idempotency_key for once-only ledgering
- TTL-managed raw event retention
- DynamoDB Streams feeds background workers
- margin_rollups table for live dashboard reads
Cold Path — Aurora PostgreSQL
- Finance-grade immutable history
- Provider rate-card joins and cost computation
- Window queries for trend and anomaly detection
- Alert rule evaluation against daily rollups
- CSV export and audit trail
Token Cost Economics
Reasoning / thinking models bill thinking tokens at output token rates, typically 3–10× the input rate. A single complex run can cost $0.53 against a $0.50 revenue plan. Standard monitoring won't surface this until the invoice lands.
Run A — FAQ Bot: +96.7% Margin
3,000 in / 500 out tokens · Cost: $0.0165 · Revenue: $0.50 · Profit: +$0.4835
Run B — Billing Dispute Bot: −6.2% Margin
40,000 in / 25,000 reasoning / 2,000 out tokens · Cost: $0.5310 · Revenue: $0.50 · Loss: −$0.031
Blended across both customers, they appear about 45% profitable ($0.4525 profit on $1.00 of revenue). Tollgate breaks the blend apart and flags Customer B, giving you time to reprice, add a reasoning cap, or move them to a usage-based tier.
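The arithmetic behind these two runs can be checked with a few lines of TypeScript. This is only a worked example using the cost and revenue figures quoted above; the RunEconomics type and the marginPct helper are illustrative names, not part of the Tollgate SDK.

```typescript
interface RunEconomics { label: string; costUsd: number; revenueUsd: number; }

// Margin as a percentage of revenue: (revenue - cost) / revenue * 100.
function marginPct(revenueUsd: number, costUsd: number): number {
  return ((revenueUsd - costUsd) / revenueUsd) * 100;
}

// Figures copied from Run A and Run B above.
const runs: RunEconomics[] = [
  { label: "Run A (FAQ bot)", costUsd: 0.0165, revenueUsd: 0.5 },
  { label: "Run B (billing dispute bot)", costUsd: 0.531, revenueUsd: 0.5 },
];

// Per-run view: each run is judged on its own cost and revenue.
const perRun = runs.map((r) => ({
  label: r.label,
  margin: marginPct(r.revenueUsd, r.costUsd),
}));

// Blended view: totals hide the losing run behind the profitable one.
const totalRevenue = runs.reduce((sum, r) => sum + r.revenueUsd, 0);
const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
const blended = marginPct(totalRevenue, totalCost);

// Runs that lose money, which the blended number alone would not reveal.
const losing = perRun.filter((r) => r.margin < 0).map((r) => r.label);

console.log(perRun, Math.round(blended), losing);
```

The blended figure lands at roughly 45%, while the per-run view isolates Run B as the only entry with a negative margin.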
Token categories Tollgate tracks
tokensIn (standard input) · tokensOut (output) · reasoningTokens (billed at output rate) · cachedTokens (cache-read, reduced rate) · cacheWrite5mTokens / cacheWrite1hTokens · audioTokensIn / audioTokensOut · imageTokensIn / imageTokensOut · videoTokensIn · webSearchRequests
POST /api/track — Field Reference
All fields sent to POST https://www.tollgateai.dev/api/track. Header: Authorization: Bearer tg_live_… · Content-Type: application/json
| Field | Type | Description |
|---|---|---|
| customerId* | string | Your external customer ID. Attributes all costs and revenue to this tenant. |
| runId* | string | Groups all LLM calls for one end-to-end task or session. |
| provider* | string | anthropic · openai · openai_compatible · bedrock · google |
| model* | string | Model name as returned by the provider — used for rate-card lookup. |
| idempotencyKey* | string | Unique per event. Prevents double-counting on retries. Pattern: runId#stepN. |
| agentId | string | Which agent within the run — for per-agent cost breakdown. |
| type | string | llm (default) · tool · retrieval |
| tokensIn | int | Standard (non-cached) input tokens. |
| tokensOut | int | Output tokens. |
| reasoningTokens | int | Thinking/reasoning tokens — billed at output rate. Required for o1, o3, Claude 3.7+. |
| cachedTokens | int | Cache-read input tokens — billed at reduced rate. |
| cacheWrite5mTokens | int | Cache-write tokens with 5-minute TTL (Anthropic prompt caching). |
| cacheWrite1hTokens | int | Cache-write tokens with 1-hour TTL. |
| toolCalls | int | Number of tool calls in this LLM response. |
| toolName | string | Tool name for per-tool cost breakdown. |
| audioTokensIn | int | Audio input tokens (OpenAI Realtime API). |
| audioTokensOut | int | Audio output tokens. |
| imageTokensIn | int | Image/vision input tokens. |
| imageTokensOut | int | Image generation output tokens. |
| videoTokensIn | int | Video input tokens (Gemini). |
| webSearchRequests | int | Grounded web search calls (Anthropic / Gemini). Priced per request. |
| latencyMs | int | End-to-end request latency in milliseconds. |
| externalCostCents | float | Cost of external tools (image gen APIs, sandboxes) — added directly to run cost. |
| providerCostCents | float | Exact cost from provider/gateway — skips Tollgate's rate-card lookup. |
| outcome | string | resolved · escalated · failed — set only on the final closing event. |
| revenueUnitCents | int | Per-run revenue in cents. Overrides the customer's plan default. |
| ts | ISO string | Event timestamp. Defaults to server receive time if omitted. |
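For manual integrations, it can help to mirror the table in a local type and check required fields before sending. The sketch below is an assumption-laden illustration, not an SDK export: TrackEvent and missingRequired are hypothetical names, only a subset of the optional fields is listed, and it assumes the asterisks in the table mark required fields.

```typescript
type Provider = "anthropic" | "openai" | "openai_compatible" | "bedrock" | "google";

// Subset of the field reference; fields marked * in the table are non-optional here.
interface TrackEvent {
  customerId: string;
  runId: string;
  provider: Provider;
  model: string;
  idempotencyKey: string;
  agentId?: string;
  type?: "llm" | "tool" | "retrieval";
  tokensIn?: number;
  tokensOut?: number;
  reasoningTokens?: number;
  cachedTokens?: number;
  latencyMs?: number;
  outcome?: "resolved" | "escalated" | "failed";
}

const REQUIRED = ["customerId", "runId", "provider", "model", "idempotencyKey"] as const;

// Returns the names of required fields that are missing or empty.
function missingRequired(event: Partial<TrackEvent>): string[] {
  return REQUIRED.filter((key) => {
    const value = event[key];
    return typeof value !== "string" || value.length === 0;
  });
}

// Example event built from the identifiers used elsewhere in this guide.
const exampleEvent: TrackEvent = {
  customerId: "cust_acme",
  runId: "ticket_8842",
  provider: "anthropic",
  model: "claude-opus-4-8",
  idempotencyKey: "ticket_8842#step_1",
  tokensIn: 1200,
  tokensOut: 300,
};
```

Calling missingRequired(exampleEvent) returns an empty array, while a partial payload returns the names still to be filled in.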
Privacy enforcement
Fields named prompt, messages, content, input, or output are rejected with HTTP 400. Never send prompt content to Tollgate.
Error Codes
| Status | Code | Meaning | Action |
|---|---|---|---|
| 201 | created | Event ingested successfully. | None. |
| 200 | duplicate | Same idempotencyKey already stored. | Safe to ignore — no double-count. |
| 400 | bad_request | Validation error or prompt content detected. | Fix the payload — don't retry. |
| 401 | unauthorized | Invalid or missing API key. | Check TOLLGATE_API_KEY. |
| 402 | quota_exceeded | Monthly event quota reached. | Upgrade plan in Settings. |
| 429 | rate_limited | Rate limit exceeded. | Respect Retry-After header. |
| 500 | server_error | Internal error — event may not have been stored. | Retry with exponential back-off. |
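The table above maps naturally onto a small client-side policy. The following is a sketch under stated assumptions rather than SDK behavior: actionFor and retryDelayMs are hypothetical helpers, the handling of status codes not listed in the table is a guess, and Retry-After is assumed to arrive as a number of seconds.

```typescript
type Action = "done" | "fix_payload" | "check_key" | "upgrade_plan" | "retry";

// Maps an HTTP status from POST /api/track to the action listed in the table.
function actionFor(status: number): Action {
  switch (status) {
    case 201: // created
    case 200: // duplicate: already stored, nothing to do
      return "done";
    case 400:
      return "fix_payload";
    case 401:
      return "check_key";
    case 402:
      return "upgrade_plan";
    case 429:
    case 500:
      return "retry";
    default:
      // Assumption: other 5xx codes are transient, anything else is a client error.
      return status >= 500 ? "retry" : "fix_payload";
  }
}

// Delay before retry attempt N (0-based). A Retry-After value in seconds wins;
// otherwise use exponential back-off, capped at maxMs.
function retryDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  baseMs = 250,
  maxMs = 30000,
): number {
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Because retries reuse the same idempotencyKey, a retry that follows an event that was in fact stored comes back as 200 duplicate and is treated as done.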
Integration Checklist
TOLLGATE_API_KEY set in environment — never commit it.
Provider client wrapped (or tollgate.track() called after each LLM call).
customerId matches real customer IDs in your system.
runId consistently identifies one end-to-end task.
idempotencyKey is stable and unique per event (pattern: runId#stepN).
reasoningTokens included for extended thinking models (Claude 3.7+, o1, o3).
cachedTokens included if using prompt caching.
outcome set on the closing event for per_unit / hybrid / per_resolution plans.
stream_options: { include_usage: true } added for OpenAI / OpenAI-compatible streaming.
upsertCustomer() called before first usage event for usage_based pricing (recommended).
No prompt content in any field.
Plans & Pricing
All plans are self-serve and processed via Razorpay subscription cards inside Settings.
| Tier | Price (INR) | Events / mo | Customers | Seats | Features |
|---|---|---|---|---|---|
| Starter | ₹0 / mo | 10,000 | 3 | 1 | Overview & Customers dashboards |
| Growth | ₹4,999 / mo | 500,000 | 50 | 5 | Email & webhook alerts, CSV exports |
| Scale | ₹14,999 / mo | 5,000,000 | Unlimited | Unlimited | SSO/OIDC, Priority support |
Events beyond your plan limit are rejected with 402 until you upgrade. Upgrading is instant — no re-integration required.
Security & Privacy
Zero Payload Retention
SDKs analyze token counts. We never inspect, record, or route agent prompt strings, document contents, or any message bodies. Fields named prompt, messages, content, input, or output are rejected at ingestion (HTTP 400).
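If you call the tracking endpoint by hand, a local guard can catch these fields before the request leaves your process. This is a minimal sketch: forbiddenKeys and assertNoPromptContent are hypothetical helpers, and they inspect top-level keys only, since the text above does not say whether nested fields are also checked server-side.

```typescript
const FORBIDDEN_FIELDS = ["prompt", "messages", "content", "input", "output"] as const;

// Returns the forbidden field names present as own, top-level keys of a payload.
function forbiddenKeys(payload: Record<string, unknown>): string[] {
  return FORBIDDEN_FIELDS.filter((key) =>
    Object.prototype.hasOwnProperty.call(payload, key),
  );
}

// Throws locally instead of waiting for the server's HTTP 400.
function assertNoPromptContent(payload: Record<string, unknown>): void {
  const found = forbiddenKeys(payload);
  if (found.length > 0) {
    throw new Error(`Refusing to send prompt content fields: ${found.join(", ")}`);
  }
}
```

A payload such as { customerId: 'cust_acme', tokensIn: 1200 } passes, while one that also carries messages is stopped before any network call.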
AES-256-GCM Encryption at Rest
All client credentials and API keys are encrypted at rest using AES-256-GCM via KMS-managed keys. Strict per-tenant row isolation is enforced in both DynamoDB (partition key scoping) and PostgreSQL (RLS policies).
Idempotent Ingest
Every write to DynamoDB uses a conditional expression on the idempotency_key — meaning events are guaranteed to land exactly once even on network retries or duplicated webhook calls.
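The put-if-absent behavior described here can be illustrated with an in-memory stand-in. The sketch below does not touch DynamoDB and is not Tollgate's ingest code: InMemoryLedger and stepKey are hypothetical names, and the Map simply plays the role of a table keyed by idempotency_key.

```typescript
type IngestResult =
  | { status: 201; code: "created" }
  | { status: 200; code: "duplicate" };

class InMemoryLedger {
  private readonly events = new Map<string, number>();

  // Mirrors a conditional write: the event is stored only when the key is new.
  ingest(idempotencyKey: string, costCents: number): IngestResult {
    if (this.events.has(idempotencyKey)) {
      return { status: 200, code: "duplicate" };
    }
    this.events.set(idempotencyKey, costCents);
    return { status: 201, code: "created" };
  }

  // Sum of all stored event costs; duplicates never contribute.
  totalCostCents(): number {
    let total = 0;
    this.events.forEach((cents) => { total += cents; });
    return total;
  }
}

// Builds keys in the runId#stepN pattern, e.g. ticket_8842#step_1.
const stepKey = (runId: string, step: number): string => `${runId}#step_${step}`;

const ledger = new InMemoryLedger();
const first = ledger.ingest(stepKey("ticket_8842", 1), 12);
const retry = ledger.ingest(stepKey("ticket_8842", 1), 12); // network retry of the same step
const second = ledger.ingest(stepKey("ticket_8842", 2), 30);
```

Here the retry is acknowledged as a duplicate and the ledger total reflects only the two distinct steps.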
SDK Trust Model
The SDK never modifies the response your code receives, never blocks on Tollgate errors, and always passes through to the original provider client. Tollgate errors are logged locally but do not surface to your end users.
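A pass-through wrapper with these properties can be sketched in a few lines. This is an illustration of the pattern, not the SDK's source: fireAndForget and withTracking are hypothetical names, and the track callback stands in for whatever sends the usage event.

```typescript
// Runs a tracking call without awaiting it; failures are logged, never thrown.
function fireAndForget(
  track: () => Promise<unknown>,
  log: (msg: string) => void = console.warn,
): void {
  // Starting from Promise.resolve() also catches synchronous throws inside track().
  Promise.resolve()
    .then(track)
    .catch((err: unknown) => {
      log(`tracking failed: ${err instanceof Error ? err.message : String(err)}`);
    });
}

// Wraps a provider call: the caller receives the provider's result unchanged,
// and tracking runs in the background after the result is available.
async function withTracking<T>(
  call: () => Promise<T>,
  track: (result: T) => Promise<unknown>,
  log: (msg: string) => void = console.warn,
): Promise<T> {
  const result = await call(); // provider errors still propagate to the caller
  fireAndForget(() => track(result), log);
  return result;
}
```

Provider errors still reach the caller as before; only failures inside the tracking callback are swallowed and logged.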