Wrap your LLM client
See per-customer profit.
One line of code wraps Anthropic, OpenAI, Bedrock, Google, Vercel AI, and Azure. TokenOps captures every event, reconciles your monthly vendor invoice, and breaks cost down by the customer who incurred it — so you actually know which accounts are profitable.
Tenant id + key pair in your inbox within a business day. No credit card.
Per-customer cost is the question your spreadsheet can't answer.
AI-native companies pay six vendors, get one invoice from each, and have no shared schema for cost. That turns the most important question, "Is this customer profitable?", into a research project every month.
The Anthropic invoice doesn't say which customer cost you what.
Vendor bills aggregate at the workspace level. By the time the invoice lands, you've lost the join between spend and the customers who incurred it.
Effective rate ≠ list rate. You find out at month end.
Caching, batch, region, and committed-use discounts mean what you actually pay diverges from the price card on the docs. You don't see the gap until reconciliation.
Your CEO wants per-customer gross margin. By Friday.
One spreadsheet, six tabs of CSVs, two SQL joins, and a promise to ‘firm it up next month.’ This is most AI startups today.
One SDK. One reconciliation engine.
One MCP your agents already speak.
TokenOps slots in at the LLM client and stays out of your way. You wrap once, attribute by customer, and the rest — capture, normalize, reconcile, expose — happens for you.
Wrap any LLM client
One-line wrappers for Anthropic, OpenAI, Bedrock, Google, Vercel AI, and Azure. We never proxy your traffic — calls go directly to the vendor.
Reconcile invoice → events
Every event is priced against captured tokens. Monthly vendor invoices are diffed line by line so you see effective rate, drift, and savings per (vendor, period, customer).
Answer via MCP
A 30-tool MCP catalog plus 4 composite workflows. Connect Claude Desktop, Cursor, or Codex and ask finance questions in plain English — the agent reaches for the right tools.
Thirty tools, grouped into five families. Your agent picks the right one.
Connect TokenOps to Claude Desktop, Cursor, Codex, or any MCP-aware client. Read tools are scoped by tenant; write tools require a separate key. No surprises.
Four composite tools for the four finance questions you actually ask.
Each workflow stitches a dozen MCP reads into one structured answer — the kind of multi-step reasoning agents do well, served as a single tool so it's deterministic.
Build a per-customer P&L
Joins captured events to your revenue source of truth and outputs gross margin per customer for any period. Used as the seed for monthly board updates.
Reconcile vendor invoice
Diffs the month's invoice from any of the six vendors against captured events. Flags drift, missing line items, and quietly applied volume discounts.
Detect cost anomalies
Watches per-customer cost run-rate and alerts when a customer's daily spend moves more than 2σ from its 30-day baseline. Catches runaway loops before the invoice does.
Forecast a price change
Simulates a new $/seat or $/usage price against the last 90 days. Returns customers who'd churn at margin <0% and projected revenue lift.
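For a feel of the anomaly rule described above, here is a minimal sketch of a 2σ check against a trailing baseline. It is an illustration, not the TokenOps implementation: the function name, the `sigmas` parameter, and the handling of short or flat histories are all assumptions.

```python
from statistics import mean, stdev


def is_cost_anomaly(baseline_daily_costs, today_cost, sigmas=2.0):
    """Return True when today's spend sits more than `sigmas` standard
    deviations away from the mean of the baseline window.

    Hypothetical helper: `baseline_daily_costs` is assumed to hold one
    customer's daily cost for the trailing window (for example, 30 days).
    """
    if len(baseline_daily_costs) < 2:
        # Not enough history to estimate a spread, so do not alert.
        return False
    mu = mean(baseline_daily_costs)
    sd = stdev(baseline_daily_costs)
    if sd == 0:
        # Perfectly flat baseline: any change at all counts as a deviation.
        return today_cost != mu
    return abs(today_cost - mu) > sigmas * sd


# Illustrative numbers only: 30 days of spend hovering around $10.50.
baseline = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 9.0, 10.0, 11.0] * 3
runaway_loop = is_cost_anomaly(baseline, 40.0)
normal_day = is_cost_anomaly(baseline, 11.0)
```

The check is symmetric on purpose: a sudden drop in spend can be as informative as a spike, since it may mean a customer has stopped using the product.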
Install. Wrap. See. Around five minutes.
Install the SDK
Drop @tokenops/sdk (or the Python package) into the service that holds your LLM clients. Edge runtimes welcome.
Wrap your LLM client
One line per vendor. Pass customerId on each call — that's the join key for per-customer cost.
See per-customer cost
Open the dashboard, hit /api/cost, or ask your agent. Cost lands within seconds; the monthly vendor invoice is reconciled when it arrives.
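The wrap step can be pictured with a short, self-contained sketch. This is not the TokenOps SDK: `UsageRecorder`, `wrap`, and `fake_vendor_call` are hypothetical names, and `customer_id` simply mirrors the `customerId` join key in Python naming. The point is the shape of the pattern: the vendor call runs directly, and usage is recorded only after the response returns.

```python
import time
from dataclasses import dataclass, field


@dataclass
class UsageRecorder:
    """Hypothetical in-memory stand-in for an SDK's capture step."""
    events: list = field(default_factory=list)

    def wrap(self, call_vendor):
        """Return a function that calls the vendor, then records usage."""
        def wrapped(*, customer_id, **kwargs):
            start = time.perf_counter()
            # Direct call to the vendor function; nothing proxies it.
            response = call_vendor(**kwargs)
            # Metadata is captured only after the response has returned.
            self.events.append({
                "customer_id": customer_id,
                "model": kwargs.get("model"),
                "input_tokens": response["usage"]["input_tokens"],
                "output_tokens": response["usage"]["output_tokens"],
                "latency_s": time.perf_counter() - start,
            })
            return response
        return wrapped


def fake_vendor_call(model, prompt):
    """Stand-in for a real vendor client method (assumed response shape)."""
    return {
        "text": "ok",
        "usage": {"input_tokens": len(prompt.split()), "output_tokens": 1},
    }


recorder = UsageRecorder()
complete = recorder.wrap(fake_vendor_call)
complete(customer_id="acme", model="example-model", prompt="hello there world")
```

After the call, `recorder.events` holds one record tagged with the customer, which is the row a per-customer cost query would aggregate over.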
How is this different from Helicone, Langfuse, or Braintrust?
Those are LLM observability tools — they help you debug prompts, track latency, and grade outputs. TokenOps is a finance layer. We share the capture step, but the rest of the product is reconciliation, gross margin, and the MCP layer that lets an agent answer ‘is Acme profitable?’ Use both — they're complements, not substitutes.
Do you proxy our LLM calls?
No. The SDK wraps your client in your runtime. Calls go directly to Anthropic, OpenAI, Bedrock, Google, Vercel AI, or Azure. We capture metadata after the response returns, so nothing sits between you and the vendor.
What does the SDK actually capture?
Per request: model, input/output tokens, vendor-reported usage, latency, your customerId, and any tags you attach. Prompt content is opt-in and stays in your tenant — most teams leave it off.
How does invoice reconciliation work?
Each month you forward the vendor invoice (PDF or CSV) to TokenOps, or we pull it via the vendor's billing API where available. We line-item match it against captured events, surface the effective rate, and flag drift on each (vendor, model, period).
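The arithmetic behind "effective rate" and "drift" is simple enough to sketch. The helper names and the dollar figures below are illustrative assumptions, not TokenOps functions or real vendor prices: effective rate is taken as invoiced dollars per million captured tokens, and drift as its percentage gap from the list rate.

```python
def effective_rate_per_mtok(invoiced_usd, captured_tokens):
    """Dollars actually paid per million captured tokens (hypothetical helper)."""
    if captured_tokens <= 0:
        raise ValueError("captured_tokens must be positive")
    return invoiced_usd / captured_tokens * 1_000_000


def drift_pct(effective_rate, list_rate):
    """Percentage gap between what was paid and the published list rate.

    Negative values mean discounts (caching, batch, committed use) pulled
    the effective rate below list; positive values mean you paid more.
    """
    if list_rate <= 0:
        raise ValueError("list_rate must be positive")
    return (effective_rate - list_rate) / list_rate * 100


# Made-up example: one invoice line of $240 covering 100M captured tokens,
# compared with a made-up list price of $3.00 per million tokens.
rate = effective_rate_per_mtok(240.0, 100_000_000)
drift = drift_pct(rate, 3.0)
```

Running the same two calculations per (vendor, model, period) is what turns an invoice PDF into a table of where list price and reality diverge.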
What models and vendors are supported?
Anthropic, OpenAI, Amazon Bedrock, Google AI, the Vercel AI SDK, and Azure OpenAI. Pricing is kept current for Claude, GPT-4/4o, Gemini, Llama on Bedrock, and the major open-weight models served by these clouds. New models usually land within a week.
Pricing?
Early access is free while we're shaping the product. Pricing will be usage-based with a generous free tier — TokenOps should pay for itself in the savings it surfaces on your first vendor reconciliation.
Stop guessing per-customer cost.
Drop your work email. We'll send a tenant id, an SDK key, and a thirty-second wiring guide within a business day.
- No proxy. Your LLM calls go direct to the vendor.
- Prompt content stays in your tenant by default.
- MCP layer ships with read-only keys by default.