AI is an operating cost. Token Management is how you keep that cost predictable, attributable, and aligned with the value AI is producing.
What you get
Real-time usage tracking
Live dashboards showing token spend by user, team, assistant, and workflow.
Budgets at every level
Caps on the workspace, per team, per user, per assistant — independently set and stacked.
Smart alerts
Notifications when a budget hits 50%, 80%, 100%. Slack, email, or in-app.
Attribution by default
Every token spent is attributed to a person, an assistant, and a project. No mystery bills.
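The alerting rule above is simple to state: fire once for each threshold a budget crosses, not on every request. A minimal sketch of that crossing check, assuming the 50%/80%/100% thresholds from this page (function and variable names are invented for illustration, not the PANTA OS API):

```python
THRESHOLDS = (0.5, 0.8, 1.0)  # 50%, 80%, 100% -- the alert levels named above

def crossed(prev_spent, new_spent, budget):
    """Return the thresholds crossed by moving from prev_spent to new_spent.

    A threshold fires only when spend passes it for the first time,
    so each alert is sent once per budget period.
    """
    return [t for t in THRESHOLDS if prev_spent < t * budget <= new_spent]

# One request pushes spend from 40k to 85k of a 100k budget:
# it crosses both the 50% and the 80% line in a single step.
print(crossed(40_000, 85_000, 100_000))  # [0.5, 0.8]
```

Each returned threshold would then fan out to the configured channels (Slack, email, or in-app).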
How budgets work
Set the workspace budget
Workspace-level cap is the ceiling. Nothing goes above it without an explicit raise.
Per-assistant guardrails
Expensive assistants (deep models, long context) get their own caps so they can’t drain a team’s budget.
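Stacked budgets reduce to one rule: a request must fit under every cap that applies to it, with the workspace ceiling on top. A minimal sketch of that check, with invented scope names and limits (this is not the PANTA OS API, just the idea):

```python
# Hypothetical caps at each level -- independently set, jointly enforced.
caps = {
    "workspace": 1_000_000,              # the ceiling; nothing goes above it
    "team:support": 200_000,
    "user:ada": 50_000,
    "assistant:deep-research": 100_000,  # expensive assistant gets its own cap
}
spent = {scope: 0 for scope in caps}

def try_spend(tokens, scopes):
    """Allow the request only if it stays under every applicable cap."""
    if any(spent[s] + tokens > caps[s] for s in scopes):
        return False          # one exhausted budget blocks the request
    for s in scopes:
        spent[s] += tokens    # charge all levels at once
    return True

scopes = ["workspace", "team:support", "user:ada", "assistant:deep-research"]
print(try_spend(40_000, scopes))  # True: fits under every cap
print(try_spend(20_000, scopes))  # False: would push user:ada past 50k
```

Note that the tightest cap wins: the second request is blocked by the per-user budget even though the workspace, team, and assistant budgets all have room.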
What you see in analytics
Spend by team
Which department is using AI most. Often surprising — and useful.
Spend by assistant
The expensive assistants. The ones to optimize first.
Spend by user
Power users — usually your champions. Adoption signals.
Spend over time
Trends and anomalies, week over week.
Token Management is for organizational control. Individual users don’t see other users’ spend; they see their own and their team’s, depending on role.
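Because every event already carries its user, team, and assistant, each analytics view above is just the same records summed along a different dimension. A sketch under that assumption (the event tuples and field names are illustrative, not the PANTA OS schema):

```python
from collections import defaultdict

# Hypothetical attributed usage events: (user, team, assistant, tokens).
events = [
    ("ada",  "support",  "helpbot",  1200),
    ("bob",  "support",  "helpbot",   800),
    ("cara", "research", "deep-bot", 4000),
]

def spend_by(dim):
    """Total token spend grouped along one attribution dimension."""
    idx = {"user": 0, "team": 1, "assistant": 2}[dim]
    totals = defaultdict(int)
    for event in events:
        totals[event[idx]] += event[3]
    return dict(totals)

print(spend_by("team"))       # {'support': 2000, 'research': 4000}
print(spend_by("assistant"))  # {'helpbot': 2000, 'deep-bot': 4000}
```

The same records also answer "spend by user" and, bucketed by date, "spend over time" -- attribution by default is what makes all four views cheap to produce.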
Tips that save real money
Right-size your models
Most chats don’t need the biggest model. Default to a fast, capable middle-tier and reserve the top for assistants that need it.
Trim system prompts
Long-winded system prompts get charged on every turn. Tighten them.
Cache the obvious
For repeat questions on stable knowledge, cache. PANTA OS caches transparently when it can.
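The caching idea is worth seeing concretely: normalize the question, key the cache on it, and a repeat costs zero tokens. A toy sketch of the pattern only; PANTA OS handles this transparently, and nothing here reflects its internals:

```python
import hashlib

cache = {}
model_calls = 0  # counts the expensive path

def answer(question):
    """Serve repeat questions from cache; call the model only on a miss."""
    global model_calls
    # Normalize so trivial variants ("What..." vs "what...") share a key.
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    if key in cache:
        return cache[key]        # cache hit: zero tokens spent
    model_calls += 1             # cache miss: this is where tokens go
    cache[key] = f"answer to: {question.strip()}"  # stand-in for a model call
    return cache[key]

answer("What is our refund policy?")
answer("what is our refund policy?")  # normalized repeat: served from cache
print(model_calls)  # 1
```

Caching only pays off when the underlying knowledge is stable; anything time-sensitive should bypass it.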
Retire unused assistants
Assistants no one runs still cost storage and indexing. Archive what’s stale.
