Token limits and budgets are how you keep AI spend predictable and aligned with the value AI is producing. Set caps, get alerted, and stay in control.
The hierarchy
- Workspace — the ceiling for the whole organization
- Team — slices for each department
- User — caps for individuals (rarely needed)
- Assistant — caps on specific assistants that are inherently expensive
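One way to picture the hierarchy is that a request has to fit under every cap that applies to it, from the workspace down to the assistant. The sketch below assumes exactly that; this page does not spell out how the levels combine, and the `Budget` class, `first_exceeded` function, and all numbers are hypothetical illustrations, not the product's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Budget:
    scope: str         # "workspace", "team", "user", or "assistant"
    limit_tokens: int  # hypothetical cap for the current period
    used_tokens: int = 0


def first_exceeded(budgets: List[Budget], request_tokens: int) -> Optional[str]:
    """Return the scope of the first cap this request would exceed, else None."""
    for budget in budgets:
        if budget.used_tokens + request_tokens > budget.limit_tokens:
            return budget.scope
    return None


# Illustrative numbers only, ordered from broadest to narrowest scope.
budgets = [
    Budget("workspace", limit_tokens=10_000_000, used_tokens=4_000_000),
    Budget("team", limit_tokens=2_000_000, used_tokens=1_950_000),
    Budget("assistant", limit_tokens=500_000, used_tokens=100_000),
]

print(first_exceeded(budgets, 10_000))   # fits under every cap
print(first_exceeded(budgets, 100_000))  # the team cap is the binding one
```

In this example the workspace still has plenty of room, but the team slice is nearly spent, so the team cap is the one that triggers first.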
Setting a budget
Enforcement modes
Soft cap
Alert when budget is exceeded; do not block. Useful for early rollouts where you want to learn before enforcing.
Hard cap
Block AI calls that would exceed the budget. Users see a clear message and can escalate to their admin.
Throttled cap
Allow operations but downgrade to cheaper models when over budget. Good middle ground.
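The three modes differ only in what happens once a budget is crossed. The following is a minimal sketch of that decision, assuming for simplicity that every mode triggers when a request would push usage past the limit. `Mode`, `decide`, and the model names are hypothetical and do not reflect how the product implements enforcement.

```python
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    SOFT = "soft"
    HARD = "hard"
    THROTTLED = "throttled"


def decide(
    mode: Mode,
    used: int,
    limit: int,
    request_tokens: int,
    model: str,
    cheaper_model: str,
) -> Tuple[bool, Optional[str], str]:
    """Return (allowed, model_to_use, outcome) for one AI call."""
    if used + request_tokens <= limit:
        return True, model, "ok"
    if mode is Mode.SOFT:
        # Soft cap: let the call through, but raise an alert.
        return True, model, "alert"
    if mode is Mode.HARD:
        # Hard cap: block the call; the user is told to escalate.
        return False, None, "blocked"
    # Throttled cap: allow the call on a cheaper model.
    return True, cheaper_model, "downgraded"


for mode in Mode:
    print(mode.value, decide(mode, 990, 1_000, 20, "large", "small"))
```

The same over-budget request is allowed with an alert under a soft cap, rejected under a hard cap, and rerouted to the cheaper model under a throttled cap.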
Temporary overrides
Campaign boost
A short-lived raise during a launch or quarterly push. Set an end date.
Project allocation
A dedicated budget for a project, shared by its members regardless of team.
Power user exemption
Lift a single user’s cap when they have a temporary high-volume need.
Workspace surge
Bump the workspace cap for a month. Comes with a clear audit trail.
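What the overrides above have in common is that they raise a cap for a bounded period and then expire. A minimal sketch of that idea, assuming an override simply adds tokens to a base limit between a start and an end date; the `Override` class, `effective_limit` function, dates, and amounts are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Override:
    reason: str        # kept for the audit trail, e.g. "campaign boost"
    extra_tokens: int  # how much the cap is raised while active
    start: date
    end: date          # inclusive end date, so the raise expires on its own


def effective_limit(base_limit: int, overrides: List[Override], today: date) -> int:
    """Base cap plus every override that is active on the given day."""
    active = [o for o in overrides if o.start <= today <= o.end]
    return base_limit + sum(o.extra_tokens for o in active)


# Illustrative values only.
boost = Override("campaign boost", 500_000, date(2025, 3, 1), date(2025, 3, 15))

print(effective_limit(2_000_000, [boost], date(2025, 3, 10)))  # during the launch
print(effective_limit(2_000_000, [boost], date(2025, 3, 16)))  # after the end date
```

Because the end date is part of the override itself, the cap falls back to its base value without anyone having to remember to undo the change.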
Spending tips
Default to mid-tier models
Most chats don’t need the biggest model. Reserve the top tier for assistants that need it.
Trim system prompts
Long prompts are charged on every turn. Tighten them.
Cap expensive assistants
Set per-assistant caps so heavy users don’t drain the team budget.
Review monthly
A 30-minute monthly review is enough to keep spend on track.
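To see why trimming system prompts pays off, it helps to run the arithmetic. The sketch below assumes the full system prompt is billed as input on every turn, with no caching discount; the function name and token counts are made up for illustration.

```python
def system_prompt_input_tokens(prompt_tokens: int, turns: int) -> int:
    """Input tokens spent on the system prompt alone across one conversation."""
    return prompt_tokens * turns


# Hypothetical figures: a 20-turn chat before and after tightening the prompt.
before = system_prompt_input_tokens(1_500, 20)
after = system_prompt_input_tokens(500, 20)

print(before, after, before - after)
```

Under these assumptions, cutting the prompt from 1,500 to 500 tokens saves 20,000 input tokens in a single 20-turn conversation, and the saving repeats in every conversation that uses the assistant.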
