Usage & Cost
What Usage & Cost shows
Every LLM call AI Partner makes is logged with its token counts and estimated cost. The Usage & Cost panel surfaces this as a dashboard so you can see exactly where your API spend is going and set alerts before it gets out of hand.
Go to sidebar → Usage & Cost.
Summary metrics
The top row shows totals for your selected period (daily / weekly / monthly):
| Metric | Description |
|---|---|
| Total calls | Number of LLM API calls made |
| Prompt tokens | Tokens sent to the model (your context + tool results) |
| Completion tokens | Tokens generated by the model (responses) |
| Total tokens | Prompt + completion |
| Total cost | Estimated USD cost across all providers |
| Avg latency | Average response time per call |
Switch between Daily, Weekly, and Monthly views using the period selector at the top.
Per-provider breakdown
A table showing usage and cost broken down by provider and model:
| Provider | Model | Calls | Prompt tokens | Completion tokens | Cost | Avg latency |
|---|---|---|---|---|---|---|
| Anthropic | claude-3-5-haiku-20241022 | 142 | 284,000 | 47,000 | $0.38 | 1.2s |
| Anthropic | claude-sonnet-4-5 | 28 | 112,000 | 18,000 | $0.87 | 2.1s |
| Groq | llama-3.3-70b-versatile | 89 | 178,000 | 31,000 | $0.04 | 0.4s |
| OpenRouter | google/gemini-2.0-flash | 15 | 45,000 | 9,000 | $0.01 | 0.8s |
This breakdown tells you:
- Which models are being used most
- Which models cost the most per call
- Where the smart router is sending which task types
Daily cost tracker
A separate card shows today's spend vs. your configured alert threshold:
Today's cost: $0.42
Alert at: $10.00 [Edit threshold]
████░░░░░░░░░░░░░░ 4.2% of daily budget
When daily cost exceeds the threshold, AI Partner:
- Sends a notification to your Telegram (if configured)
- Shows a warning banner in the UI
- Optionally pauses new goal executions until you acknowledge (configurable)
Setting a cost alert
Click Edit threshold next to the daily cost card, or go to Settings → Usage:
Daily cost alert: $10.00
Action on threshold:
○ Notify only
● Notify + pause new goals
○ Notify + pause all LLM calls
Cost estimates are based on each provider's published per-token pricing. Actual billing may differ slightly due to rounding, minimum charges, or promotional rates. Always check your provider dashboard for authoritative billing figures.
Cost-saving tips
The model router sends classification tasks to Groq (Llama-3.3-70b at $0.05/M tokens) instead of Claude or GPT-4o. Check Settings → Model Routing to confirm this is configured.
Lower GOAL_MAX_ITERATIONS in .env (default: 40). Most goals complete in 5–15 iterations. Setting 20 cuts potential runaway costs.
The conversation compressor summarizes history when it exceeds 30 messages — reducing the prompt tokens on every subsequent call. Configurable via CONVERSATION_COMPRESS_THRESHOLD.
If you're using Claude heavily, enable prompt caching in .env: ANTHROPIC_PROMPT_CACHING=true. Repeated system prompts (agent identity, workspace files) are cached at 90% discount.
API access to usage data
# Summary for a period
GET /api/usage/summary?period=weekly
# Per-provider breakdown
GET /api/usage/by-provider?period=monthly
# Today's daily cost
GET /api/usage/daily-cost
# Alert threshold
GET /api/usage/alert-threshold
PUT /api/usage/alert-threshold { "threshold": 15.00 }
Per-user attribution (multi-user)
Every LLM call is attributed to the user whose chat or goal triggered it. On multi-user deployments, admins get a per-user cost table — calls, tokens, cost, and average latency per member — in Admin Console → Usage & Cost (GET /api/admin/usage?period=daily|weekly|monthly). Calls recorded before attribution existed are matched to users via their goal or conversation; anything unmatchable shows as unattributed.