Usage & Cost

What Usage & Cost shows

Every LLM call AI Partner makes is logged with its token counts and estimated cost. The Usage & Cost panel surfaces this as a dashboard so you can see exactly where your API spend is going and set alerts before it gets out of hand.

To open the panel, go to the sidebar → Usage & Cost.

Summary metrics

The top row shows totals for your selected period (daily / weekly / monthly):

Metric	Description
Total calls	Number of LLM API calls made
Prompt tokens	Tokens sent to the model (your context + tool results)
Completion tokens	Tokens generated by the model (responses)
Total tokens	Prompt + completion
Total cost	Estimated USD cost across all providers
Avg latency	Average response time per call

Switch between Daily, Weekly, and Monthly views using the period selector at the top.
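As a rough illustration of how these totals relate to the underlying call log, the sketch below collapses a list of call records into the six summary metrics. The `CallRecord` fields and the `summarize` helper are hypothetical names chosen for this example, not AI Partner's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CallRecord:
    # Hypothetical shape of one logged LLM call.
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_s: float

def summarize(calls: list[CallRecord]) -> dict:
    """Collapse call records into the summary metrics shown in the top row."""
    prompt = sum(c.prompt_tokens for c in calls)
    completion = sum(c.completion_tokens for c in calls)
    return {
        "total_calls": len(calls),
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "total_cost_usd": round(sum(c.cost_usd for c in calls), 4),
        # Guard against an empty period to avoid dividing by zero.
        "avg_latency_s": sum(c.latency_s for c in calls) / len(calls) if calls else 0.0,
    }

calls = [
    CallRecord(2000, 300, 0.004, 1.0),
    CallRecord(1000, 200, 0.002, 2.0),
]
summary = summarize(calls)
print(summary)
```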

Per-provider breakdown

This table breaks usage and cost down by provider and model:

Provider	Model	Calls	Prompt tokens	Completion tokens	Cost	Avg latency
Anthropic	claude-3-5-haiku-20241022	142	284,000	47,000	$0.38	1.2s
Anthropic	claude-sonnet-4-5	28	112,000	18,000	$0.87	2.1s
Groq	llama-3.3-70b-versatile	89	178,000	31,000	$0.04	0.4s
OpenRouter	google/gemini-2.0-flash	15	45,000	9,000	$0.01	0.8s

This breakdown tells you:

Which models are being used most
Which models cost the most per call
Where the smart router is sending which task types
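The table reports total cost per model, so cost per call has to be derived by dividing cost by calls. A minimal sketch using the example rows above (the `cost_per_call` helper is a hypothetical name for illustration):

```python
# Rows copied from the example table: (provider, model, calls, cost_usd).
rows = [
    ("Anthropic", "claude-3-5-haiku-20241022", 142, 0.38),
    ("Anthropic", "claude-sonnet-4-5", 28, 0.87),
    ("Groq", "llama-3.3-70b-versatile", 89, 0.04),
    ("OpenRouter", "google/gemini-2.0-flash", 15, 0.01),
]

def cost_per_call(rows):
    """Return (model, average cost per call) pairs, most expensive first."""
    ranked = [(model, cost / calls) for _, model, calls, cost in rows if calls > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

ranking = cost_per_call(rows)
for model, avg in ranking:
    print(f"{model:32s} ${avg:.4f} per call")
```

With the example figures, claude-sonnet-4-5 comes out as the most expensive model per call by a wide margin, even though it handles far fewer calls than Haiku.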

Daily cost tracker

A separate card shows today's spend vs. your configured alert threshold:

Today's cost:  $0.42
Alert at:      $10.00   [Edit threshold]

████░░░░░░░░░░░░░░  4.2% of daily budget

When daily cost exceeds the threshold, AI Partner:

Sends a notification to your Telegram (if configured)
Shows a warning banner in the UI
Optionally pauses new goal executions until you acknowledge (configurable)
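The percentage on the card is simply today's cost divided by the threshold. The sketch below reproduces that arithmetic and the "exceeds the threshold" check; `budget_status` and the 18-character bar width are assumptions made for this example, not the panel's actual implementation.

```python
def budget_status(today_cost: float, threshold: float, width: int = 18) -> dict:
    """Return the share of the daily budget used, a text bar, and an over-threshold flag."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    fraction = today_cost / threshold
    # Clamp the bar so spend above the threshold still renders as a full bar.
    filled = min(width, round(fraction * width))
    return {
        "percent": round(fraction * 100, 1),
        "bar": "█" * filled + "░" * (width - filled),
        "exceeded": today_cost > threshold,
    }

status = budget_status(0.42, 10.00)
print(f"{status['bar']}  {status['percent']}% of daily budget")
```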

Setting a cost alert

Click Edit threshold next to the daily cost card, or go to Settings → Usage:

Daily cost alert: $10.00
Action on threshold:
  ○ Notify only
  ● Notify + pause new goals
  ○ Notify + pause all LLM calls

Cost estimates are based on each provider's published per-token pricing. Actual billing may differ slightly due to rounding, minimum charges, or promotional rates. Always check your provider dashboard for authoritative billing figures.
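For reference, a per-token estimate of this kind usually reduces to the arithmetic below. The `estimate_cost` helper and the prices passed to it are placeholders for illustration only; substitute your provider's current published rates.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate USD cost from token counts and per-million-token prices."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# Placeholder prices, not real provider rates.
cost = estimate_cost(prompt_tokens=100_000, completion_tokens=20_000,
                     input_price_per_m=1.00, output_price_per_m=5.00)
print(f"${cost:.2f}")
```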

Cost-saving tips

Use fast models for classification

The model router sends classification tasks to Groq (Llama-3.3-70b at $0.05/M tokens) instead of Claude or GPT-4o. Check Settings → Model Routing to confirm this is configured.

Limit goal iterations

Lower GOAL_MAX_ITERATIONS in .env (default: 40). Most goals complete in 5–15 iterations, so setting it to 20 still leaves headroom while halving the worst-case number of iterations a runaway goal can consume.
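To see what the cap means in dollars, the sketch below bounds the spend of a single goal that runs all the way to the iteration limit. It assumes one LLM call per iteration, which may not match how your goals actually execute, and it uses the average cost per call from the claude-sonnet-4-5 row of the example table. `worst_case_goal_cost` is a hypothetical helper.

```python
def worst_case_goal_cost(max_iterations: int, avg_cost_per_call: float,
                         calls_per_iteration: int = 1) -> float:
    """Upper bound on one goal's spend if it runs until the iteration cap."""
    return max_iterations * calls_per_iteration * avg_cost_per_call

avg = 0.87 / 28  # claude-sonnet-4-5: $0.87 over 28 calls in the example table
default_cap = worst_case_goal_cost(40, avg)
lowered_cap = worst_case_goal_cost(20, avg)
print(f"cap 40: ${default_cap:.2f}  cap 20: ${lowered_cap:.2f}")
```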

Use conversation compression

The conversation compressor summarizes history once it exceeds 30 messages, which reduces the prompt tokens sent on every subsequent call. The threshold is configurable via CONVERSATION_COMPRESS_THRESHOLD.

Cache with Anthropic

If you're using Claude heavily, enable prompt caching in .env: ANTHROPIC_PROMPT_CACHING=true. Repeated system prompts (agent identity, workspace files) are then served from cache, and cached input tokens are billed at a 90% discount.
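A back-of-the-envelope sketch of what caching can save on a repeated prompt prefix. The multipliers follow Anthropic's published scheme at the time of writing (cache writes at 1.25 times the base input price, cache reads at 0.1 times); treat them, the $3.00 per million input price, and the `cached_prefix_cost` helper as assumptions, and note that the model assumes every call after the first hits the cache before it expires.

```python
def cached_prefix_cost(prefix_tokens: int, calls: int, input_price_per_m: float,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (cost without caching, cost with caching) for a repeated prompt prefix.

    Assumes the first call writes the cache and every later call reads from it.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_call = prefix_tokens * input_price_per_m / 1_000_000
    uncached = per_call * calls
    cached = per_call * write_multiplier + per_call * read_multiplier * (calls - 1)
    return uncached, cached

# 4,000-token system prompt repeated across 50 calls at a placeholder price.
uncached, cached = cached_prefix_cost(prefix_tokens=4_000, calls=50, input_price_per_m=3.00)
print(f"without caching: ${uncached:.4f}  with caching: ${cached:.4f}")
```

For a single call, caching costs slightly more than not caching because of the write premium; the saving only appears once the same prefix is reused.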

API access to usage data

# Summary for a period
GET /api/usage/summary?period=weekly

# Per-provider breakdown
GET /api/usage/by-provider?period=monthly

# Today's daily cost
GET /api/usage/daily-cost

# Alert threshold
GET /api/usage/alert-threshold
PUT /api/usage/alert-threshold  { "threshold": 15.00 }
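A minimal sketch of calling these endpoints from Python with the standard library only. The base URL is an assumption, and so is the absence of authentication: add whatever headers or tokens your deployment requires. The response shapes are not documented here, so the example stops at decoding JSON.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000"  # assumption: point this at your own deployment

def usage_request(path: str, params: Optional[dict] = None,
                  method: str = "GET", body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for a usage endpoint without sending it."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)

def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)

weekly = usage_request("/api/usage/summary", {"period": "weekly"})
update = usage_request("/api/usage/alert-threshold", method="PUT",
                       body={"threshold": 15.00})
# summary = send(weekly)  # uncomment when running against a live instance
```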

Per-user attribution (multi-user)

Every LLM call is attributed to the user whose chat or goal triggered it. On multi-user deployments, admins get a per-user cost table in Admin Console → Usage & Cost, listing calls, tokens, cost, and average latency for each member. The same data is available from GET /api/admin/usage?period=daily|weekly|monthly. Calls recorded before attribution existed are matched to users via their goal or conversation; anything that cannot be matched shows as unattributed.
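The fallback for older calls can be pictured as a lookup chain. The sketch below is purely illustrative: the record fields, the owner maps, and the order in which goal and conversation are consulted are all assumptions, not AI Partner's actual logic.

```python
from collections import defaultdict

def attribute(call: dict, goal_owner: dict, conversation_owner: dict) -> str:
    """Resolve a call's user: direct attribution, then goal, then conversation."""
    return (call.get("user_id")
            or goal_owner.get(call.get("goal_id"))
            or conversation_owner.get(call.get("conversation_id"))
            or "unattributed")

def cost_by_user(calls: list, goal_owner: dict, conversation_owner: dict) -> dict:
    """Sum cost per resolved user."""
    totals = defaultdict(float)
    for call in calls:
        totals[attribute(call, goal_owner, conversation_owner)] += call["cost_usd"]
    return dict(totals)

calls = [
    {"user_id": "alice", "cost_usd": 0.10},
    {"goal_id": "g1", "cost_usd": 0.05},          # older call, matched via its goal
    {"conversation_id": "c9", "cost_usd": 0.02},  # older call with no known owner
]
totals = cost_by_user(calls, goal_owner={"g1": "bob"}, conversation_owner={})
print(totals)
```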
