Skip to main content

Usage & Cost

What Usage & Cost shows

Every LLM call AI Partner makes is logged with its token counts and estimated cost. The Usage & Cost panel surfaces this as a dashboard so you can see exactly where your API spend is going and set alerts before it gets out of hand.

Go to sidebar → Usage & Cost.


Summary metrics

The top row shows totals for your selected period (daily / weekly / monthly):

MetricDescription
Total callsNumber of LLM API calls made
Prompt tokensTokens sent to the model (your context + tool results)
Completion tokensTokens generated by the model (responses)
Total tokensPrompt + completion
Total costEstimated USD cost across all providers
Avg latencyAverage response time per call

Switch between Daily, Weekly, and Monthly views using the period selector at the top.


Per-provider breakdown

A table showing usage and cost broken down by provider and model:

ProviderModelCallsPrompt tokensCompletion tokensCostAvg latency
Anthropicclaude-3-5-haiku-20241022142284,00047,000$0.381.2s
Anthropicclaude-sonnet-4-528112,00018,000$0.872.1s
Groqllama-3.3-70b-versatile89178,00031,000$0.040.4s
OpenRoutergoogle/gemini-2.0-flash1545,0009,000$0.010.8s

This breakdown tells you:

  • Which models are being used most
  • Which models cost the most per call
  • Where the smart router is sending which task types

Daily cost tracker

A separate card shows today's spend vs. your configured alert threshold:

Today's cost: $0.42
Alert at: $10.00 [Edit threshold]

████░░░░░░░░░░░░░░ 4.2% of daily budget

When daily cost exceeds the threshold, AI Partner:

  1. Sends a notification to your Telegram (if configured)
  2. Shows a warning banner in the UI
  3. Optionally pauses new goal executions until you acknowledge (configurable)

Setting a cost alert

Click Edit threshold next to the daily cost card, or go to Settings → Usage:

Daily cost alert: $10.00
Action on threshold:
○ Notify only
● Notify + pause new goals
○ Notify + pause all LLM calls

Cost estimates are based on each provider's published per-token pricing. Actual billing may differ slightly due to rounding, minimum charges, or promotional rates. Always check your provider dashboard for authoritative billing figures.


Cost-saving tips

Use fast models for classification

The model router sends classification tasks to Groq (Llama-3.3-70b at $0.05/M tokens) instead of Claude or GPT-4o. Check Settings → Model Routing to confirm this is configured.

Limit goal iterations

Lower GOAL_MAX_ITERATIONS in .env (default: 40). Most goals complete in 5–15 iterations. Setting 20 cuts potential runaway costs.

Use conversation compression

The conversation compressor summarizes history when it exceeds 30 messages — reducing the prompt tokens on every subsequent call. Configurable via CONVERSATION_COMPRESS_THRESHOLD.

Cache with Anthropic

If you're using Claude heavily, enable prompt caching in .env: ANTHROPIC_PROMPT_CACHING=true. Repeated system prompts (agent identity, workspace files) are cached at 90% discount.


API access to usage data

# Summary for a period
GET /api/usage/summary?period=weekly

# Per-provider breakdown
GET /api/usage/by-provider?period=monthly

# Today's daily cost
GET /api/usage/daily-cost

# Alert threshold
GET /api/usage/alert-threshold
PUT /api/usage/alert-threshold { "threshold": 15.00 }

Per-user attribution (multi-user)

Every LLM call is attributed to the user whose chat or goal triggered it. On multi-user deployments, admins get a per-user cost table — calls, tokens, cost, and average latency per member — in Admin Console → Usage & Cost (GET /api/admin/usage?period=daily|weekly|monthly). Calls recorded before attribution existed are matched to users via their goal or conversation; anything unmatchable shows as unattributed.