Demo: Computer Use
V2 only — invite-only edition. This is part of AI Partner V2 and is not in the open-source V1 you self-host from the Quick Start. V2 is available now, by invite. See V1 vs V2.
What you'll see
You give AI Partner a task that requires interacting with a website. It picks the right browser tier for the job — fast DOM extraction for standard pages, vision-based navigation for complex SPAs, a full containerized desktop for the hardest cases — and completes the task autonomously.
The four tiers
| Tier | Method | Speed | Used for |
|---|---|---|---|
| T1 | Playwright DOM extraction (no screenshot) | ~2s | Standard pages with clean HTML |
| T2 | Playwright + Vision (screenshot → LLM) | ~10s | JS-heavy SPAs, dynamically rendered content |
| T3 | Container Xvfb desktop + real browser | ~30s | Meeting joins, CAPTCHA-protected flows, full desktop automation |
| T4 | Host pyautogui — real keyboard + mouse | ~2s/action | Native apps, your real logged-in sessions (allowlisted, single-user only) |
The agent always starts at T1 and escalates automatically if a lower tier fails.
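The escalation rule can be pictured as a loop over tiers, lowest first. The sketch below is illustrative only: `run_with_escalation`, `TierResult`, and the stand-in handlers are hypothetical names, and treating an empty result or an exception as "failure" is an assumption rather than AI Partner's documented criterion.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TierResult:
    tier: str
    data: list


def run_with_escalation(
    tiers: list[tuple[str, Callable[[str], Optional[list]]]],
    task: str,
) -> TierResult:
    """Try each tier in order and move up only when the current one yields nothing."""
    for name, handler in tiers:
        try:
            data = handler(task)
        except Exception:
            data = None  # assumption: a crash counts the same as an empty result
        if data:
            return TierResult(name, data)
    raise RuntimeError("all tiers failed")


# Hypothetical stand-ins for the real tier implementations.
def t1_dom(task: str) -> list:
    return []  # simulates a DOM extraction that found nothing


def t2_vision(task: str) -> list:
    return ["Free", "Business", "Enterprise"]


result = run_with_escalation([("T1", t1_dom), ("T2", t2_vision)], "extract plans")
print(result.tier, result.data)
```

Keeping the tiers in an ordered list means the cheapest method is always tried first, which matches the speed column in the table above.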
T4 is the only tier that drives your actual desktop. It's gated behind two security checks, runs with pyautogui.FAILSAFE=True (drag cursor to a corner to abort), watches the cursor for human interruptions, and writes every action to a persistent audit log. Read the full T4 Host Control guide before enabling it.
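As a rough illustration of the cursor watch, the check below compares where T4 last left the cursor with where it is now. `cursor_drifted` is a hypothetical helper; using Euclidean distance and a strict greater-than comparison are assumptions, and the 50 px default mirrors the drift threshold listed in the settings table.

```python
import math

DRIFT_THRESHOLD_PX = 50  # default drift threshold from the settings table


def cursor_drifted(
    expected: tuple[int, int],
    actual: tuple[int, int],
    threshold_px: float = DRIFT_THRESHOLD_PX,
) -> bool:
    """Return True when the cursor moved farther than the threshold between actions."""
    dx = actual[0] - expected[0]
    dy = actual[1] - expected[1]
    return math.hypot(dx, dy) > threshold_px


# In a real setup `actual` would come from something like pyautogui.position().
print(cursor_drifted((100, 100), (105, 103)))   # small jitter
print(cursor_drifted((100, 100), (400, 300)))   # user grabbed the mouse
```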
T3 persistent shell (t3_shell)
When partner mode is active, the agent's run_command tool no longer goes to a separate headless sandbox container. Instead, it routes to a persistent bash session inside the T3 container, the same container that hosts Chromium and the live frame stream.
Why this matters:
| Before | After |
|---|---|
| pip install nmap-python in step 1 → step 2 can't import it (fresh container) | Step 2 sees the install (same shell session) |
| cd /tmp/myproject doesn't carry to the next command | Working directory persists across actions |
| Env vars set with export are lost | Exported vars survive |
| Restart partner → all installed packages gone | /home/partner is a per-user docker volume — packages survive container restart |
How it works:
- The partner runs in a full, isolated Linux environment with a persistent shell — bash, Python, Node.js, pip, npm, and internet access.
- It keeps your working directory and environment between commands, so multi-step work flows naturally.
- When a partner session is active, the agent runs commands inside that environment automatically — no special syntax needed.
This is the single biggest unlock for letting an expert model do specialty work. The model installs whatever it needs per task (nmap, nuclei, jupyter, cargo, anything in the apt / pip / npm / go install universe) and the install persists for as long as the partner session is up. Specialty domain containers are no longer required — one general T3 container can become a pentest workstation, a data-analysis notebook, or an agile-build environment depending on what the goal needs.
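To see why one long-lived shell behaves differently from a fresh container per command, here is a minimal sketch that keeps a single bash process open and feeds it commands over stdin. It is not how t3_shell is implemented: `PersistentShell` and its marker protocol are hypothetical, and the example assumes bash is available on the machine running it.

```python
import subprocess
import uuid


class PersistentShell:
    """Hypothetical illustration: one bash process shared by every command."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so output stays in order
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> tuple[int, str]:
        """Run a command and return (exit_code, combined_output)."""
        marker = f"__done_{uuid.uuid4().hex}__"
        # After the command, print a unique marker plus its exit status.
        self.proc.stdin.write(f"{command}\nprintf '\\n%s %s\\n' {marker} $?\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            if line.startswith(marker):
                return int(line.split()[1]), "".join(lines).strip()
            lines.append(line)
        raise RuntimeError("shell exited unexpectedly")

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()


shell = PersistentShell()
shell.run("cd /tmp")
shell.run("export DEMO_VAR=hello")
print(shell.run("pwd"))             # working directory carried over
print(shell.run("echo $DEMO_VAR"))  # exported variable carried over
shell.close()
```

Because every command runs in the same process, `cd` and `export` affect later commands, which is the behavior described in the "After" column above.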
Demo 1: Standard web extraction (T1)
Type this:
Go to https://news.ycombinator.com and extract the top 10 stories.
For each story get: title, URL, points, and comment count.
Return them as a clean numbered list.
What happens:
✅ T1: browser_navigate(https://news.ycombinator.com)
✅ T1: browser_extract(selector: ".athing, .subtext")
→ Extracted 10 stories in 1.8 seconds
✅ Formatted and returned
T1 reads the DOM directly — no screenshot, no LLM vision — so it's extremely fast.
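The idea of reading structure straight from markup can be shown without a browser. The sketch below uses Python's standard-library HTML parser on a made-up snippet; it is a stand-in for the Playwright-based extraction, not the actual implementation. `StoryParser` is a hypothetical name, and the sample markup is simplified, borrowing only the `.athing` row class named in the log above.

```python
from html.parser import HTMLParser

SAMPLE_HTML = """
<table>
  <tr class="athing"><td><a href="https://example.com/a">First story</a></td></tr>
  <tr class="subtext"><td>placeholder metadata row</td></tr>
  <tr class="athing"><td><a href="https://example.com/b">Second story</a></td></tr>
</table>
"""


class StoryParser(HTMLParser):
    """Collect the first link inside every <tr class="athing"> row."""

    def __init__(self) -> None:
        super().__init__()
        self.stories: list[dict] = []
        self._in_row = False
        self._capturing = False
        self._href = ""
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "tr" and "athing" in classes:
            self._in_row = True
        elif tag == "a" and self._in_row and not self._capturing:
            self._capturing = True
            self._href = attr.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._capturing:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._capturing:
            title = "".join(self._text).strip()
            self.stories.append({"title": title, "url": self._href})
            self._capturing = False
            self._in_row = False  # only the first link per row is a story title
        elif tag == "tr":
            self._in_row = False


parser = StoryParser()
parser.feed(SAMPLE_HTML)
for i, story in enumerate(parser.stories, start=1):
    print(f"{i}. {story['title']} ({story['url']})")
```

No pixels are rendered and no model is called, which is why this style of extraction is the fastest tier.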
Demo 2: Complex SPA navigation (T2)
Type this:
Go to https://linear.app and navigate to the pricing page.
Extract all plan names, prices, and the features listed under each plan.
What happens:
Linear's pricing page is rendered by a React SPA with no clean static HTML. T1 fails to extract meaningful data. The agent escalates:
⚠️ T1: extraction returned empty data → escalating to T2
✅ T2: browser_navigate(https://linear.app/pricing) with stealth mode
✅ T2: browser_screenshot() → captured page
✅ T2: vision_analyze(screenshot) → "I can see 3 pricing tiers: Free, Business, Enterprise..."
✅ T2: browser_extract(targeted selectors based on visual analysis)
→ Extracted 3 plans, prices, and feature lists
✅ Formatted and returned
T2 takes a screenshot, uses the LLM's vision to understand the page layout, then extracts the data using the selectors it identifies visually.
Demo 3: CAPTCHA handling (T2 → user handoff)
Type this:
Go to https://www.linkedin.com/in/satya-nadella and extract his current job title,
company, location, and latest 3 posts.
LinkedIn aggressively blocks automated browsers. When the agent encounters a CAPTCHA or login wall:
✅ T2: browser_navigate(https://linkedin.com/in/satya-nadella)
⚠️ T2: CAPTCHA detected — pausing for human handoff
In the AI Partner UI, you'll see:
- A live screenshot of the CAPTCHA page
- A "Take Control" button
Click Take Control → a browser window opens on your machine → solve the CAPTCHA → click Continue in AI Partner.
✅ Resumed after CAPTCHA solved by user
✅ T2: browser_extract(profile data)
→ Title: CEO, Microsoft
→ Location: Redmond, WA
→ Latest posts extracted
The agent re-validates the current page state after the handoff before extracting data — it confirms it's actually on the profile page, not a post-CAPTCHA redirect.
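One simple form such a re-validation could take is comparing the page's current URL with the one originally requested. This is a sketch under assumptions: `same_page` is a hypothetical helper, and ignoring a leading `www.` and a trailing slash is a normalization choice made here, not documented AI Partner behavior. The redirect URL in the example is invented for illustration.

```python
from urllib.parse import urlsplit


def same_page(expected_url: str, current_url: str) -> bool:
    """Compare two URLs by host and path, ignoring 'www.' and trailing slashes."""

    def normalize(url: str) -> tuple[str, str]:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower().removeprefix("www.")
        path = parts.path.rstrip("/") or "/"
        return host, path

    return normalize(expected_url) == normalize(current_url)


target = "https://www.linkedin.com/in/satya-nadella"
print(same_page(target, "https://linkedin.com/in/satya-nadella/"))
print(same_page(target, "https://linkedin.com/some-redirect"))  # hypothetical redirect
```

A URL check alone cannot confirm that the profile content actually rendered, so a real validation would likely also look at the page itself.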
Demo 4: Form filling (T1)
Type this:
Go to https://formspree.io/forms/new and fill in the form to create a new form endpoint.
Use these values:
- Form name: Test Form
- Email: test@example.com
Submit the form and tell me the endpoint URL that appears after submission.
What happens:
✅ T1: browser_navigate(https://formspree.io/forms/new)
✅ T1: browser_fill(selector: "#name", value: "Test Form")
✅ T1: browser_fill(selector: "#email", value: "test@example.com")
✅ T1: browser_click(selector: "button[type=submit]")
✅ T1: browser_extract(selector: ".endpoint-url")
→ Endpoint: https://formspree.io/f/xyzabc
✅ Result returned
Demo 5: Meeting join (T3 container)
The meeting attendance demo is the most advanced computer-use case. The T3 tier boots a full containerized desktop (Xvfb + PulseAudio), joins the meeting via a real Chromium browser, and captures audio.
See the full walkthrough: Meeting Attendance demo →
Configuring computer use tiers
Go to Settings → Computer Use to configure:
| Setting | Default | Notes |
|---|---|---|
| Default tier | Auto-escalate | Start at T1, escalate on failure |
| T4 host control | Disabled | Enable only for single-user, trusted environments — refused entirely when auth is on. See T4 guide |
| T4 max steps | 20 | Hard cap to prevent runaway host control |
| T4 allowlist | chrome, firefox, code, terminal | Active window must match before any T4 action (empty = all apps) |
| T4 denylist | password managers | Always blocked, even with an open allowlist |
| T4 approval mode | sensitive | never / sensitive / always — pause for approval before risky host actions |
| T4 media keys | enabled | media_key verb: volume, mute, play/pause, next/previous (bypasses allowlist) |
| T4 drift threshold | 50 px | T4 pauses if you move your cursor between its actions |
| T4 monitor capture | primary | On multi-monitor systems, only the primary monitor is captured |
| T4 live broadcast | Off (live_stream_fps: 0) | Set to 5-10 fps to see a continuous video of your screen in the Inspector while T4 runs |
| CAPTCHA timeout | 5 minutes | How long to wait for user to solve CAPTCHA |
| Stealth mode | On for T2 | Rotates user-agent and headers to avoid detection |
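The T4 rows above interact with each other, so a small decision function may help show the intended precedence. Everything here is a sketch: `T4Policy` and `t4_action_allowed` are hypothetical names, the denylist entries are placeholders, substring matching on the window title is an assumption, and so is checking the denylist before the media-key bypass.

```python
from dataclasses import dataclass, field


@dataclass
class T4Policy:
    allowlist: list[str] = field(
        default_factory=lambda: ["chrome", "firefox", "code", "terminal"]
    )
    # Placeholder entries; the real denylist contents are not specified here.
    denylist: list[str] = field(default_factory=lambda: ["1password", "keepass"])
    max_steps: int = 20


def t4_action_allowed(
    policy: T4Policy, active_window: str, verb: str, steps_taken: int
) -> bool:
    """Decide whether a single T4 action may run against the active window."""
    if steps_taken >= policy.max_steps:
        return False  # hard cap on runaway host control
    window = active_window.lower()
    if any(name in window for name in policy.denylist):
        return False  # denylist wins, even with an open allowlist
    if verb == "media_key":
        return True  # media keys bypass the allowlist
    if not policy.allowlist:
        return True  # empty allowlist means all apps
    return any(name in window for name in policy.allowlist)


policy = T4Policy()
print(t4_action_allowed(policy, "Google Chrome", "click", steps_taken=0))
print(t4_action_allowed(policy, "Notepad", "click", steps_taken=0))
print(t4_action_allowed(policy, "Notepad", "media_key", steps_taken=0))
```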
T4 emergency controls
| Control | How to trigger |
|---|---|
| Hard abort (FAILSAFE) | Drag your mouse cursor to any screen corner |
| Kill-now (all tiers) | POST /api/computer-use/stop, wired to the UI's STOP button. Cancels the in-flight model call immediately and halts all tiers (T1–T4), not just T4. Also fires when you cancel the parent goal. No timeout is involved: it triggers only on an explicit stop or goal cancellation |
| Inspect audit log | GET /api/computer-use/audit-tail?n=200 returns the last N actions with full args |
| Drift guard | Automatic — move the cursor mid-task and T4 pauses for you |
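The two HTTP controls can also be called from a script. The sketch below only builds the requests; the paths and the `n` parameter come from the table above, while `BASE_URL` is an assumed local address and any authentication headers your deployment needs are not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: replace with your AI Partner address


def build_stop_request(base_url: str = BASE_URL) -> Request:
    """Request that halts all computer-use tiers."""
    return Request(f"{base_url}/api/computer-use/stop", data=b"", method="POST")


def build_audit_tail_request(n: int = 200, base_url: str = BASE_URL) -> Request:
    """Request for the last n audit-log actions."""
    query = urlencode({"n": n})
    return Request(f"{base_url}/api/computer-use/audit-tail?{query}", method="GET")


stop = build_stop_request()
audit = build_audit_tail_request(n=50)
print(stop.get_method(), stop.full_url)
print(audit.get_method(), audit.full_url)
```

To send one, pass it to `urllib.request.urlopen(...)`. The response format is not documented on this page, so inspect it before relying on specific fields.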
What computer use can't do
Computer use is not magic. Some limitations:
- Two-factor authentication: if a site requires 2FA, the agent pauses for a human-in-the-loop (HITL) handoff so you can enter the code
- PDF downloads in containers: downloaded files are extracted from the container and saved to the workspace
- Paid sites: the agent can't pay for access it doesn't have
- Sites with Cloudflare Turnstile: T2 may solve simple CAPTCHAs; complex ones always escalate to you
- Desktop apps (non-web): T4 only, and only for allowlisted applications
Combining with goal execution
Computer use is one tool among many in the ReAct loop. A single goal can mix browser, Python, and file generation:
Scrape the Stripe pricing page and extract all plan details.
Then scrape the Paddle pricing page and do the same.
Compare the two and generate a side-by-side Excel spreadsheet.
The agent handles this as:
- T1: scrape Stripe pricing → Python: structure data
- T1: scrape Paddle pricing → Python: structure data
- generate_excel: create comparison table
- Files panel: download link available