Demo: Computer Use

V2 only — invite-only edition. This is part of AI Partner V2 and is not in the open-source V1 you self-host from the Quick Start. V2 is available now, by invite. See V1 vs V2.

What you'll see

You give AI Partner a task that requires interacting with a website. It picks the right browser tier for the job — fast DOM extraction for standard pages, vision-based navigation for complex SPAs, a full containerized desktop for the hardest cases — and completes the task autonomously.

The four tiers

Tier	Method	Speed	Used for
T1	Playwright DOM extraction (no screenshot)	~2s	Standard pages with clean HTML
T2	Playwright + Vision (screenshot → LLM)	~10s	JS-heavy SPAs, dynamically rendered content
T3	Container Xvfb desktop + real browser	~30s	Meeting joins, CAPTCHA-protected flows, full desktop automation
T4	Host pyautogui — real keyboard + mouse	~2s/action	Native apps, your real logged-in sessions (allowlisted, single-user only)

The agent always starts at T1 and escalates automatically if a lower tier fails.
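The escalation rule can be sketched as an ordered list of tiers tried cheapest-first. This assumes each tier is a callable that returns extracted data. It also assumes that either an exception or an empty result counts as failure, which matches the `extraction returned empty data → escalating to T2` line in the Linear pricing demo. `run_with_escalation` and `TierRunner` are hypothetical names, not AI Partner's API.

```python
from typing import Callable, Optional

# Hypothetical signature: a tier takes the task text and returns extracted
# data; an empty result or an exception both count as failure.
TierRunner = Callable[[str], Optional[dict]]


def run_with_escalation(task: str, tiers: list[tuple[str, TierRunner]]) -> tuple[str, dict]:
    """Try tiers in order and return (tier_name, data) from the first success."""
    failures: list[str] = []
    for name, runner in tiers:
        try:
            data = runner(task)
        except Exception as exc:  # e.g. a timeout or navigation error
            failures.append(f"{name}: {exc}")
            continue
        if data:  # non-empty data: stop escalating
            return name, data
        failures.append(f"{name}: extraction returned empty data")
    raise RuntimeError("all tiers failed: " + "; ".join(failures))
```

Passing the tiers as an ordered list keeps the cheapest method first. The ~2s DOM path is always tried before the ~10s and ~30s ones.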

T4 is the only tier that drives your actual desktop. It's gated behind two security checks, runs with pyautogui.FAILSAFE=True (drag cursor to a corner to abort), watches the cursor for human interruptions, and writes every action to a persistent audit log. Read the full T4 Host Control guide before enabling it.
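The cursor watch and the audit log can be illustrated without moving the real mouse. This minimal sketch makes several assumptions:

- The guard compares where T4 left the cursor with where it is now.
- T4 pauses when that distance exceeds the drift threshold (50 px by default in Settings).
- The log is JSON Lines.

`DriftGuard` and `audit` are hypothetical helpers, not the shipped implementation. On a real desktop, `pyautogui.position()` would supply the coordinates.

```python
import json
import math
import time
from pathlib import Path
from typing import Optional, Tuple

Point = Tuple[int, int]


class DriftGuard:
    """Hypothetical drift check: pause T4 if the cursor moved between its actions."""

    def __init__(self, threshold_px: float = 50.0) -> None:
        self.threshold_px = threshold_px
        self.last_pos: Optional[Point] = None

    def record(self, pos: Point) -> None:
        # Call right after each T4 action with the position T4 left the cursor at.
        self.last_pos = pos

    def should_pause(self, current: Point) -> bool:
        # A human grabbing the mouse shows up as distance from T4's last position.
        if self.last_pos is None:
            return False
        dx = current[0] - self.last_pos[0]
        dy = current[1] - self.last_pos[1]
        return math.hypot(dx, dy) > self.threshold_px


def audit(log_path: Path, verb: str, **args) -> None:
    """Append one JSON object per action; appending never rewrites earlier entries."""
    entry = {"ts": time.time(), "verb": verb, "args": args}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
```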

T3 persistent shell (`t3_shell`)

When partner mode is active, the agent's run_command tool no longer sends commands to a separate headless sandbox container. Instead, it routes them to a persistent bash session inside the T3 container. This is the same container that runs Chromium and the live frame stream.

Why this matters:

Before	After
`pip install nmap-python` in step 1 → step 2 can't import it (fresh container)	Step 2 sees the install (same shell session)
`cd /tmp/myproject` doesn't carry to the next command	Working directory persists across actions
Env vars set with `export` are lost	Exported vars survive
Restart partner → all installed packages gone	`/home/partner` is a per-user docker volume — packages survive container restart

How it works:

The partner runs in a full, isolated Linux environment with a persistent shell — bash, Python, Node.js, pip, npm, and internet access.
It keeps your working directory and environment between commands, so multi-step work flows naturally.
When a partner session is active, the agent runs commands inside that environment automatically — no special syntax needed.

This is the single biggest unlock for letting an expert model do specialty work. The model installs whatever it needs per task (nmap, nuclei, jupyter, cargo, anything in the apt / pip / npm / go install universe) and the install persists for as long as the partner session is up. Specialty domain containers are no longer required — one general T3 container can become a pentest workstation, a data-analysis notebook, or an agile-build environment depending on what the goal needs.
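The Before/After difference comes from reusing one shell process instead of spawning a fresh one per command. Here is a minimal sketch of that idea, assuming a local `bash`. It marks the end of each command's output with a unique marker so the pipe never has to close. `PersistentShell` is a hypothetical name, not the `t3_shell` implementation, and the real session lives inside the T3 container rather than on the host.

```python
import subprocess
import uuid


class PersistentShell:
    """One long-lived bash process, so cd and export carry over between commands."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash", "--noprofile", "--norc"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so errors show up in the output
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> tuple[int, str]:
        # After the command, print a unique marker plus its exit status ($?).
        marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\necho \"{marker} $?\"\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            if marker in line:
                # Output without a trailing newline ends up on the marker line.
                before, _, rest = line.partition(marker)
                lines.append(before)
                return int(rest.split()[0]), "".join(lines)
            lines.append(line)
        raise RuntimeError("shell exited unexpectedly")

    def close(self) -> None:
        self.proc.stdin.close()  # EOF makes bash exit
        self.proc.wait()
        self.proc.stdout.close()
```

Because every `run` call writes to the same process, `cd /tmp` followed by `pwd` reports `/tmp`, and an `export` stays visible to later commands.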

Demo 1: Standard web extraction (T1)

Type this:

Go to https://news.ycombinator.com and extract the top 10 stories.
For each story get: title, URL, points, and comment count.
Return them as a clean numbered list.

What happens:

✅ T1: browser_navigate(https://news.ycombinator.com)
✅ T1: browser_extract(selector: ".athing, .subtext")
   → Extracted 10 stories in 1.8 seconds
✅ Formatted and returned

T1 reads the DOM directly — no screenshot, no LLM vision — so it's extremely fast.
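T1's "read the DOM, skip the pixels" approach can be illustrated with Python's standard-library HTML parser. This sketch assumes a simplified version of HN's markup:

- Each story is a `tr.athing` row with its link inside `span.titleline`.
- The next row has a `td.subtext` holding `span.score` and an "N comments" link.

That markup can change. `HNParser` and `extract_top_stories` are hypothetical helpers, not the `browser_extract` tool. In practice T1 drives Playwright, which could supply the HTML via `page.content()`.

```python
import re
from html.parser import HTMLParser


class HNParser(HTMLParser):
    """Hypothetical T1-style extractor for simplified HN story rows."""

    def __init__(self) -> None:
        super().__init__()
        self.stories: list[dict] = []
        self._field = None          # field the next text chunk belongs to
        self._in_titleline = False
        self._in_subtext = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "tr" and "athing" in classes:      # a new story row
            self.stories.append({"title": "", "url": None, "points": 0, "comments": 0})
            self._in_subtext = False
        elif not self.stories:
            return
        elif tag == "span" and "titleline" in classes:
            self._in_titleline = True
        elif tag == "a" and self._in_titleline:      # first link = the story link
            self.stories[-1]["url"] = attr.get("href")
            self._field = "title"
        elif tag == "td" and "subtext" in classes:   # metadata row for this story
            self._in_subtext = True
        elif tag == "span" and "score" in classes:
            self._field = "points"
        elif tag == "a" and self._in_subtext:        # one of these reads "N comments"
            self._field = "comments"

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_titleline = False   # ignore later links such as the site link
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field is None:
            return
        story = self.stories[-1]
        if self._field == "title":
            story["title"] += data
        else:
            # Matches "120 points" or "45 comments"; "discuss" leaves comments at 0.
            m = re.match(r"(\d+)\s*(points?|comments?)", data.strip())
            if m:
                story[self._field] = int(m.group(1))


def extract_top_stories(html: str, n: int = 10) -> list[dict]:
    parser = HNParser()
    parser.feed(html)
    parser.close()
    return parser.stories[:n]
```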

Demo 2: Complex SPA navigation (T2)

Type this:

Go to https://linear.app and navigate to the pricing page.
Extract all plan names, prices, and the features listed under each plan.

What happens:

Linear's pricing page is rendered by a React SPA with no clean static HTML. T1 fails to extract meaningful data. The agent escalates:

⚠️ T1: extraction returned empty data → escalating to T2
✅ T2: browser_navigate(https://linear.app/pricing) with stealth mode
✅ T2: browser_screenshot() → captured page
✅ T2: vision_analyze(screenshot) → "I can see 3 pricing tiers: Free, Business, Enterprise..."
✅ T2: browser_extract(targeted selectors based on visual analysis)
   → Extracted 3 plans, prices, and feature lists
✅ Formatted and returned

T2 takes a screenshot, uses the LLM's vision to understand the page layout, then extracts the data using the selectors it identifies visually.
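The screenshot → vision → targeted-extraction sequence can be sketched with the three steps passed in as callables. This keeps the example independent of any particular browser or model. `t2_extract` and all three callables are hypothetical stand-ins; the real tool signatures and prompts are not documented here.

```python
from typing import Callable


def t2_extract(
    goal: str,
    screenshot: Callable[[], bytes],
    vision: Callable[[bytes, str], list[str]],
    extract: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Hypothetical T2 step: look at the rendered page, then read values from the DOM."""
    image = screenshot()                 # pixels of the fully rendered SPA
    selectors = vision(image, goal)      # the model proposes selectors for what it sees
    results = {}
    for selector in selectors:
        values = extract(selector)       # targeted DOM read returns exact text
        if values:
            results[selector] = values
    if not results:
        # Nothing usable: raise so the caller can escalate to T3.
        raise LookupError("T2 found no extractable data")
    return results
```

Using vision only to choose selectors, and the DOM for the values themselves, keeps the extracted text exact rather than read back from pixels.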

Demo 3: CAPTCHA handling (T2 → user handoff)

Type this:

Go to https://www.linkedin.com/in/satya-nadella and extract his current job title,
company, location, and latest 3 posts.

LinkedIn aggressively blocks automated browsers. When the agent encounters a CAPTCHA or login wall:

✅ T2: browser_navigate(https://linkedin.com/in/satya-nadella)
⚠️ T2: CAPTCHA detected — pausing for human handoff

In the AI Partner UI, you'll see:

A live screenshot of the CAPTCHA page
A "Take Control" button

Click Take Control → a browser window opens on your machine → solve the CAPTCHA → click Continue in AI Partner.

✅ Resumed after CAPTCHA solved by user
✅ T2: browser_extract(profile data)
   → Title: CEO, Microsoft
   → Location: Redmond, WA
   → Latest posts extracted

The agent re-validates the current page state after the handoff before extracting data — it confirms it's actually on the profile page, not a post-CAPTCHA redirect.
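The pause, wait and re-validate sequence can be sketched with a threading event standing in for the UI's Continue button. It also assumes that "on the right page" can be checked with a URL substring. `Handoff` and its methods are hypothetical. The default wait mirrors the 5-minute CAPTCHA timeout listed under Settings → Computer Use.

```python
import threading
from typing import Callable


class Handoff:
    """Hypothetical pause/resume gate for a human CAPTCHA solve."""

    def __init__(self, timeout_s: float = 300.0) -> None:  # 5 minutes
        self.timeout_s = timeout_s
        self._resumed = threading.Event()

    def continue_clicked(self) -> None:
        # Called when the user presses Continue in the UI.
        self._resumed.set()

    def wait_and_revalidate(self, current_url: Callable[[], str], expected: str) -> str:
        if not self._resumed.wait(self.timeout_s):
            raise TimeoutError("the CAPTCHA was not solved before the timeout")
        url = current_url()
        # Re-check where the browser actually landed before extracting anything:
        # a post-CAPTCHA redirect can end up on a feed or login page instead.
        if expected not in url:
            raise RuntimeError(f"unexpected page after handoff: {url}")
        return url
```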

Demo 4: Form filling (T1)

Type this:

Go to https://formspree.io/forms/new and fill in the form to create a new form endpoint.
Use these values:
- Form name: Test Form
- Email: test@example.com
Submit the form and tell me the endpoint URL that appears after submission.

What happens:

✅ T1: browser_navigate(https://formspree.io/forms/new)
✅ T1: browser_fill(selector: "#name", value: "Test Form")
✅ T1: browser_fill(selector: "#email", value: "test@example.com")
✅ T1: browser_click(selector: "button[type=submit]")
✅ T1: browser_extract(selector: ".endpoint-url")
   → Endpoint: https://formspree.io/f/xyzabc
✅ Result returned
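The demo's tool calls can be expressed as data and replayed in order. This sketch assumes a browser object with `fill`, `click` and `text_content` methods taking a selector first. A Playwright sync `Page` exposes methods with those names, so it should fit, but `Page` here is only a structural protocol. `Step`, `DEMO_STEPS` and `run_steps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


class Page(Protocol):
    """Assumed browser surface (modelled on Playwright's sync Page methods)."""

    def fill(self, selector: str, value: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def text_content(self, selector: str) -> Optional[str]: ...


@dataclass
class Step:
    verb: str                      # "fill", "click" or "extract"
    selector: str
    value: Optional[str] = None


# The Formspree demo as data; selectors come from the demo log.
DEMO_STEPS = [
    Step("fill", "#name", "Test Form"),
    Step("fill", "#email", "test@example.com"),
    Step("click", "button[type=submit]"),
    Step("extract", ".endpoint-url"),
]


def run_steps(page: Page, steps: List[Step]) -> Optional[str]:
    """Replay steps in order and return the text read by the last extract step."""
    result = None
    for step in steps:
        if step.verb == "fill":
            page.fill(step.selector, step.value or "")
        elif step.verb == "click":
            page.click(step.selector)
        elif step.verb == "extract":
            result = page.text_content(step.selector)
        else:
            raise ValueError(f"unknown verb: {step.verb}")
    return result
```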

Demo 5: Meeting join (T3 container)

The meeting attendance demo is the most advanced computer use case. The T3 tier boots a full containerized desktop (Xvfb + PulseAudio), joins the meeting via a real Chromium browser, and captures audio.

See the full walkthrough: Meeting Attendance demo →

Configuring computer use tiers

Go to Settings → Computer Use to configure:

Setting	Default	Notes
Default tier	Auto-escalate	Start at T1, escalate on failure
T4 host control	Disabled	Enable only for single-user, trusted environments; refused entirely when authentication is enabled. See T4 guide
T4 max steps	20	Hard cap to prevent runaway host control
T4 allowlist	`chrome, firefox, code, terminal`	Active window must match before any T4 action (empty = all apps)
T4 denylist	password managers	Always blocked, even with an open allowlist
T4 approval mode	`sensitive`	`never` / `sensitive` / `always` — pause for approval before risky host actions
T4 media keys	enabled	`media_key` verb: volume, mute, play/pause, next/previous (bypasses allowlist)
T4 drift threshold	50 px	T4 pauses if you move your cursor between its actions
T4 monitor capture	`primary`	On multi-monitor systems, only the primary monitor is captured
T4 live broadcast	Off (`live_stream_fps: 0`)	Set to 5-10 fps to see a continuous video of your screen in the Inspector while T4 runs
CAPTCHA timeout	5 minutes	How long to wait for user to solve CAPTCHA
Stealth mode	On for T2	Rotates user-agent and headers to avoid detection
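Several T4 rows in this table combine into one gate that runs before each host action. This is a minimal sketch under these assumptions:

- The active window is identified by its title and matched by case-insensitive substring.
- The denylist is checked before the allowlist.
- Auth being on refuses T4 outright.

The field names, the `auth_enabled` flag and the matching rule are assumptions, not AI Partner's config schema.

```python
from dataclasses import dataclass, field
from typing import List

APPROVAL_MODES = ("never", "sensitive", "always")


@dataclass
class T4Settings:
    """Hypothetical mirror of the T4 rows under Settings -> Computer Use."""

    enabled: bool = False
    max_steps: int = 20
    allowlist: List[str] = field(default_factory=lambda: ["chrome", "firefox", "code", "terminal"])
    # The shipped denylist covers password managers; exact entries are not listed here.
    denylist: List[str] = field(default_factory=list)
    approval_mode: str = "sensitive"
    drift_threshold_px: int = 50
    live_stream_fps: int = 0  # 0 = live broadcast off

    def __post_init__(self) -> None:
        if self.approval_mode not in APPROVAL_MODES:
            raise ValueError(f"approval_mode must be one of {APPROVAL_MODES}")


def t4_action_allowed(s: T4Settings, active_window: str, steps_taken: int, auth_enabled: bool) -> bool:
    """Gate one host action: enabled, auth off, under the step cap, window permitted."""
    if not s.enabled or auth_enabled or steps_taken >= s.max_steps:
        return False
    window = active_window.lower()
    if any(name.lower() in window for name in s.denylist):
        return False              # the denylist wins even with an open allowlist
    if not s.allowlist:
        return True               # empty allowlist = all apps
    return any(name.lower() in window for name in s.allowlist)
```

Substring matching on titles is deliberately loose here; a real gate could match on process names instead.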

T4 emergency controls

Control	How to trigger
Hard abort (FAILSAFE)	Drag your mouse cursor to any screen corner
Kill-now (all tiers)	`POST /api/computer-use/stop`, wired to the UI's STOP button. Cancels the in-flight model call immediately and halts T1–T4 (not just T4). Also fires when you cancel the parent goal. There is no automatic timeout: it fires only when you press STOP or cancel the goal
Inspect audit log	`GET /api/computer-use/audit-tail?n=200` returns the last N actions with full args
Drift guard	Automatic — move the cursor mid-task and T4 pauses for you
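The two endpoints above can be called with the standard library alone. This sketch makes three assumptions, all labelled in the code:

- The API is reachable at `BASE_URL`.
- No authentication header is needed.
- Responses are JSON or empty.

The helper names are hypothetical; only the paths and the `n` parameter come from the table.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: adjust to wherever your AI Partner V2 API is served.
BASE_URL = "http://localhost:8000"


def stop_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Kill-now: the same call the UI's STOP button makes (halts T1-T4)."""
    return urllib.request.Request(f"{base_url}/api/computer-use/stop", data=b"", method="POST")


def audit_tail_request(n: int = 200, base_url: str = BASE_URL) -> urllib.request.Request:
    """Fetch the last n audited actions."""
    query = urlencode({"n": n})
    return urllib.request.Request(f"{base_url}/api/computer-use/audit-tail?{query}", method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0):
    """Send the request; assumes a JSON (or empty) response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body else None
```

Typical use is `send(stop_request())` from a script, or `send(audit_tail_request(50))` to review what T4 just did.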

What computer use can't do

Computer use is not magic. Some limitations:

Two-factor authentication: if a site requires 2FA, the agent pauses for human-in-the-loop (HITL) input so you can enter the code
PDF downloads in containers: files downloaded inside a container are copied out and saved to your workspace
Paid sites: the agent can't pay for access it doesn't have
Sites with Cloudflare Turnstile: T2 may solve simple CAPTCHAs; complex ones always escalate to you
Desktop apps (non-web): T4 only, and only for allowlisted applications

Combining with goal execution

Computer use is one tool among many in the ReAct loop. A single goal can mix browser, Python, and file generation:

Scrape the Stripe pricing page and extract all plan details.
Then scrape the Paddle pricing page and do the same.
Compare the two and generate a side-by-side Excel spreadsheet.

The agent handles this as:

T1: scrape Stripe pricing → Python: structure data
T1: scrape Paddle pricing → Python: structure data
generate_excel: create comparison table
Files panel: download link available
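The "Python: structure data" step can be pictured as pairing the two scraped plan lists row by row. The `generate_excel` tool's interface is not documented here, so this sketch writes CSV (which Excel opens) as a stand-in. Plan records are assumed to be dicts with `name` and `price` keys, and `side_by_side` and `to_csv` are hypothetical helpers.

```python
import csv
import io
from itertools import zip_longest


def side_by_side(left: list[dict], right: list[dict], left_name: str, right_name: str) -> list[list[str]]:
    """Pair plans row by row; the shorter list is padded with blank cells."""
    header = [f"{left_name} plan", f"{left_name} price", f"{right_name} plan", f"{right_name} price"]
    rows = [header]
    blank = {"name": "", "price": ""}
    for a, b in zip_longest(left, right, fillvalue=blank):
        rows.append([a["name"], a["price"], b["name"], b["price"]])
    return rows


def to_csv(rows: list[list[str]]) -> str:
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```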
