
Demo: Computer Use

V2 only — invite-only edition. This is part of AI Partner V2 and is not in the open-source V1 you self-host from the Quick Start. V2 is available now, by invite. See V1 vs V2.

What you'll see

You give AI Partner a task that requires interacting with a website. It picks the right browser tier for the job — fast DOM extraction for standard pages, vision-based navigation for complex SPAs, a full containerized desktop for the hardest cases — and completes the task autonomously.


The four tiers

Tier | Method | Speed | Used for
T1 | Playwright DOM extraction (no screenshot) | ~2s | Standard pages with clean HTML
T2 | Playwright + Vision (screenshot → LLM) | ~10s | JS-heavy SPAs, dynamically rendered content
T3 | Container Xvfb desktop + real browser | ~30s | Meeting joins, CAPTCHA-protected flows, full desktop automation
T4 | Host pyautogui (real keyboard + mouse) | ~2s/action | Native apps, your real logged-in sessions (allowlisted, single-user only)

The agent always starts at T1 and escalates automatically if a lower tier fails.
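
The escalation rule can be pictured as a loop over tiers that stops at the first non-empty result. The following is a minimal sketch, not the product's implementation: `run_with_escalation`, `TierResult`, and the stub handlers are hypothetical names, and treating an exception or an empty result as "failure" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TierResult:
    tier: str
    data: list


def run_with_escalation(
    task: str,
    tiers: list[tuple[str, Callable[[str], Optional[list]]]],
) -> TierResult:
    """Try each tier in order and return the first non-empty result."""
    for name, handler in tiers:
        try:
            data = handler(task)
        except Exception:
            data = None  # assumption: a crash counts as a failure and triggers escalation
        if data:
            return TierResult(tier=name, data=data)
    raise RuntimeError("all tiers failed")


# Hypothetical stand-ins for the real tier handlers.
def t1_dom_extract(task: str) -> list:
    return []  # simulates a page where DOM extraction finds nothing


def t2_vision_extract(task: str) -> list:
    return ["Free", "Business", "Enterprise"]


result = run_with_escalation(
    "extract plan names",
    [("T1", t1_dom_extract), ("T2", t2_vision_extract)],
)
print(result.tier)  # T2
```

Keeping the tiers in an ordered list means the cheapest method is always tried first, and adding or disabling a tier is a one-line change.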

T4 is the only tier that drives your actual desktop. It's gated behind two security checks, runs with pyautogui.FAILSAFE=True (drag cursor to a corner to abort), watches the cursor for human interruptions, and writes every action to a persistent audit log. Read the full T4 Host Control guide before enabling it.
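
Watching the cursor for interruptions comes down to comparing where the last action left the pointer with where it is now. The sketch below assumes a straight-line distance check against the 50 px default drift threshold listed under Configuring computer use tiers; the function name and the strict "greater than" comparison are assumptions, not the documented behavior.

```python
import math

DRIFT_THRESHOLD_PX = 50  # default drift threshold from the settings table


def cursor_drifted(
    expected: tuple[int, int],
    actual: tuple[int, int],
    threshold_px: float = DRIFT_THRESHOLD_PX,
) -> bool:
    """Return True when the cursor is farther than threshold_px from where the last action left it."""
    dx = actual[0] - expected[0]
    dy = actual[1] - expected[1]
    return math.hypot(dx, dy) > threshold_px


# In a real loop, `actual` would come from the host (for example pyautogui.position()),
# read just before the next action. Fixed tuples are used here so the sketch runs anywhere.
print(cursor_drifted((100, 100), (103, 104)))  # False
print(cursor_drifted((100, 100), (100, 151)))  # True
```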


T3 persistent shell (t3_shell)

When partner mode is active, the agent's run_command tool no longer goes to a separate headless sandbox container. Instead, it routes to a persistent bash session inside the T3 container, the same container that runs Chromium and the live frame stream.

Why this matters:

Before | After
pip install nmap-python in step 1 → step 2 can't import it (fresh container) | Step 2 sees the install (same shell session)
cd /tmp/myproject doesn't carry to the next command | Working directory persists across actions
Env vars set with export are lost | Exported vars survive
Restart partner → all installed packages gone | /home/partner is a per-user Docker volume, so packages survive a container restart

How it works:

  • The partner runs in a full, isolated Linux environment with a persistent shell — bash, Python, Node.js, pip, npm, and internet access.
  • It keeps your working directory and environment between commands, so multi-step work flows naturally.
  • When a partner session is active, the agent runs commands inside that environment automatically — no special syntax needed.

This is the single biggest unlock for letting an expert model do specialty work. The model installs whatever it needs per task (nmap, nuclei, jupyter, cargo, anything in the apt / pip / npm / go install universe) and the install persists for as long as the partner session is up. Specialty domain containers are no longer required — one general T3 container can become a pentest workstation, a data-analysis notebook, or an agile-build environment depending on what the goal needs.
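
To see why a persistent session changes things, it helps to compare it with spawning a fresh process per command. The sketch below keeps one local bash process alive and feeds it commands over stdin, so `cd` and `export` carry over. It is an illustration of the idea only: the `PersistentShell` class and its marker protocol are hypothetical, and it runs bash on the local machine rather than inside the T3 container.

```python
import subprocess
import uuid


class PersistentShell:
    """One long-lived bash process; cwd and env vars carry across commands."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> str:
        # A unique marker line tells us where this command's output ends.
        marker = f"__done_{uuid.uuid4().hex}__"
        # The leading \n guarantees the marker starts on its own line even if
        # the command's output did not end with a newline.
        self.proc.stdin.write(f"{command}\nprintf '\\n{marker}\\n'\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if not line or line.rstrip("\n") == marker:
                break  # shell exited, or end-of-command marker reached
            lines.append(line)
        output = "".join(lines)
        return output[:-1] if output.endswith("\n") else output  # drop the newline we added

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


shell = PersistentShell()
shell.run("cd /tmp")
shell.run("export DEMO_VAR=hello")
print(shell.run("pwd"))             # working directory set by the earlier command
print(shell.run("echo $DEMO_VAR"))  # variable exported by the earlier command
shell.close()
```

A per-command `subprocess.run(...)` would lose both the directory change and the exported variable, which is the "Before" column of the table above.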


Demo 1: Standard web extraction (T1)

Type this:

Go to https://news.ycombinator.com and extract the top 10 stories.
For each story get: title, URL, points, and comment count.
Return them as a clean numbered list.

What happens:

✅ T1: browser_navigate(https://news.ycombinator.com)
✅ T1: browser_extract(selector: ".athing, .subtext")
→ Extracted 10 stories in 1.8 seconds
✅ Formatted and returned

T1 reads the DOM directly — no screenshot, no LLM vision — so it's extremely fast.
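
The kind of work T1 does can be illustrated without a browser: walk the markup, pick out the elements of interest, and build records. The sketch below uses Python's standard html.parser on an inline, simplified snippet. It is not how T1 is implemented (the tier table names Playwright), and the sample markup, the `titleline` and `score` class names, and the `StoryParser` class are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Simplified, made-up markup loosely shaped like a story listing.
SAMPLE_HTML = """
<table>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/a">First example story</a></span></td></tr>
  <tr><td class="subtext"><span class="score">120 points</span> <a href="item?id=1">45 comments</a></td></tr>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/b">Second example story</a></span></td></tr>
  <tr><td class="subtext"><span class="score">87 points</span> <a href="item?id=2">12 comments</a></td></tr>
</table>
"""


class StoryParser(HTMLParser):
    """Collect title, URL, and points for each story row."""

    def __init__(self) -> None:
        super().__init__()
        self.stories: list[dict] = []
        self._in_titleline = False
        self._capture = None  # "title" or "points" while inside the matching tag

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self._in_titleline = True
        elif tag == "a" and self._in_titleline:
            self.stories.append({"title": "", "url": attr_map.get("href"), "points": None})
            self._capture = "title"
        elif tag == "span" and "score" in classes and self.stories:
            self._capture = "points"

    def handle_endtag(self, tag):
        if tag == "a" and self._capture == "title":
            self._capture = None
            self._in_titleline = False
        elif tag == "span" and self._capture == "points":
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.stories[-1]["title"] += data
        elif self._capture == "points":
            self.stories[-1]["points"] = int(data.split()[0])


parser = StoryParser()
parser.feed(SAMPLE_HTML)
for number, story in enumerate(parser.stories, start=1):
    print(f"{number}. {story['title']} ({story['url']}), {story['points']} points")
```

No rendering or image analysis happens anywhere in this path, which is why a DOM-only tier can answer in a couple of seconds.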


Demo 2: Complex SPA navigation (T2)

Type this:

Go to https://linear.app and navigate to the pricing page.
Extract all plan names, prices, and the features listed under each plan.

What happens:

Linear's pricing page is rendered by a React SPA with no clean static HTML. T1 fails to extract meaningful data. The agent escalates:

⚠️ T1: extraction returned empty data → escalating to T2
✅ T2: browser_navigate(https://linear.app/pricing) with stealth mode
✅ T2: browser_screenshot() → captured page
✅ T2: vision_analyze(screenshot) → "I can see 3 pricing tiers: Free, Business, Enterprise..."
✅ T2: browser_extract(targeted selectors based on visual analysis)
→ Extracted 3 plans, prices, and feature lists
✅ Formatted and returned

T2 takes a screenshot, uses the LLM's vision to understand the page layout, then extracts the data using the selectors it identifies visually.


Demo 3: CAPTCHA handling (T2 → user handoff)

Type this:

Go to https://www.linkedin.com/in/satya-nadella and extract his current job title,
company, location, and latest 3 posts.

LinkedIn aggressively blocks automated browsers. When the agent encounters a CAPTCHA or login wall:

✅ T2: browser_navigate(https://linkedin.com/in/satya-nadella)
⚠️ T2: CAPTCHA detected — pausing for human handoff

In the AI Partner UI, you'll see:

  • A live screenshot of the CAPTCHA page
  • A "Take Control" button

Click Take Control → a browser window opens on your machine → solve the CAPTCHA → click Continue in AI Partner.

✅ Resumed after CAPTCHA solved by user
✅ T2: browser_extract(profile data)
→ Title: CEO, Microsoft
→ Location: Redmond, WA
→ Latest posts extracted

The agent re-validates the current page state after the handoff before extracting data — it confirms it's actually on the profile page, not a post-CAPTCHA redirect.
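
One simple form of that re-validation is a URL check before any extraction runs. The sketch below is an assumption about how such a check could look, not the agent's actual logic: `is_expected_page` is a hypothetical helper, and the redirect URL in the usage lines is made up.

```python
from urllib.parse import urlparse


def is_expected_page(current_url: str, expected_host: str, expected_path: str) -> bool:
    """Return True only when the browser is on the intended host and path."""
    parsed = urlparse(current_url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com as the same site
    return host == expected_host and parsed.path.rstrip("/") == expected_path.rstrip("/")


# Profile page reached after the handoff: safe to extract.
print(is_expected_page(
    "https://www.linkedin.com/in/satya-nadella/", "linkedin.com", "/in/satya-nadella"
))  # True

# Hypothetical post-CAPTCHA redirect: do not extract yet.
print(is_expected_page(
    "https://www.linkedin.com/login?next=%2Fin%2Fsatya-nadella", "linkedin.com", "/in/satya-nadella"
))  # False
```

A URL check is only a first gate. A real validation step would probably also confirm that expected page elements are present.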


Demo 4: Form filling (T1)

Type this:

Go to https://formspree.io/forms/new and fill in the form to create a new form endpoint.
Use these values:
- Form name: Test Form
- Email: test@example.com
Submit the form and tell me the endpoint URL that appears after submission.

What happens:

✅ T1: browser_navigate(https://formspree.io/forms/new)
✅ T1: browser_fill(selector: "#name", value: "Test Form")
✅ T1: browser_fill(selector: "#email", value: "test@example.com")
✅ T1: browser_click(selector: "button[type=submit]")
✅ T1: browser_extract(selector: ".endpoint-url")
→ Endpoint: https://formspree.io/f/xyzabc
✅ Result returned

Demo 5: Meeting join (T3 container)

The meeting attendance demo is the most advanced computer use case. The T3 tier boots a full containerized desktop (Xvfb + PulseAudio), joins the meeting via a real Chromium browser, and captures audio.

See the full walkthrough: Meeting Attendance demo →


Configuring computer use tiers

Go to Settings → Computer Use to configure:

Setting | Default | Notes
Default tier | Auto-escalate | Start at T1, escalate on failure
T4 host control | Disabled | Enable only for single-user, trusted environments; refused entirely when auth is on. See T4 guide
T4 max steps | 20 | Hard cap to prevent runaway host control
T4 allowlist | chrome, firefox, code, terminal | Active window must match before any T4 action (empty = all apps)
T4 denylist | password managers | Always blocked, even with an open allowlist
T4 approval mode | sensitive | never / sensitive / always: when to pause for approval before risky host actions
T4 media keys | enabled | media_key verb: volume, mute, play/pause, next/previous (bypasses allowlist)
T4 drift threshold | 50 px | T4 pauses if you move your cursor between its actions
T4 monitor capture | primary | On multi-monitor systems, only the primary monitor is captured
T4 live broadcast | Off (live_stream_fps: 0) | Set to 5-10 fps to see a continuous video of your screen in the Inspector while T4 runs
CAPTCHA timeout | 5 minutes | How long to wait for the user to solve a CAPTCHA
Stealth mode | On for T2 | Rotates user-agent and headers to avoid detection
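
The allowlist, denylist, and max-steps settings interact in a fixed order of precedence, which the following sketch tries to make explicit. It is a reading of the Notes column, not the product's code: `t4_action_allowed` is a hypothetical function, substring matching on the app name is an assumption, and the denylist entries in the usage lines are example values.

```python
def t4_action_allowed(
    active_app: str,
    allowlist: list[str],
    denylist: list[str],
    steps_taken: int,
    max_steps: int = 20,
) -> bool:
    """Decide whether the next T4 action may run against the active window."""
    app = active_app.lower()
    if steps_taken >= max_steps:
        return False  # hard cap reached
    if any(blocked in app for blocked in denylist):
        return False  # denylist wins, even with an open allowlist
    if not allowlist:
        return True  # empty allowlist means all apps
    return any(allowed in app for allowed in allowlist)


allow = ["chrome", "firefox", "code", "terminal"]
deny = ["keepassxc", "1password"]  # example password managers, chosen for illustration

print(t4_action_allowed("Firefox", allow, deny, steps_taken=3))    # True
print(t4_action_allowed("KeePassXC", [], deny, steps_taken=3))     # False
print(t4_action_allowed("Firefox", allow, deny, steps_taken=20))   # False
```

Checking the denylist before the allowlist is what makes "always blocked, even with an open allowlist" hold.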

T4 emergency controls

Control | How to trigger
Hard abort (FAILSAFE) | Drag your mouse cursor to any screen corner
Kill-now (all tiers) | POST /api/computer-use/stop, wired to the UI's STOP button. Cancels the in-flight model call immediately and halts every tier (T1–T4), not just T4. Also fires when you cancel the parent goal. There is no timeout: it never triggers on its own
Inspect audit log | GET /api/computer-use/audit-tail?n=200 returns the last N actions with full args
Drift guard | Automatic: move the cursor mid-task and T4 pauses for you
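
The stop and audit endpoints can also be called from a script, for example from a watchdog process. The sketch below uses only the standard library and takes the two paths from the table above. The base URL, the empty POST body, and the absence of authentication headers are assumptions, and the response is returned as raw text because its format is not described here.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your AI Partner address


def build_stop_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the kill-now request (POST /api/computer-use/stop)."""
    return urllib.request.Request(
        f"{base_url}/api/computer-use/stop", data=b"", method="POST"
    )


def build_audit_tail_request(n: int = 200, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the audit-log request (GET /api/computer-use/audit-tail?n=N)."""
    return urllib.request.Request(
        f"{base_url}/api/computer-use/audit-tail?n={n}", method="GET"
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


stop_request = build_stop_request()
audit_request = build_audit_tail_request(n=50)
print(stop_request.get_method(), stop_request.full_url)
print(audit_request.get_method(), audit_request.full_url)
# To actually call the server: send(stop_request) or send(audit_request)
```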

What computer use can't do

Computer use is not magic. Some limitations:

  • Two-factor authentication: if a site requires 2FA, the agent pauses for human-in-the-loop (HITL) input and you enter the code
  • PDF downloads in containers: files are downloaded inside the container first, then extracted and saved to the workspace
  • Paid sites: the agent can't pay for access it doesn't have
  • Sites with Cloudflare Turnstile: T2 may solve simple CAPTCHAs; complex ones always escalate to you
  • Desktop apps (non-web): T4 only, and only for allowlisted applications

Combining with goal execution

Computer use is one tool among many in the ReAct loop. A single goal can mix browser, Python, and file generation:

Scrape the Stripe pricing page and extract all plan details.
Then scrape the Paddle pricing page and do the same.
Compare the two and generate a side-by-side Excel spreadsheet.

The agent handles this as:

  1. T1: scrape Stripe pricing → Python: structure data
  2. T1: scrape Paddle pricing → Python: structure data
  3. generate_excel: create comparison table
  4. Files panel: download link available
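
The "Python: structure data" and comparison steps amount to normalizing each scrape into records and then pairing them row by row. The sketch below shows that pairing with the standard library and writes CSV instead of Excel to stay dependency-free; it does not use the generate_excel tool. The `side_by_side` helper is hypothetical, and every plan name and price shown is a placeholder, not real pricing.

```python
import csv
import io
from itertools import zip_longest


def side_by_side(left: list[dict], right: list[dict], left_name: str, right_name: str) -> str:
    """Pair two plan lists row by row and return the comparison as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([
        f"{left_name} plan", f"{left_name} price",
        f"{right_name} plan", f"{right_name} price",
    ])
    empty = {"plan": "", "price": ""}  # pads the shorter list
    for a, b in zip_longest(left, right, fillvalue=empty):
        writer.writerow([a["plan"], a["price"], b["plan"], b["price"]])
    return buffer.getvalue()


# Placeholder records standing in for the two scraped pricing pages.
vendor_a = [{"plan": "Plan A1", "price": "placeholder"}, {"plan": "Plan A2", "price": "placeholder"}]
vendor_b = [{"plan": "Plan B1", "price": "placeholder"}]

print(side_by_side(vendor_a, vendor_b, "Vendor A", "Vendor B"))
```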