T4 Host Control
V2 only — invite-only edition. This is part of AI Partner V2 and is not in the open-source V1 you self-host from the Quick Start. V2 is available now, by invite. See V1 vs V2.
T4 is the only computer-use tier that drives your real mouse and keyboard. Everything else (T1, T2, T3) is sandboxed inside a browser context or a Docker container — if something goes wrong, the worst case is closing a tab or restarting a container. T4 doesn't have that escape hatch: every click happens on your actual desktop, every keystroke types into whichever window is focused.
In return, T4 is the only tier that can drive native applications: Office, Slack desktop, IDEs, file managers, anything else you run locally.
T4 is single-user, opt-in only. Do not enable it on a server you share with anyone else. Two security gates must both be open before T4 will run — see Enabling T4 below. As a hard guarantee, T4 is refused entirely when authentication is enabled (AI_PARTNER_AUTH_ENABLED=true): a multi-tenant deployment can never drive the host desktop, regardless of config. Tenants get an isolated desktop via T3 instead.
When to use T4 vs. T3
| | T3 (Container Desktop) | T4 (Host Control) |
|---|---|---|
| Isolation | Full — runs in Docker | None — your real machine |
| Recoverable mistakes | Yes (kill container) | No (real keystrokes) |
| Can control native Windows/macOS/Linux apps | No (Linux in container only) | Yes |
| Can use your real logged-in accounts | No (separate profile) | Yes |
| Live screen broadcast | Yes (default) | Optional (off by default) |
| Recommended for multi-user | Yes | No |
| First-task latency | ~20s (container spin-up) | ~2s (Python driver spawn) |
Default to T3 wherever possible. Use T4 only when the task requires a native app or your real session state.
The fundamental constraint
T4 and you share one mouse and one keyboard. You cannot both drive the same machine at the same time. This is physics, not a software design choice. Every other behavior in this page follows from it.
Enabling T4 in config does not mean T4 is constantly running. It means: the system is allowed to use T4 if a specific goal asks for it. T4 only takes over your input devices during an active T4 attempt. Outside of that, your machine is yours.
| State | Who controls mouse/keyboard | Your experience |
|---|---|---|
| T4 enabled, idle (no active task) | You | Normal — T4 might as well not exist |
| T4 enabled, attempt running | T4 (you should step back) | Cursor moves on its own, windows get clicked, text gets typed |
| T4 enabled, attempt paused (drift / stop / handoff) | You | T4 is waiting for you |
Deployment scenarios — how you see it and supervise
T4 can be deployed two different ways. The supervision experience is very different in each. Decide which one you're using before configuring live_stream_fps.
Scenario A — AI Partner runs on your own machine
You are physically sitting at the machine T4 is driving. The AI Partner UI is open in a browser window on the same desktop.
┌─────────────────────────────────────────────────────┐
│ YOUR LAPTOP │
│ │
│ ┌───────────────────────┐ ┌────────────────────┐ │
│ │ AI Partner UI │ │ T4's target apps │ │
│ │ (a browser window) │ │ (LibreOffice, │ │
│ │ │ │ Slack, terminal, │ │
│ │ • Chat │ │ whatever T4 is │ │
│ │ • Inspector ◀────┐ │ │ clicking) │ │
│ │ • STOP button │ │ │ │ │
│ └──────────────────┼────┘ └────────────────────┘ │
│ │ │
│ │ Live broadcast mirrors │
│ │ the same screen you're │
│ │ already looking at │
└─────────────────────────────────────────────────────┘
- You see T4's actions directly — its real mouse is your real mouse moving.
- The live broadcast in the Inspector panel is a mirror of your own screen. Useful only if your AI Partner window covers the action area.
- Practical advice: put the AI Partner window on a second monitor, or shrink it to a corner, so it doesn't block T4's target apps.
- Set live_stream_fps: 0 (off). There's nothing to broadcast; the live screen IS your screen.
Scenario B — AI Partner runs on a remote workstation / home server
You are sitting at a different device (laptop, phone, tablet) and connecting to AI Partner over the network. T4 runs on the remote host.
┌──────────────────────┐ ┌──────────────────────────┐
│ YOUR PHONE / LAPTOP │ │ REMOTE WORKSTATION │
│ (the viewer) │ │ (the host T4 controls) │
│ │ │ │
│ AI Partner UI │ ◀─ socket.io ─▶ │ T4 driver + apps │
│ • Inspector shows ──┼─ live frames ────┤ │
│ remote screen │ │ (no one is using these │
│ • STOP button ────┼─ HTTP POST ──────▶ input devices) │
│ │ │ │
└──────────────────────┘ └──────────────────────────┘
- Two physical machines: the one you're sitting at, and the one T4 is driving.
- The live broadcast is your only window into what's happening. Without it you're blind.
- Set live_stream_fps: 5 (or higher). The Inspector panel becomes a screen-share of the remote machine.
- STOP button, audit log, drift guard: all reach the remote machine over the network.
UI control surface during an active T4 attempt
Whichever scenario you're in, the controls available to you are the same. They're layered intentionally — different mechanisms for different failure modes.
| Control | What it does | Where | Best for |
|---|---|---|---|
| Inspector panel | Live frames at the configured fps | AI Partner UI | Watching what T4 is doing |
| Activity log | Step-by-step text trace | AI Partner UI (goal-progress stream) | Reading after the fact |
| STOP button | Kill-now — POST /api/computer-use/stop. Aborts the in-flight model call instantly (via AbortSignal) and halts the tier chain. Works across all tiers (T1–T4), and also fires automatically when you cancel the parent goal | AI Partner UI | "Wait, that's wrong" while UI is reachable |
| Audit log | Durable JSONL of every action with timestamps + args | GET /api/computer-use/audit-tail or ~/.mindful-assistant/logs/t4-audit.log | Post-mortem and compliance |
| FAILSAFE corner | Hard abort — drag cursor to any screen corner | Physical, no UI needed | Panic button when UI is frozen / network is down / agent is stuck |
| Drift guard | Touch the mouse — T4 auto-pauses up to 30s | Physical | "Let me check something quickly" |
| request_human_help | Agent itself pauses for login / CAPTCHA / manual review | Triggered by the agent | When the agent hits a wall |
The four stop mechanisms in order of severity:
- Drift guard: soft escape, T4 waits for you
- STOP button: cooperative; the current action completes and no further actions start
- FAILSAFE corner: hard abort, works even if the UI froze
- request_human_help: the agent's own escape hatch when it knows it's stuck
"But I'm working on the desktop where T4 is enabled — what then?"
This confuses people, so it's worth being explicit about it.
Case 1 — T4 is enabled in config, but no T4 task is running right now. Nothing happens. Use your machine normally. T4 doesn't sit there holding your input devices; it only acts during a specific goal that opts into host control.
Case 2 — A T4 task IS running. You should not use the machine concurrently. Two things will happen if you do:
- Your physical input collides with T4's (typing fights, click fights, focus fights)
- The drift guard detects your cursor moved → T4 pauses → asks you whether to back off or keep going
This is the honest tradeoff: T4 is for "I'll start a task and step back" or "I'll watch over screen-share from another device." It is not designed for "I'll keep working while it works."
If you want to keep working while the agent does something in parallel, that's what T3 is for — sandboxed Linux desktop inside Docker, doesn't touch your real mouse or keyboard. Or T1/T2 for headless browser work.
Picking the right tier for your situation
| What you want | Tier |
|---|---|
| "I'll keep working while the agent does something in parallel" | T3 (container desktop) |
| "Agent needs to use my real Slack / Outlook / IDE session — I'll step away" | T4, same machine, step back |
| "I'm on my phone or another room and want to launch a task on my desktop" | T4 with live_stream_fps > 0, second device for supervision |
| "I want to approve each sensitive action before it executes" | Set approval_mode: "sensitive" (or "always") — see Open-allowlist safety |
Safety model
T4 has four independent safety layers. Each catches a different class of mistake.
1. Double security gate
T4 will not run unless both of these are true:
// config.json
{
"computer_use": {
"tiers": {
"t4_host_control": {
"enabled": true // ← Gate 1 (config)
}
}
}
}
// per-call (set by the goal metadata or by the orchestrator)
{
"allowHostControl": true // ← Gate 2 (runtime)
}
Neither alone is sufficient. A misconfigured config.json cannot enable T4 globally — every call still has to explicitly request host control. A goal asking for host control is still refused if the config has T4 disabled.
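The gate logic above can be sketched as a single predicate. This is a minimal illustration, not the server's actual code: the function name `host_control_permitted` and its parameters are hypothetical, and only the config keys and the `allowHostControl` flag come from this page. It also folds in the authentication rule stated at the top of the page.

```python
from typing import Any, Mapping


def host_control_permitted(
    config: Mapping[str, Any],
    allow_host_control: bool,
    auth_enabled: bool = False,
) -> bool:
    """Return True only when every precondition for T4 holds (hypothetical helper)."""
    # Hard guarantee from the text: an auth-enabled (multi-tenant) server never runs T4.
    if auth_enabled:
        return False
    tier = (
        config.get("computer_use", {})
        .get("tiers", {})
        .get("t4_host_control", {})
    )
    gate_config = tier.get("enabled", False) is True  # Gate 1: config.json
    gate_runtime = allow_host_control is True         # Gate 2: per-call flag
    return gate_config and gate_runtime
```

A missing key is treated the same as `false`, so an incomplete config.json keeps T4 off.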
2. FAILSAFE corner
pyautogui.FAILSAFE = True. Drag your cursor to any screen corner and the next T4 action aborts with a FAILSAFE error. This is your physical emergency stop — no UI, no network, no software dependency. Use it if T4 is doing something unexpected.
3. Cooperative stop signal + audit endpoint
POST /api/computer-use/stop
Sets a flag the driver checks before every action. The current action completes (or hits FAILSAFE) and no further actions run until a new attempt starts. Wire this to a "STOP" button in your UI.
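To make "cooperative" concrete, here is a small sketch of a flag that is checked before every action. The class `StopSignal` and the function `run_attempt` are illustrative names, not part of the real driver; the sketch only assumes the behavior described above, where a running action finishes and nothing further starts.

```python
import threading
from typing import Callable, Iterable, List


class StopSignal:
    """Hypothetical stand-in for the flag set by POST /api/computer-use/stop."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def request_stop(self) -> None:
        self._event.set()

    def reset(self) -> None:
        # A new attempt starts with a clear flag.
        self._event.clear()

    def is_set(self) -> bool:
        return self._event.is_set()


def run_attempt(actions: Iterable[Callable[[], str]], stop: StopSignal) -> List[str]:
    """Run actions in order, checking the flag before each one."""
    completed: List[str] = []
    for action in actions:
        if stop.is_set():
            break  # nothing further starts once the flag is set
        completed.append(action())  # an action that has started always runs to completion
    return completed
```

Because the check happens between actions, a stop request that arrives mid-action takes effect at the next boundary rather than interrupting the action itself.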
The full audit log is available at:
GET /api/computer-use/audit-tail?n=200
Returns the last N entries (oldest first) along with the log-file path. Every T4 action — successful or not — appears here with timestamp, session id, action, args, allowlist result, and any error. Stored at <appDataDir>/logs/t4-audit.log (default ~/.mindful-assistant/logs/), rotated at 10 MB, keeps the last 5 rotations.
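If you prefer to read the log file directly instead of calling the endpoint, something like the following sketch may help. It assumes one JSON object per line (the file is described as JSONL) and uses the `allowlistResult` field named on this page; the helper names `tail_audit` and `blocked_entries` are made up for illustration, and the exact set of other fields is not specified here.

```python
import json
from collections import deque
from pathlib import Path
from typing import Any, Dict, List


def tail_audit(path: Path, n: int = 200) -> List[Dict[str, Any]]:
    """Return the last n JSONL entries from path, oldest first."""
    last_lines: deque = deque(maxlen=n)  # keeps only the n most recent lines
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                last_lines.append(line)
    return [json.loads(line) for line in last_lines]


def blocked_entries(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the entries that the allowlist rejected."""
    return [e for e in entries if e.get("allowlistResult") == "blocked"]
```

For example, `blocked_entries(tail_audit(Path.home() / ".mindful-assistant" / "logs" / "t4-audit.log"))` would list recent blocked actions, assuming the default log location.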
4. User-input drift guard
T4 remembers where it left your cursor after each action. Before the next action, it samples the cursor position again. If you've moved the cursor by more than user_drift_threshold_px (default 50px), T4 pauses for up to 30 seconds waiting for you to stop interacting, then aborts the attempt if you don't.
Translation: if you grab your mouse mid-task, T4 backs off.
Set user_drift_threshold_px: 0 in config to disable the drift guard (not recommended).
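The comparison at the heart of the drift guard might look like the sketch below. Two assumptions are worth flagging: the page does not say how distance is measured, so straight-line (Euclidean) distance is a guess, and the function name `user_drifted` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[int, int]


def user_drifted(last_pos: Point, current_pos: Point, threshold_px: int = 50) -> bool:
    """True when the cursor moved farther than threshold_px since T4's last action."""
    if threshold_px <= 0:
        return False  # a threshold of 0 disables the guard, as in the config
    # Assumption: straight-line distance between the two samples.
    distance = math.hypot(current_pos[0] - last_pos[0], current_pos[1] - last_pos[1])
    return distance > threshold_px
```

With the default threshold, a move from (100, 100) to (130, 140) is exactly 50 px and does not trip the guard, since the rule is "more than" the threshold.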
Allowlist
computer_use.tiers.t4_host_control.allowlist restricts T4 to clicking and typing only when the active window matches one of the listed application names (case-insensitive partial match).
{
"allowlist": ["chrome", "firefox", "code", "terminal", "libreoffice"]
}
An empty list permits all windows. Setting at least one entry hardens T4 considerably — if a popup steals focus mid-task, T4 stops instead of typing your password into a random dialog.
Every disallowed action is recorded in the audit log with allowlistResult: "blocked".
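The matching rule is simple enough to sketch. The function name `allowlist_result` and the "allowed" label are assumptions for illustration; only the "blocked" value, the case-insensitive partial match, and the empty-list behavior come from this page.

```python
from typing import Sequence


def allowlist_result(active_window: str, allowlist: Sequence[str]) -> str:
    """Classify the active window against the allowlist."""
    if not allowlist:
        return "allowed"  # an empty list permits all windows
    title = active_window.lower()
    # Case-insensitive partial match: any entry appearing as a substring is enough.
    if any(entry.lower() in title for entry in allowlist):
        return "allowed"
    return "blocked"
```

Note that a short entry such as "code" also matches any window whose name merely contains that substring, so longer entries give tighter scoping.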
Open-allowlist safety: denylist + per-action approval
The allowlist is a positive scope ("only these apps"). To let T4 work on any app while still staying safe, leave the allowlist empty and rely on two compensating controls instead.
Denylist — always blocked
computer_use.tiers.t4_host_control.denylist is the inverse of the allowlist: window-title patterns that are always refused, even when the allowlist is open. It defaults to common password managers so the agent can never automate them.
{
"denylist": ["1password", "keepass", "bitwarden", "lastpass", "dashlane"]
}
A denylist hit is recorded in the audit log and the action throws — it beats an open allowlist.
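The precedence between the two lists can be sketched as follows. This is an illustration under assumptions: `window_permitted` is a hypothetical name, and the denylist is assumed to use the same case-insensitive substring match as the allowlist, which this page does not state explicitly.

```python
from typing import Sequence

DEFAULT_DENYLIST = ("1password", "keepass", "bitwarden", "lastpass", "dashlane")


def window_permitted(
    active_window: str,
    allowlist: Sequence[str] = (),
    denylist: Sequence[str] = DEFAULT_DENYLIST,
) -> bool:
    """Denylist is checked first, so it wins over an open or matching allowlist."""
    title = active_window.lower()
    if any(pattern.lower() in title for pattern in denylist):
        return False  # always refused, whatever the allowlist says
    if not allowlist:
        return True   # open allowlist: every non-denied window is permitted
    return any(entry.lower() in title for entry in allowlist)
```

Checking the denylist before the allowlist is what makes the open-allowlist configuration safe to use.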
Per-action approval — approval_mode
Before a sensitive action runs, T4 can pause and ask you. The policy is set by approval_mode:
| Mode | Behavior |
|---|---|
| never | No approval. Allowlist + denylist still apply. Full autonomy. |
| sensitive (default) | Approve only risky actions or sensitive windows: classifyT4Action flags shell-command text (rm -rf, curl \| bash), run-dialog hotkeys, and sensitive window titles (terminals, banking). |
| always | Approve every action. Maximum oversight, slowest. |
{
"approval_mode": "sensitive",
"sensitive_patterns": ["terminal", "powershell", "bank", "paypal", "wallet"]
}
Fail-safe: when an action needs approval but no approver is connected, it is blocked + reported — never silently allowed.
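The three modes and the fail-safe rule can be sketched together. This is not classifyT4Action: the real classifier's rules are not documented here, so `RISKY_TEXT` holds only the two examples named in the table, and `needs_approval` and `decide` are hypothetical names.

```python
from typing import Sequence

# Only the examples named on this page; the real classifier's list is unknown.
RISKY_TEXT = ("rm -rf", "curl | bash")


def needs_approval(
    mode: str, typed_text: str, window_title: str, sensitive_patterns: Sequence[str]
) -> bool:
    """Decide whether an action has to wait for a human decision."""
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode != "sensitive":
        raise ValueError(f"unknown approval_mode: {mode!r}")
    text = typed_text.lower()
    title = window_title.lower()
    risky_text = any(marker in text for marker in RISKY_TEXT)
    risky_window = any(p.lower() in title for p in sensitive_patterns)
    return risky_text or risky_window


def decide(
    mode: str,
    typed_text: str,
    window_title: str,
    sensitive_patterns: Sequence[str],
    approver_connected: bool,
) -> str:
    """Return 'run', 'ask', or 'blocked' for one action."""
    if not needs_approval(mode, typed_text, window_title, sensitive_patterns):
        return "run"
    # Fail-safe: with nobody to ask, the action is blocked, never silently allowed.
    return "ask" if approver_connected else "blocked"
```

The important design point is the last line: the absence of an approver resolves to a refusal, not to a default approval.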
The approval prompt (Inspector)
When a sensitive action fires, a compact approval bar docks into the Inspector — it reuses the live screen you're already watching and draws a pulsing marker at the exact target, labelled with the monitor in monitor_capture: "all" mode. You get Approve / Deny / Stop plus a countdown; it auto-denies at execution.hitl_timeout_seconds (default 60s).
┌─ Inspector ───────────────────────────────┐
│ [ live screen ] ◎ ← target marker │
├────────────────────────────────────────────┤
│ ⚠ type "sudo rm -rf" → terminal ⏱0:54 │
│ [ ✓ Approve ] [ ✗ Deny ] [ ⛔ Stop ] │
└────────────────────────────────────────────┘
Media keys bypass the gate
media_key (volume/playback) is a global, low-risk control, so it is not subject to the allowlist, denylist, or approval gate — but it is still honoured by the STOP signal and written to the audit log.
Live screen broadcast
By default T4 runs without a live view — the agent takes a screenshot per step, but you don't see what's happening between steps. Enable the live broadcast and you get a continuous video feed of your screen rendered in the same Inspector panel where T1/T2/T3 frames already appear.
Where you'll see it
The Inspector panel in the AI Partner UI — the same panel that shows browser-tier (T1/T2) and partner-container (T3) frames. When T4 is running with the live stream enabled, frames arrive on the existing browser:screenshot socket channel with source: "t4". The Inspector renders them in the same view without any panel switching.
You also receive a separate state event so the UI can show a "T4 LIVE" indicator:
socket.on('t4:stream_state', ({ active, sessionId }) => {
// active=true when an attempt with live_stream_fps > 0 begins
// active=false when the attempt ends
})
How to enable
{
"computer_use": {
"tiers": {
"t4_host_control": {
"enabled": true,
"live_stream_fps": 5, // 0 = off; 5 = 5fps (low CPU); 10 = smoother
"live_stream_quality": 60 // JPEG quality 1-100; lower = less bandwidth
}
}
}
}
At 5 fps and JPEG quality 60, expect roughly 50-150 KB/frame depending on screen complexity — well within a local socket's capacity, and negligible CPU overhead because the screenshot path is shared with the agent's vision pipeline.
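As a rough worked example of what those per-frame figures mean for sustained throughput, the arithmetic is sketched below. It assumes 1 KB = 1000 bytes and ignores socket framing overhead; the function names are illustrative only.

```python
def stream_kb_per_second(fps: float, frame_kb: float) -> float:
    """Sustained payload rate in KB/s for a given frame rate and frame size."""
    return fps * frame_kb


def kb_per_second_to_mbit(kb_per_s: float) -> float:
    """Convert KB/s to Mbit/s, assuming 1 KB = 1000 bytes and 8 bits per byte."""
    return kb_per_s * 8 / 1000


low = stream_kb_per_second(5, 50)     # 250 KB/s at the small end
high = stream_kb_per_second(5, 150)   # 750 KB/s at the large end
print(kb_per_second_to_mbit(low), kb_per_second_to_mbit(high))  # 2.0 6.0
```

So 5 fps works out to roughly 2 to 6 Mbit/s. That is trivial on the same machine, but in a remote-supervision setup it is worth comparing against the uplink of the host before raising live_stream_fps or live_stream_quality.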
Why use a video feed if T4 takes its own screenshots?
Two different jobs:
- Agent screenshots happen at decision points, sampled when the LLM is about to choose its next action. The agent only needs one frame per step.
- Live broadcast happens continuously, sampled by the UI so you can supervise. Animations, transitions, focus changes, hover states — all visible to you, not necessarily to the agent.
The live feed is for you, not the agent. Watch it. The combination of live feed + FAILSAFE corner + drift guard is what makes T4 feel like supervising a colleague rather than launching a script and hoping for the best.
Multi-monitor handling
By default T4 captures and acts on the primary monitor only. On a multi-monitor setup the vision model sees a single coherent screenshot of the primary screen (in its local (0, 0)-origin coordinate space), and clicks land where the model expects. The coordinate offset from primary-monitor origin to virtual-desktop origin is applied transparently inside the Python driver.
Override:
{
"monitor_capture": "all" // capture the full virtual desktop
}
Use "all" only if you need T4 to operate across monitors in a single task. Most vision models perform worse on very wide aspect ratios (e.g. a 5760×1080 screenshot of three monitors side-by-side), so primary-only is the better default.
The Python screeninfo library is required for monitor enumeration. If not installed, T4 falls back to single-monitor mode.
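The coordinate translation in primary-only mode can be pictured with a small sketch. It is an assumption-laden illustration rather than the driver's code: the `Monitor` dataclass below is a local stand-in that mirrors the x, y, width, and height fields screeninfo reports for each monitor, and `to_virtual_desktop` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Monitor:
    """Local stand-in for a monitor description (origin and size in pixels)."""
    x: int       # left edge in virtual-desktop coordinates
    y: int       # top edge in virtual-desktop coordinates
    width: int
    height: int


def to_virtual_desktop(local_xy: Tuple[int, int], monitor: Monitor) -> Tuple[int, int]:
    """Map a point in the monitor's (0, 0)-origin space to virtual-desktop space."""
    lx, ly = local_xy
    if not (0 <= lx < monitor.width and 0 <= ly < monitor.height):
        raise ValueError(f"{local_xy} is outside the {monitor.width}x{monitor.height} capture")
    return (monitor.x + lx, monitor.y + ly)
```

For a primary monitor whose origin sits at (1920, 0) on the virtual desktop, a model click at (100, 200) would be issued at (2020, 200). Rejecting out-of-range points is a choice made in this sketch; the page does not say how the driver handles them.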
Action vocabulary
T4 implements the full action set the vision model can choose from. Compared to the prior implementation, this pass added hold_key, triple_click, cursor_position, middle_click, and wait to bring parity with Anthropic's Computer Use surface.
| Action | Purpose |
|---|---|
| screenshot | Capture current screen for the model |
| left_click | Click at coordinate |
| double_click | Double click at coordinate |
| triple_click | Triple click (select line/paragraph) |
| right_click | Right click (context menu) |
| middle_click | Middle click (paste / open in new tab) |
| type | Type a string of text |
| key | Press a key or chord (Enter, ctrl+s, Escape, ...) |
| hold_key | Hold a key for N ms (sustained modifier, gaming, etc.) |
| media_key | Volume up/down, mute, play/pause, next/previous track (host-only; bypasses the allowlist as a global low-risk control) |
| cursor_position | Query current cursor coordinates (also rebaselines drift guard) |
| scroll | Wheel scroll at coordinate, up/down |
| hover | Move cursor without clicking |
| drag | Click-and-drag from a start coordinate (from) to an end coordinate (to) |
| wait | Sleep N ms |
| request_human_help | Pause for HITL (login wall / CAPTCHA / manual review) |
| task_complete | End successfully with result string |
| task_failed | End with failure reason |
Enabling T4
T4 is off by default. To turn it on:
1. Install Python dependencies
pip install pyautogui pillow screeninfo
# Windows additionally:
pip install pywin32 psutil
# macOS additionally — also grant Accessibility permission to your terminal
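After installing, a quick check like the one below can confirm that the packages are importable from the interpreter you intend to use. It is a convenience sketch, not something shipped with AI Partner; `missing_modules` is a hypothetical helper. Note that pillow is imported under the name PIL.

```python
import importlib.util
from typing import List, Sequence

# Import names for the packages listed above (pillow installs the PIL module).
REQUIRED_MODULES = ("pyautogui", "PIL", "screeninfo")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All T4 Python dependencies found.")
```

Using find_spec avoids actually importing pyautogui, which can fail on a machine with no display attached even when the package is installed.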
2. Edit config.json
{
"computer_use": {
"enabled": true,
"tiers": {
"t4_host_control": {
"enabled": true,
"allowlist": ["chrome", "firefox", "code"],
"denylist": ["1password", "keepass", "bitwarden"],
"approval_mode": "sensitive",
"sensitive_patterns": ["terminal", "powershell", "bank"],
"user_drift_threshold_px": 50,
"monitor_capture": "primary",
"live_stream_fps": 5,
"live_stream_quality": 60
}
},
"tier_priority": ["T1", "T2", "T3", "T4"]
}
}
3. Per-goal opt-in
Every goal that should reach T4 must set allowHostControl: true in its metadata, OR the goal must explicitly request a T4-only tool (computer_use_host_action) which the orchestrator translates into the same flag.
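The two opt-in routes reduce to one flag, which could be derived roughly as follows. The function name `resolve_allow_host_control` and the shape of its arguments are assumptions for illustration; only `allowHostControl` and `computer_use_host_action` come from this page.

```python
from typing import Any, Iterable, Mapping

HOST_TOOL = "computer_use_host_action"  # the T4-only tool named above


def resolve_allow_host_control(
    goal_metadata: Mapping[str, Any], requested_tools: Iterable[str]
) -> bool:
    """Derive the per-call flag from goal metadata or an explicit T4 tool request."""
    if goal_metadata.get("allowHostControl") is True:
        return True
    # Requesting the T4-only tool is treated as the same opt-in.
    return HOST_TOOL in set(requested_tools)
```

Either route only opens the runtime gate; the config gate still has to be enabled separately for T4 to run.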
4. Set FAILSAFE expectations
Train yourself: if T4 is doing something wrong, drag the mouse to a corner. That's your emergency stop. No need to click anything.
What's still missing (deferred)
These were intentionally not in scope for this hardening pass:
- Anthropic-native tool surface routing: when the active vision model is claude-3-5-sonnet or newer, route through Anthropic's computer_20241022 tool format instead of generic JSON. Measurable accuracy delta (per Anthropic's published benchmarks) but touches the LLM adapter layer.
- T4 skill memory: vision-anchored skills so "open LibreOffice → click cell B5 → type formula" isn't rediscovered every run. Depends on extending SkillLearner to handle pixel-coordinate-anchored steps with vision verification.
- DPI scaling normalization: Windows scale factor >100% can offset clicks. Needs cross-platform DPI awareness in the screenshot pipeline.
If you hit any of these in practice, file an issue and they'll move up the queue.
Troubleshooting
"Python not found" — the Node server can't find a Python 3 interpreter. Make sure python3, python, or py (Windows) is on PATH in the environment the AI Partner server runs in.
"T4 requires pyautogui and pillow" — run pip install pyautogui pillow screeninfo (plus pywin32 psutil on Windows).
FAILSAFE fires immediately on the first action — your cursor is already in a corner. Move it to the middle of the screen and retry.
Drift guard pauses constantly — your mouse may be drifting due to a touchpad or wireless mouse. Raise user_drift_threshold_px to 100 or higher, or set to 0 to disable.
Live broadcast not appearing in Inspector — confirm live_stream_fps > 0 in config, the T4 attempt is actually running (check goal:action events for tool name computer_use_host_action), and the Inspector panel is open. The t4:stream_state socket event tells you whether the stream is broadcasting.
Active-window allowlist blocks everything — your allowlist entries don't match the actual window-class names. Check the audit log entries with allowlistResult: "blocked" — the error message includes the active window's class string. Use a partial substring of that string as your allowlist entry.