Claude Computer Use: an agent that drives your Mac.

◢ The stack

Anthropic API

Claude models over the API: they read each screenshot and decide the next action

$5 free credit · pay-as-you-go

Claude Code

Agentic coding terminal — edits files, runs commands, ships PRs

Pro plan · $20/mo

GitHub

Source control + Vercel auto-deploy on push

Free

◢ The build · 4 steps · 22 min

Follow these in order. Don't skip.

Step 01 / 04

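Get an Anthropic API key

Claude Code comes with the Pro plan, but Computer Use runs through the API and is billed separately from pay-as-you-go API credits. No separate Computer Use access is needed: the tool is enabled per request with a beta header.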

▸console.anthropic.com → Sign up (or log in)
▸Settings → Billing → add a card. Computer Use is supported on several recent models, including Sonnet 4.5 and the Opus 4 family; check the model list in the Computer Use docs.
▸Settings → API Keys → Create Key. Name it computer-use-dev.
▸Save it as ANTHROPIC_API_KEY in your .env.
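Python does not read .env files on its own, so os.environ["ANTHROPIC_API_KEY"] in Step 03 fails unless the variable is exported in your shell or loaded from the file. The python-dotenv package's load_dotenv() does this properly. If you'd rather skip the dependency, here is a minimal stdlib sketch; load_env_file is a hypothetical helper that only handles simple KEY=VALUE lines. Also add .env to your .gitignore.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Hypothetical helper: copy simple KEY=VALUE lines from a .env file into os.environ.

    Simplified on purpose: no multiline values, no ${VAR} expansion.
    Variables already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):  # tolerate shell-style "export KEY=..."
            key = key[len("export "):].strip()
        # Strip surrounding quotes from the value.
        os.environ.setdefault(key, value.strip().strip('"').strip("'"))


if __name__ == "__main__":
    load_env_file()
    print("ANTHROPIC_API_KEY loaded:", "ANTHROPIC_API_KEY" in os.environ)
```

Call it once at the top of your script, before creating the Anthropic client.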

◆ Watch out

Computer Use can click anything visible on the screen it controls. Treat the API key like nuclear codes: never commit it. And never run agents on your real desktop without sandboxing.

Step 02 / 04

Run the official sandbox in Docker (the safe way)

Anthropic ships a Docker image with a virtual display, a browser, and the agent loop. You watch through your own browser, and the agent's clicks and keystrokes stay inside the container instead of touching your real desktop.

Terminal

# Pull and run the official quickstart image
export ANTHROPIC_API_KEY=sk-ant-...

# -v persists settings (API key, system prompt) in ~/.anthropic between runs.
# Ports: 8080 = chat + desktop, 8501 = chat only, 6080 = noVNC desktop, 5900 = raw VNC
docker run \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v $HOME/.anthropic:/home/computeruse/.anthropic \
  -p 5900:5900 \
  -p 8501:8501 \
  -p 6080:6080 \
  -p 8080:8080 \
  -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

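▸Open http://localhost:8080: the combined page with the Streamlit chat and the agent's desktop side by side
▸Or open them separately: http://localhost:8501 for the chat only, http://localhost:6080/vnc.html for the live screen the agent sees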
▸Tell it something like: "Open Firefox, search for the Anthropic blog, and summarize the latest post."
▸Watch the screen update in real time.
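If a page doesn't load, the container may still be starting. A small stdlib check of the ports published by the docker run command can tell you which services are up; port_open and QUICKSTART_PORTS are illustrative names, not part of the quickstart.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False


# Ports published by the docker run command above.
QUICKSTART_PORTS = {8080: "chat + desktop", 8501: "Streamlit chat", 6080: "noVNC", 5900: "VNC"}

if __name__ == "__main__":
    for port, name in QUICKSTART_PORTS.items():
        state = "up" if port_open("localhost", port) else "not reachable yet"
        print(f"{port} ({name}): {state}")
```

An open port only means the service is listening; the agent itself still needs a valid API key to respond.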

Step 03 / 04

Build your own agent loop in Python

Terminal

pip install anthropic pillow pyautogui

agent/computer_use.py

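import base64
import io
import os

import pyautogui
from anthropic import Anthropic
from PIL import ImageGrab

# macOS: grant your terminal Screen Recording and Accessibility permissions.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Assumption: any Computer Use-capable model works here; check the docs' model list.
MODEL = "claude-sonnet-4-5-20250929"
# Resolution reported to the model. Screenshots are downscaled to it, and the
# model's coordinates are scaled back up to the real screen before acting.
DISPLAY_W, DISPLAY_H = 1280, 800

def screenshot_block() -> dict:
    """Grab the screen, resize it to the reported display size, return an image block."""
    img = ImageGrab.grab().resize((DISPLAY_W, DISPLAY_H))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    data = base64.b64encode(buf.getvalue()).decode()
    return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": data}}

def to_screen(x: int, y: int) -> tuple[int, int]:
    """Map model-space coordinates to pyautogui's screen coordinates."""
    sw, sh = pyautogui.size()
    return round(x * sw / DISPLAY_W), round(y * sh / DISPLAY_H)

def execute(action: dict) -> list[dict]:
    """Perform one computer-tool action with pyautogui (partial: common actions only)."""
    kind = action["action"]
    if kind == "left_click":
        if "coordinate" in action:
            pyautogui.click(*to_screen(*action["coordinate"]))
        else:
            pyautogui.click()  # click at the current cursor position
    elif kind == "mouse_move":
        pyautogui.moveTo(*to_screen(*action["coordinate"]))
    elif kind == "type":
        pyautogui.write(action["text"], interval=0.02)
    elif kind == "key":
        # Keys arrive xdotool-style ("Return", "ctrl+s"); some names may need mapping.
        pyautogui.hotkey(*action["text"].lower().split("+"))
    elif kind != "screenshot":
        return [{"type": "text", "text": f"Action not implemented: {kind}"}]
    return [screenshot_block()]  # answer every action with a fresh screenshot

def run(task: str, max_steps: int = 30):
    tools = [{"type": "computer_20250124", "name": "computer",
              "display_width_px": DISPLAY_W, "display_height_px": DISPLAY_H}]
    messages = [{"role": "user", "content": [{"type": "text", "text": task}, screenshot_block()]}]

    for _ in range(max_steps):
        resp = client.beta.messages.create(
            model=MODEL,
            max_tokens=4096,
            tools=tools,
            messages=messages,
            betas=["computer-use-2025-01-24"],
        )
        # Keep the assistant turn in history so its tool_use ids can be answered.
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return resp  # e.g. "end_turn": the model requested no more actions
        # Execute each tool_use block and send the results back as tool_result blocks.
        results = [
            {"type": "tool_result", "tool_use_id": block.id, "content": execute(block.input)}
            for block in resp.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"Stopped after {max_steps} steps without finishing")

if __name__ == "__main__":
    run("Open my browser and find the cheapest flight from NYC to SF next Friday.")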

◆ Heads up

Use the Docker quickstart code as your reference implementation: github.com/anthropics/anthropic-quickstarts. Its action dispatch handles every action the computer tool can request, so start from it rather than writing your own from scratch.
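Every step of the loop adds another screenshot to messages, so long tasks get slower and more expensive with each call. The quickstart can be configured to resend only the most recent screenshots. Below is a simplified sketch of that idea, not the quickstart's actual code; keep_recent_screenshots is a hypothetical helper you would call on messages before each client.beta.messages.create(...) call.

```python
def keep_recent_screenshots(messages: list[dict], keep: int = 3) -> list[dict]:
    """Hypothetical helper: replace all but the `keep` newest screenshots with a placeholder.

    Looks at image blocks in user turns and inside tool_result blocks.
    Mutates `messages` in place and also returns it. Non-dict blocks
    (e.g. SDK objects in assistant turns) are skipped.
    """
    seen = 0
    for msg in reversed(messages):  # newest message first
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        # Lists that may hold images: the message content and any tool_result content.
        containers = [content] + [
            block["content"]
            for block in content
            if isinstance(block, dict)
            and block.get("type") == "tool_result"
            and isinstance(block.get("content"), list)
        ]
        for container in reversed(containers):
            for i in range(len(container) - 1, -1, -1):
                block = container[i]
                if isinstance(block, dict) and block.get("type") == "image":
                    seen += 1
                    if seen > keep:
                        container[i] = {"type": "text", "text": "[older screenshot removed]"}
    return messages
```

Replacing old images with a short text block, rather than deleting the tool_result, keeps every tool_use id answered, which the API requires.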

Step 04 / 04

When to use Computer Use vs an API call

▸USE Computer Use when: the target has no API (legacy ERP, internal tools, login-walled SaaS), or you need a screenshot of what happened.
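▸DON'T use Computer Use when: an API exists. An API call is one request, while Computer Use is a loop of screenshots and actions, so the API is far faster and cheaper, and it doesn't break when the UI is redesigned.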
▸Hybrid pattern: API for the boring 90%, Computer Use only for the 10% the API can't reach.
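The hybrid pattern comes down to a routing decision: use a direct integration when one exists, and fall back to the agent only when it doesn't. A minimal sketch; route, api_handlers, and computer_use are hypothetical names, and computer_use stands in for something like the Step 03 run() loop wrapped to return text.

```python
from typing import Callable, Dict

# A handler takes a task description and returns a text result.
Handler = Callable[[str], str]


def route(system: str, task: str, api_handlers: Dict[str, Handler], computer_use: Handler) -> str:
    """Hypothetical router: prefer a direct API integration, else use a Computer Use agent."""
    handler = api_handlers.get(system)
    if handler is not None:
        return handler(task)   # the boring 90%: one fast, cheap API call
    return computer_use(task)  # the 10% with no API: drive the UI instead


if __name__ == "__main__":
    api_handlers = {"crm": lambda t: f"CRM API handled: {t}"}
    agent = lambda t: f"Computer Use handled: {t}"
    print(route("crm", "export contacts", api_handlers, agent))  # CRM API handled: export contacts
    print(route("legacy-erp", "export invoices", api_handlers, agent))  # Computer Use handled: export invoices
```

Keeping the decision in one place makes it easy to move a system from the Computer Use column to the API column once an integration exists.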

◆ Ship-it checklist

5 CHECKS

Anthropic API key with billing enabled
Docker quickstart running locally — you saw the agent click around in the noVNC viewer
You ran one custom task end-to-end (e.g., "open browser, search X, summarize")
You understand the screenshot → tool_use → action → screenshot loop
You know which 3 problems on your stack should be Computer Use vs API
