◢ The build · 4 steps · 22 min
Follow these in order. Don't skip.
Step 01 / 04
Get an Anthropic API key with Computer Use access
The Claude Pro plan includes Claude Code, but the Computer Use API is billed separately, against pay-as-you-go API credits.
- ▸console.anthropic.com → Sign up (or log in)
- ▸Settings → Billing → add a card. Computer Use is available on Sonnet 4.5+ and Opus 4+.
- ▸Settings → API Keys → Create Key. Name it computer-use-dev.
- ▸Save it as ANTHROPIC_API_KEY in your .env.
◆ Watch out
Computer Use can click anything visible. Treat the API key like nuclear codes — never commit it, never run agents on your real desktop without sandboxing.
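One way to read the key from .env without hard-coding it is sketched below. It uses only the standard library; the helper names `load_env_file` and `require_api_key` are illustrative, not part of the Anthropic SDK, and the parser assumes plain KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path: str = ".env") -> str:
    """Return ANTHROPIC_API_KEY from the environment or .env, or fail fast."""
    key = os.environ.get("ANTHROPIC_API_KEY") or load_env_file(path).get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it or add it to .env")
    return key
```

Add .env to your .gitignore so the key never lands in a commit.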
Step 02 / 04
Run the official sandbox in Docker (the safe way)
Anthropic ships a Docker image with a virtual display, browser, and the agent loop. You watch through a browser. Your real machine never gets touched.
Terminal
# Pull and run the official quickstart image
export ANTHROPIC_API_KEY=sk-ant-...

docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5900:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    -p 8080:8080 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
- ▸Open http://localhost:8080 for the combined view: the Streamlit chat UI alongside the agent's desktop
- ▸Open http://localhost:6080/vnc.html — that's the live screen the agent sees
- ▸Tell it something like: "Open Firefox, search for the Anthropic blog, and summarize the latest post."
- ▸Watch the screen update in real time.
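If one of those URLs doesn't load, a quick check of the published ports can help narrow it down. This is a minimal sketch using only the standard library; the port list is copied from the docker run command, and the labels are shorthand assumptions rather than official names.

```python
import socket

# Ports published by the docker run command; labels are informal.
SANDBOX_PORTS = {
    8080: "chat UI",
    6080: "live screen (noVNC)",
    8501: "Streamlit",
    5900: "VNC",
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_sandbox(host: str = "localhost") -> dict[int, bool]:
    """Map each sandbox port to whether something is listening on it."""
    return {port: port_open(host, port) for port in SANDBOX_PORTS}
```

Calling `check_sandbox()` while the container is running should report every port as reachable; a False entry usually points to a missing -p flag or a port already taken on the host.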
Step 03 / 04
Build your own agent loop in Python
Terminal
pip install anthropic pillow
agent/computer_use.py
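import base64
import io
import os

from anthropic import Anthropic
from PIL import ImageGrab

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Any Computer Use-capable model works here; check the current model list for exact IDs.
MODEL = "claude-sonnet-4-5"


def screenshot_b64() -> str:
    """Capture the screen and return it as a base64-encoded PNG."""
    buf = io.BytesIO()
    ImageGrab.grab().save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()


def execute_action(tool_input: dict) -> None:
    """Placeholder: run the click/type/key action with pyautogui or AppleScript."""
    raise NotImplementedError(tool_input)


def run(task: str):
    # Run this inside a VM or container, not on your real desktop.
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": task},
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png", "data": screenshot_b64(),
            }},
        ],
    }]

    while True:
        resp = client.beta.messages.create(
            model=MODEL,
            max_tokens=4096,
            tools=[{"type": "computer_20250124", "name": "computer",
                    "display_width_px": 1920, "display_height_px": 1080, "display_number": 1}],
            messages=messages,
            betas=["computer-use-2025-01-24"],
        )

        # Stop on "end_turn" (or any other stop reason that requests no tool call).
        if resp.stop_reason != "tool_use":
            return resp

        # Echo the assistant turn, then answer each tool_use block with a tool_result.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            if block.input.get("action") != "screenshot":
                execute_action(block.input)
            # Send a fresh screenshot back so the model sees the effect of the action.
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": [{"type": "image", "source": {
                    "type": "base64", "media_type": "image/png", "data": screenshot_b64(),
                }}],
            })
        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    run("Open my browser and find the cheapest flight from NYC to SF next Friday.")
◆ Heads up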
Use the Docker quickstart code as your reference implementation — github.com/anthropics/anthropic-quickstarts. The action dispatch is 80 lines and handles every tool the model can call.
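To give a feel for what the action dispatch involves, here is a much smaller sketch than the reference implementation. It covers only a handful of actions, and it assumes the tool input carries an `action` field plus `coordinate` or `text`, which is my reading of the computer tool schema; verify the field names against the quickstart code. `RecordingBackend` and `dispatch_action` are hypothetical names, and the backend only records calls so the sketch runs without pyautogui.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingBackend:
    """Stand-in for pyautogui/AppleScript: records calls instead of driving the mouse."""
    calls: list[tuple] = field(default_factory=list)

    def click(self, x: int, y: int, button: str = "left") -> None:
        self.calls.append(("click", x, y, button))

    def move(self, x: int, y: int) -> None:
        self.calls.append(("move", x, y))

    def type_text(self, text: str) -> None:
        self.calls.append(("type", text))

    def press(self, keys: str) -> None:
        self.calls.append(("key", keys))


def dispatch_action(tool_input: dict[str, Any], backend: RecordingBackend) -> str:
    """Run one computer-tool action and return a short status string."""
    action = tool_input.get("action")
    if action == "screenshot":
        return "screenshot"  # nothing to execute; the caller sends a fresh capture
    if action in ("left_click", "right_click", "middle_click"):
        x, y = tool_input["coordinate"]
        backend.click(x, y, button=action.split("_")[0])
    elif action == "mouse_move":
        x, y = tool_input["coordinate"]
        backend.move(x, y)
    elif action == "type":
        backend.type_text(tool_input["text"])
    elif action == "key":
        backend.press(tool_input["text"])
    else:
        # Report unknown actions back to the model instead of guessing.
        return f"unsupported action: {action}"
    return f"ok: {action}"
```

Swapping `RecordingBackend` for a class with the same four methods backed by pyautogui would turn the recorded calls into real input events. Do that only inside a sandbox.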
Step 04 / 04
When to use Computer Use vs an API call
- ▸USE Computer Use when: the target has no API (legacy ERP, internal tools, login-walled SaaS), or you need a screenshot of what happened.
- ▸DON'T use Computer Use when: an API exists. As a rough rule of thumb, APIs are about 10× faster and 50× cheaper, and they don't break on a UI redesign.
- ▸Hybrid pattern: API for the boring 90%, Computer Use only for the 10% the API can't reach.
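The rule of thumb above can be written down as a small routing function. This is a sketch only: `Target` and `choose_channel` are hypothetical names, and it assumes that needing a visual record takes priority even when an API exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A system the agent needs to reach."""
    name: str
    has_api: bool
    needs_visual_record: bool = False


def choose_channel(target: Target) -> str:
    """Return "computer_use" or "api" for a target, following the rule of thumb."""
    if not target.has_api or target.needs_visual_record:
        return "computer_use"
    return "api"


def plan(targets: list[Target]) -> dict[str, list[str]]:
    """Group target names by the channel chosen for each."""
    grouped: dict[str, list[str]] = {"api": [], "computer_use": []}
    for target in targets:
        grouped[choose_channel(target)].append(target.name)
    return grouped
```

Running `plan` over your own list of systems is a quick way to see how much of the work really needs Computer Use.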
◆ Ship-it checklist
5 CHECKS
- Anthropic API key with billing enabled
- Docker quickstart running locally — you saw the agent click around in the noVNC viewer
- You ran one custom task end-to-end (e.g., "open browser, search X, summarize")
- You understand the screenshot → tool_use → action → screenshot loop
- You know which 3 problems on your stack should be Computer Use vs API


