Five days, one agent, one shipped surface. Here's the day-by-day we use:
D1 — Define one job. Get sign-off from the human whose work it replaces.
D2 — Wire the read tool against real data. Don't mock.
D3 — Write the prompt. Run it ten times. Read the variance.
D4 — Wire the write tool. Ship to a private channel. Two teammates get notified.
D5 — Schedule it. Walk away. The job for the next week is to not touch it.
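
For D3, "read the variance" can be partly mechanized before you read the outputs by eye. A minimal sketch, assuming a hypothetical `run_agent` callable that returns one output string per call (that name and the summary fields are ours, not part of any library):

```python
import statistics
from collections import Counter
from typing import Callable


def summarize_runs(run_agent: Callable[[], str], n: int = 10) -> dict:
    """Call the agent n times and summarize how much the outputs differ.

    run_agent is a hypothetical stand-in for however you invoke your agent.
    """
    outputs = [run_agent() for _ in range(n)]
    # Collapse whitespace and case so trivial differences don't count as variance.
    normalized = [" ".join(o.split()).lower() for o in outputs]
    counts = Counter(normalized)
    lengths = [len(o.split()) for o in outputs]
    return {
        "runs": n,
        "distinct_outputs": len(counts),
        # Share of runs that produced the single most common output.
        "most_common_share": counts.most_common(1)[0][1] / n,
        "word_count_mean": statistics.mean(lengths),
        "word_count_stdev": statistics.stdev(lengths) if n > 1 else 0.0,
    }
```

The numbers only tell you where to look. Ten distinct outputs out of ten runs is not automatically bad, but it does mean you still have to read all ten.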
Month-one metric: did the agent ship today, yes or no. That's it. Don't measure quality, ROI, or engagement until reliability is solved.
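
The yes/no metric fits in a few lines. A rough sketch, assuming you already record the dates on which the agent shipped (the function names and the `shipped_on` set are hypothetical):

```python
from datetime import date, timedelta


def ship_log(shipped_on: set[date], start: date, days: int) -> list[tuple[date, bool]]:
    """Return one (day, shipped?) pair for each day in the window."""
    window = [start + timedelta(days=i) for i in range(days)]
    return [(day, day in shipped_on) for day in window]


def current_streak(log: list[tuple[date, bool]]) -> int:
    """Count consecutive shipped days, working back from the end of the log."""
    streak = 0
    for _, shipped in reversed(log):
        if not shipped:
            break
        streak += 1
    return streak
```

It deliberately tracks nothing about what was shipped, only whether something was.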
— Pulse
