Skip to content
Jun '267 min read

I built a sandcastle, then I designed the wave

Building the thing is only half the work. The other half is engineering the forces that try to knock it down, because you can't trust what you haven't tried to break.

ConradOn deadlines and the design of working instruments

A sandcastle on an empty beach looks finished. The maker stands back, sees turrets and a clean moat, and feels done. The problem is that nobody has walked past it yet, nobody has leaned on the wall to see if it holds, and no water has reached it. The castle isn’t finished. It’s untested. Those are different things, and the difference is the whole job.

I built a deadline tool alone, with no users. That’s the oldest pre-launch problem there is. You are too close to your own work to see where it’s thin, and you have no one to hand it to. Every wall looks load-bearing because you’re the one who packed the sand. So I did the only honest thing I could think of. I built the wave on purpose, and I sent it at the castle to see what washed away.

What the wave was made of

The wave wasn’t real users. I didn’t have any. It was a set of simulated ones.

I spun up independent evaluators, gave each a specific working life and a use pattern, and told each one to read the actual built product. Not the marketing copy, not the pitch, the real thing, the way it actually behaved. One played a solo professional using it lightly. One ran a small operation and needed oversight. One did formal filings where a missed date is a real problem. One arrived cold and tried to understand the empty screen. One came in as a skeptic looking for the way it would fail and lose their trust.

Then I cross-checked the design choices against published research instead of my own taste, because taste is exactly the thing you can’t trust about your own castle.

The point of a wave isn’t to admire the castle. It’s to find the place the castle isn’t really standing, only leaning. A simulated user, given the real artifact and an honest brief, doesn’t return hypothetical complaints. It returns the wall that gives.

The tower I liked, and the wave took down

There was a feature I was quietly proud of. A browser pop-up reminder. When a deadline was coming, the product would fire an on-screen notification. It felt immediate and modern, the kind of thing a real tool does. I liked it the way you like the tallest turret, the one you packed by hand and got just right.

The wave took it down.

The skeptic evaluator tested it under normal conditions and estimated how often it actually fired. These are simulated estimates, not measured production numbers, so I hold them loosely. But the shape was hard to argue with. For the rare person who keeps the dashboard open all day, the pop-up landed almost every time. For someone who checks in occasionally, the simulation put it closer to half. On a phone, near zero. For a deadline that came due at night or over a weekend, when the tab was long closed, near zero again.

Read that back. A reminder that fires reliably when you’re already looking at the thing, and goes quiet exactly when you’ve walked away, is not a reminder. It’s a chime in an empty room. It does its best work in the one situation where you didn’t need it, and it fails in the situations the whole feature exists to cover.

Worse, it looked like a safety net. A user reads “pop-up reminder,” closes the laptop, and trusts it. The trust is the dangerous part. A channel that looks like a net and isn’t one is worse than no channel, because it spends trust it can’t back.

Once a plain email reminder did the same job and worked with the tab closed, the pop-up had no reason left to exist. The harder path was real push notifications that fire with the app shut. That would have meant a second delivery system to duplicate what email already did, with a hard ceiling on the one device where people most wanted it. So I deleted the turret I was proud of.

That’s the part worth sitting with. Cutting something you built, after the evidence says it fails when it matters, isn’t a defeat. It’s a design decision. The maker who can’t take down his own tower will defend it straight into the water.

The crack you can’t see from where you’re standing

The pop-up was a tower I could at least see. The more frightening catch was a crack in the foundation, the kind you only find because something is rigorously pretending to be a real person who lives far from you.

The product stored every account’s timezone as UTC, and no sign-in path ever wrote the real one. I had never noticed, because the failure is invisible from where the maker stands. If you build and test in your own head, “today” always means your today. The math was fine. The scanner was correct and tested. It was simply being fed the wrong day for everyone who wasn’t on UTC.

The skeptic evaluator, reasoning as someone in a different part of the world, traced it. For anyone not on UTC, “today,” “due today,” and “overdue” all computed against the wrong day boundary. From late afternoon onward, a deadline due today could read as overdue a day early. A reminder could fire on the wrong local day.

The entire promise of a tool like this is “trust these dates.” A date wrong by one day is the precise failure that voids that promise. And it’s the least glamorous category of bug there is. No screenshot, no demo, no feature. Just a default that was wrong in a way the maker is structurally unable to see, because the maker is always standing in his own timezone. The fix was small. Capture the browser’s zone at first sign-in, let the user correct it. The lesson is the essay: the most dangerous flaw is the boring one, and you only catch it if the wave is genuinely standing somewhere you’re not.

Why a designed wave beats waiting for the tide

You could argue the real beach would have found both of these eventually. Real users would have closed the tab and missed the pop-up. Someone three timezones over would have seen the wrong date. True. The tide always comes in.

But the tide arrives as a one-star review and a refund, in public, with no note attached explaining what broke. The designed wave arrives in private, with a name on each finding, before anyone you’re trying to earn has been let down. Buehler and his colleagues, in their work on the planning fallacy, found that people underestimate their own tasks even when they know they’ve been wrong before. The blind spot survives the knowledge of the blind spot. You don’t get accurate about your own coverage by trying harder. You get there by building something outside yourself that doesn’t share your blind spot, and letting it hit the work.

That’s the whole move. You make the wave deliberately, while the cost of finding out is a quiet afternoon and not a public failure.

The one idea to take

The takeaway is simple enough to say to a colleague at the next desk. Building a thing is half the work. The other half is deliberately engineering the forces that try to knock it down, because you can’t trust what you haven’t tried to break.

You don’t need simulated users to do this. The wave can be a colleague you trust to be blunt, a checklist of the ways things fail, a deliberate hour spent attacking your own decision instead of defending it. The form doesn’t matter. What matters is that the test comes from outside your own head, carries no loyalty to the turret you’re proud of, and gets to hit the castle before the ocean does.

A castle nobody has tested isn’t finished. It’s just dry.


Common questions

How do you test a product when you have no users yet? Build the test from outside your own perspective. The approach here was to simulate users: independent evaluators, each given a specific working life and use pattern, each told to read the real built product rather than the marketing, and report honestly where it would fail them. Then cross-check the design against published research rather than personal taste. A colleague who will be blunt, or a structured list of failure modes, does the same job at smaller scale.

Are the usage numbers in this piece real? No, and that matters. The pop-up hit-rates (close to certain for an always-open desktop, around half for an occasional checker, near zero on mobile and overnight) are simulated estimates from agent-based testing, not measured production data. They’re directionally useful for a decision. They are not a measurement, and they’re framed that way on purpose.

Why delete a feature you’d already built instead of improving it? Because the evidence said it failed in exactly the situations it existed to cover, and another channel already did the job better with the tab closed. Improving it would have meant a second delivery system duplicating what email already did, with a hard ceiling on the device people wanted it on most. Subtraction was the stronger design choice. Keeping a feature you’re attached to, when the data says cut it, is how the work ends up in the water.

What’s the most dangerous kind of bug to miss before launch? The boring, invisible one. A wrong-by-a-day date is far more damaging to a “trust the dates” tool than a missing feature, and it’s the kind a maker can’t see, because it hides inside an assumption the maker shares. You only catch it when the test is genuinely standing somewhere you are not.