Jun '267 min read

I built a sandcastle, then I designed the wave

Building the thing is only half the work. The other half is engineering the forces that try to knock it down, because you can't trust what you haven't tried to break.

ConradOn deadlines and the design of working instruments

A sandcastle on an empty beach looks finished. Turrets, a clean moat, the maker standing back feeling done. But nobody has leaned on the wall yet, and no water has reached it. The castle isn’t finished. It’s untested. Those are different things, and the difference is the whole job.

Before Deadlinewatch reached anyone, it had the same problem every new thing has. A maker is too close to the work to see where it’s thin. You test it the way you expect it to be used, and the gaps hide exactly where your expectations do.

So I built the wave on purpose. Part of it was human. Trusted colleagues took the tool for a walk, and so did my partner. But a small circle can only stand in so many places, so the rest of the wave was simulated users, each given a specific working life, each told to use the real product and report honestly where it failed them. A solo professional using it lightly. Someone running formal filings. Someone arriving cold at an empty screen. A skeptic hunting for the way it would lose their trust. The simulated set moves faster, runs at a scale a circle of colleagues can’t match, and covers roles and scenarios I could never recruit. Not instead of people. On top of them. Then every design choice got checked against published research instead of my own taste, because taste is the thing you can least trust about your own castle.

The feature that didn’t survive

One feature came back marked for removal, and it was one I was proud of. A browser pop-up reminder. When a deadline was coming, the product would fire an on-screen notification. It felt immediate and modern, the kind of thing a real tool does.

The skeptic evaluator tested it under normal conditions and estimated how often it actually fired. These are simulated estimates, not measured production numbers, so I hold them loosely. But the shape was hard to argue with. For the rare person who keeps the dashboard open all day, the pop-up landed almost every time. For someone who checks in occasionally, closer to half. On a phone, near zero. For a deadline that came due at night or over a weekend, when the tab was long closed, near zero again.

Read that back. A reminder that fires reliably when you’re already looking at the thing, and goes quiet exactly when you’ve walked away, is not a reminder. It’s a chime in an empty room. And because it read as a safety net, it spent trust it couldn’t back. A plain email reminder already did the same job with the tab closed. The honest upgrade, real push notifications that fire with the app shut, would have meant a second delivery system duplicating what email already covered. So I deleted the turret I was proud of.

That’s the part worth sitting with. Taking out a feature you built and liked, once the testing shows it wasn’t practical for most of the people it was for, isn’t a defeat. It’s the tool’s principles holding. Every feature earns its place by working, not by being loved by its maker.

What else the wave found

The same testing caught what no amount of belief would have. The deepest catch was a timezone default that fed the date math the wrong day for anyone not living on UTC, invisible from where I stood, because in your own hands “today” always means your today. That boring, dangerous class of bug has its own piece. The point here is simpler. A designed wave finds these in private, with a name on each finding. The real tide finds them in public. Buehler and his colleagues showed that people underestimate their own work even when they remember being wrong before, and the planning fallacy survives knowing about it. Your blind spots don’t yield to trying harder. They yield to something outside your own head that doesn’t share them.

The one idea to take

Building the thing is half the work. The other half is deliberately designing the forces that try to break it, because you can’t trust what you haven’t tried to break. The wave doesn’t have to be simulated users. A colleague you trust to be blunt, a checklist of the ways things fail, an hour spent attacking your own decision instead of defending it. It only has to come from outside your own head, carry no loyalty to the turret you’re proud of, and reach the castle before the ocean does.

A castle nobody has tested isn’t finished. It’s just dry.

Common questions

How do you test a product before it has users? Use everyone you can reach, then extend past them. Here that meant trusted colleagues and a patient partner first, then simulated users. Independent evaluators, each with a specific working life and use pattern, each reading the real built product rather than the marketing, reporting honestly where it fails them. The simulated set moves faster, scales wider, and covers roles a small circle can’t. Then check the design against published research rather than personal taste. The full method has its own piece.

Are the usage numbers in this piece real? No, and that matters. The pop-up hit-rates are simulated estimates from agent-based testing, not measured production data. They’re directionally useful for a decision. They are not a measurement, and they’re framed that way on purpose.

Why remove a feature instead of improving it? Because the testing said it failed in exactly the situations it existed to cover, and another channel already did the job better with the tab closed. Improving it meant building a second delivery system to duplicate the first. Subtraction was the stronger design choice, and the one truer to what the tool is.

← EarlierHow do you test a product nobody uses yet?Next →Why you keep missing deadlines