My "blocked-by-default" approach to working with coding agents

In April this year, a headline made the rounds: an AI agent wiped a company's production database in nine seconds. Good intentions. Good context. Careful prompts. Nine seconds.

A well-meaning agent, with all the context in the world, can ruin your day in the time it takes to read this sentence.

I gave a talk about this recently. It was about two things I didn't expect to ever put in the same sentence: cave diving and working with coding agents. I'm an engineering manager by trade and a diver growing my technical skills by obsession, and the two crafts have ended up in conversation with each other far more than I anticipated.

Here's the through-line: the failure mode shows up before the failure does. Divers have known this for fifty years. We're relearning it now, in software, with agents.

Same mistake, different bill

When a programmer makes a mistake, they have a bad day. You revert the commit, restore a backup, apologize at stand-up.

When a cave diver makes a mistake, they die. Not a metaphor. People die in caves every year.

That asymmetry, bad day versus you die, is the reason technical diving built a far more mature risk culture than ours. And that means there's a lot we can steal.

How divers think about risk

We learn by reading the dead. A core part of dive training is studying accident reports. Real names. Real sites. What lined up that particular morning. It's uncomfortable. You're learning from tragedies. But it's the only honest way to internalize that risk isn't theory. It's a specific person, in a specific sump, on a specific day, with a specific team.

But the dead only tell you what happened. They don't tell you why. For that, divers read a second layer: human factors. Gareth Lock's Under Pressure takes the Swiss-cheese model and applies it to diving. Sidney Dekker's Understanding 'Human Error' explains that error is almost never malice or laziness. It's a badly designed system that invited the error in. Peter Bernstein's Against the Gods gives you the long view of how humanity learned to tame risk at all.

The dead tell you what failed. These books tell you what conditions made it likely.

One report has stuck with me. Twin Cave, Florida, March 2025. Three divers, rebreathers, scooters, a thousand feet of penetration. They pass a restriction, it silts out, and the middle diver gets disoriented. Here's the part that breaks my brain: his rebreather was working perfectly. They stripped it down afterward, and it was dry and operational. But because he couldn't read his oxygen partial pressure through the silt, he bailed out to open circuit. His gas consumption, normally half a cubic foot per minute, spiked to over two. He drained a full cylinder in ten minutes and drowned, with 1,400 psi sitting untouched in his other bottle.

Four lessons from that report that should sound familiar:

Silent degradation. Something was leaking the whole way in, and nobody noticed.
You can't predict your bandwidth under stress.
In a panic, people abandon systems that are working. Just because they can't see them.
Resources only count if you use them. 1,400 psi in a bottle he never switched to.

Plan your dive. Dive your plan.

Before you get in the water, you decide exactly how deep you'll go, how long you'll stay, what gases you'll breathe, where the exits are. Then you execute that plan. You don't improvise. Improvising in a cave is git push --force main at 6 p.m. on a Friday: sometimes it works, and one day it doesn't.

Anyone can call the dive

If your buddy in the parking lot says "I feel off, let's not," the dive is cancelled. No discussion, no vote, no bullying. One voice is enough. That culture, veto without penalty, is what keeps people alive, and it translates one-to-one to teams: if your junior says "this smells wrong," you stop everything.

Assume everything that can go wrong, will

It's defensive by construction. Not pessimism. Engineering. Your regulator can fail. Your buddy can panic. Your computer can lie. You plan for all of it at once. If you only plan for the perfect day, you die on the day that isn't perfect.

But agents are different

And they're different in exactly the direction we need.

You can stop an agent. You can't stop a diver who has decided to enter the water. But if you give an agent the right tools and enforce your policies, it literally cannot execute the action. Human safety fights will. Agent safety just sets capability. And you get to decide it.

Before going further, two myths worth noting:

Agents don't reason. They predict the next likely token from prior context: a very good statistical impression of thought, but not thought. The distinction matters the moment the stakes are real.
More context does not mean a safer agent. A million-token window buys you the same failure mode with more surface area. The model speaks with equal confidence whether it knows or not.

And here's the deepest difference. When a diver reads the Twin Cave report, they aren't reading data. They're reading a bottle that emptied in ten minutes, a partner who never surfaced, their own mortality reflected back. That weight is what makes their hands move differently in the water next time. It's how we turn knowledge into wisdom. When an agent reads the same PDF, it updates a weights vector. Full stop. No body, no consequence, no metabolism. We can teach an agent everything we know, but not what we've felt.

Which is the whole point: we cannot depend on the agent's judgment. We have to depend on structure.

Because nobody shows up to work intending to ruin everyone's day. Not the agent, not you, not me. People make mistakes because they're operating inside a system that lets them. If your environment allows a buggy commit to reach production, eventually one will. That's physics, not morality.

Inverting the Swiss cheese

The classic Swiss-cheese model comes from James Reason in 1990. It's default-allow. Each defensive layer has holes. An accident is the day every hole lines up and the hazard threads straight through. Reason asked: how do we make it unlikely the holes align?

I flip the question. What, exactly, is the set of things I want to permit? Everything else is impossible by default. default-deny.

When you invert the model, the holes stop being failures and become doors: explicit, well-lit, logged. A door is a conscious permission, not a leak. If you're going to deviate from the process, you walk through a door on purpose, and the door leaves a trace.

There are actually two inversions in my setup:

Default-deny for operations: everything blocked unless explicitly permitted.
Default-act for decisions: because I work alone, there's no second human waiting to approve. If the agent stops to ask me about everything, all I'm paying is the round-trip cost. So I keep an Autonomy Charter: green (act, don't ask), yellow (act, then report), red (ask first). Only the things that cost money, touch production, or affect other humans get confirmed. Everything else executes.

Five layers of cheese

In practice my setup has five layers. Each one denies by default and has exactly one explicit, audited door.

Layer 1: remove the opportunity, then block the intent. A set of hooks fires before any command runs. The first thing they do is the highest-leverage move in the whole stack: they won't let the agent edit the main checkout at all. The moment real work starts, it happens in a throwaway per-task worktree. The main tree is never the work area, so the worst class of mistake has nowhere to land. On top of that, the same hooks block specific intents. Try git commit --no-verify and the hook answers in JSON: blocked, here's why, here's the fix. The agent never runs it, and it's told how to recover. Removing the chance beats catching the act, every time.
Layer 2: block the operation. Git itself: core.hooksPath, pre-commit / commit-msg / pre-push. Caller-invariant. It doesn't matter whether it's the agent or me at 3 a.m. If Layer 1 is somehow bypassed, these catch the operation anyway.
Layer 3: block the loose ends. Session-lifecycle hooks. They warn me about uncommitted changes when a session starts, and they refuse to let me end one with unpushed commits or half-finished work.
Layer 4: block the workflow. A CLI is the only legal way to mutate anything, with start gates (the issue exists, it has acceptance criteria) and fourteen finish gates (rebase, push, issue transition, and the rest). It fails clean and idempotent. Fix the one thing it flagged and resume.
Layer 5: block at the merge. CI, on machines the agent never touches. It re-derives the truth from a clean checkout on an external runner. It exists precisely because I assume every earlier layer could have been bypassed. The last word.

The trick that makes the inverted model work where the classic one fails: the validations are single-source. One regex file defines what a valid commit message is, and the Claude hook, the git hook, the CLI, and CI all import it. One template, five layers. If one passes, they all pass. If one fails, it fails identically everywhere. Two layers can't drift apart.

For a bad change to reach main, it would have to pass through all five default-deny layers deliberately, knowing every specific escape hatch, with each bypass requiring an explicit environment variable that lands in the log. Accidental deviation isn't unlikely. It's structurally impossible. Intentional deviation is possible. This is a workshop, not a prison. But it's loud. Visible.

Your settings are not my settings

That second inversion, default-act, is tuned for exactly one person. Me.

It's worth saying out loud, because it's the part nobody should copy blindly. The shape of the system travels anywhere: default-deny on operations, explicit doors, single-source validation. The settings don't. I run the decision dial cranked all the way toward autonomy because I'm the only person the agent can hurt, and I'm the one reading every trace it leaves. So I tell it not to ask.

Put five engineers behind the same agent and the math inverts. A wrong autonomous action now costs someone else's afternoon. There's a second human who could have caught it in review, so not using them is the waste, not the round-trip. And "loud and visible" only protects you if someone is actually watching the log, which at five people is a real assignment, not a given. A team turns the same knobs the other way: red-list wider, more doors that need a second set of eyes, more effort spent making the trail legible to people who weren't in the room.

And that's just headcount. How hard you make the operations layer depends on blast radius. A hobby repo and a system touching customer data are not the same building. How much rope you give the agent depends on how much you trust the operator behind it. I trust myself at 3 a.m. differently than I'd trust a brand-new hire's setup on day one. A regulated industry, an irreversible action, money in the loop: each one is a dial, and every company sets them differently, because each is weighing a different cost.

So don't copy my configuration. Copy the method: decide, on purpose, where each of your doors goes and how loud it is when someone walks through. The framework is portable. The thresholds are yours. Setting them honestly is the actual work.

The one idea to take with you

The context you give your agents isn't instructions. It's suggestions. Most of the time they're followed. But most isn't a business model. You can't build a process, or a career, on non-deterministic guarantees.

So build the infrastructure that makes the right thing the only thing. Not the likely thing. The only one. That's your job. Not the agent's, not the model's. Get it right, and your worst day is still just a bad day. Get it wrong, and you find out how much can happen in nine seconds.