I stumbled across the “Swiss Cheese Model” of safety yesterday. It’s one of those concepts that, once you see it, you start spotting it everywhere.
Originally used in risk analysis for things like aviation and healthcare, the idea is simple: every system has layers of defence, but those layers are like slices of Swiss cheese.
They all have holes. If you rely on just one slice, a defect eventually finds its way through. But if you stack enough slices together, the holes don’t align. The defect gets stopped.
It occurred to me that this is exactly how we should be thinking about agentic coding.
The stack is what catches defects
When we’re working with AI agents, there’s a temptation to try and build the “perfect” prompt. We think if we can just refine the instructions enough, the agent will never produce a bug.
But that’s a trap. A prompt is just one slice of cheese, and it’s a pretty holey one at that.
Instead of chasing perfection at the prompt level, I’ve been looking at how we can stack different, imperfect layers to create a system that’s actually robust. Here is how that stack is starting to look for me:
- Agent prompts / instructions: Setting the guardrails and intent. It’s the first line of defence, but the agent can still misread it, ignore it, or hallucinate its way around it.
- Linting: Catching structural and syntax mistakes the moment the agent “writes” the code.
- Agent hooks: Small, automated checks that run after a tool is used. If the agent tries to save a file that doesn’t compile, the hook catches it before the agent moves on.
- Pre-commit hooks: Blocking obvious issues on the local machine. This keeps the “noisy” mistakes out of the repository.
- CI pipeline: Deterministic, codebase-wide checks that ensure the new code plays nicely with everything else.
- Automated review: Nightly audits or LLM-based “sanity checks” that look for edge cases that slipped through the cracks.
- Human review: The final layer. This is where context, nuance, and judgment come in: things a tool just can’t replicate yet.
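To make the “agent hooks” layer a bit more concrete, here’s a minimal sketch of what such a check might look like. The function name and the way it receives the saved file’s path are assumptions for illustration; real agent frameworks wire hooks up in their own ways. The idea is just that a cheap, automatic syntax check runs right after the agent writes a file, so a broken file bounces straight back instead of slipping through to the next slice:

```python
import os
import py_compile
import tempfile

def check_saved_file(path: str) -> bool:
    """Hypothetical post-write agent hook.

    Returns False if the file the agent just saved fails a basic
    syntax check, so the agent has to fix it before moving on.
    Only Python files are checked in this sketch; other layers
    (linting, CI) cover the rest.
    """
    if path.endswith(".py"):
        try:
            # doraise=True makes py_compile raise on a syntax error
            # instead of printing to stderr.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError:
            return False
    return True

# Quick demonstration with one valid and one deliberately broken file.
with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "good.py")
    bad = os.path.join(d, "bad.py")
    with open(good, "w") as f:
        f.write("x = 1\n")
    with open(bad, "w") as f:
        f.write("def (:\n")  # invalid syntax on purpose
    print(check_saved_file(good))  # True: the file compiles
    print(check_saved_file(bad))   # False: the hook blocks the agent
```

The same shape works for any cheap, fast check: compile the file, run a formatter in check mode, reject files outside an allowed directory. The point isn’t that this one check is thorough; it’s one more slice of cheese.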

Embracing imperfection
The beauty of this approach is that it takes the pressure off any single part of the process.
I don’t need my agent hooks to catch everything. I don’t need my prompts to be 100% foolproof. I definitely don’t expect my human review to catch every single trailing comma on a Friday afternoon.
Each of these layers has gaps. Humans get tired. Linting doesn’t understand business logic. Prompts get ignored. But when they’re organised in a stack, defects have no straight path through. The goal isn’t to build one perfect, impenetrable wall; it’s to make sure the holes don’t align. By feeding problems back to the agents early and often, the whole system becomes a lot more resilient.
It’s a bit of a shift in mindset, moving from “How do I make this agent perfect?” to “How do I build a better stack of cheese?” It feels a lot more sustainable, too.
Of course, I’m still figuring out exactly where the holes are in my own workflow. I’m constantly tweaking things, but it’s a lot more reassuring than just crossing my fingers and hitting “run.”