Where does your risk live?

When a risky operation is scattered across thirty files, teams, or people, it's invisible. Move it to one place and you can audit it, test it, fix it once.

7 min read

You don't need to understand each other

How a shared interface between parties who don't understand each other's work solves problems across software, teams, and entire industries. From a pipeline config to a shipping container.

8 min read

Sprint planning, map colouring, and why they're the same problem

What map colouring, sprint planning, and CPU register allocation have in common, and why the maths behind all three has been studied since 1852.

8 min read

No boarding without a ticket

Ticketing systems aren't bureaucracy. They're the organisational equivalent of async interfaces. The real problem is what happens when the queue backs up.

7 min read

Making heavy operations feel light

Why we named the company after a hippo, the philosophy of invisible infrastructure, and what Potamus does.

4 min read

How a single shared metric changes the conversation

Why a single north-star metric turns 'are our engineers doing useful work?' from an opinion fight into a spreadsheet. Lessons from Bolt's GMV.

6 min read

What Amdahl's law taught us about engineering productivity

Amdahl's law applied to engineering productivity. Why brilliant engineers stuck in operational mud produce no more than average ones, and why investing in operations matters more than hiring.

7 min read

Why we prioritise interfaces over implementations

Why a clean interface hiding a terrible implementation beats a clean implementation behind a terrible interface. Applied to pipelines, APIs, and deployment tools.

6 min read

How we gave data scientists ownership of their own deployments

The anti-pattern of data scientists filing tickets to deploy models, what declarative self-service pipelines look like, and why data scientists owning their lifecycle changes everything.

8 min read

How we cut ML build times from 40 minutes to 5

What causes slow builds in ML, and the practical techniques that took ours from 40 minutes to 5.

9 min read

What happens when data engineering becomes the platform, not the gateway

Why the traditional model of data engineering as a serial gatekeeper between data science and infrastructure creates bottlenecks, and how a shared platform interface eliminates the pattern entirely.

7 min read

What we learned when our ML team outgrew its infrastructure

The classic scaling pain: going from fewer than 10 ML models to 200+ at Bolt. Signs your team has hit the wall, and what self-service looks like when it works.

8 min read