How a single shared metric changes the conversation

A north-star metric doesn't tell engineers what to build. It gives everyone a way to compare options in the same language.

I know a company whose engineering team did two full rewrites in one year: one to Node.js, I believe, then the next to Go. Their business had very low request rates -- not social media, not messaging. ASP on a shared server would probably have scaled fine. Management weren't technical, so they had no way to evaluate whether either rewrite was necessary. Their options were to trust the engineers, or to overrule them without being able to defend the decision.

They trusted the engineers. Twice. I don't know if they were right to.

What GMV fixed at Bolt

At Bolt, the north-star metric was GMV -- Gross Merchandise Value. The total value of all transactions flowing through the platform. One number. Simple enough that the CEO, the CFO, and a junior engineer who started last Tuesday could all tell you what it meant and whether it was going up.

GMV didn't tell anyone what to build. It gave every team a shared unit of account. Product teams derived their own metrics from it -- completed orders, conversion rates, average order value. Engineering teams derived theirs -- uptime, latency, deployment frequency. But every derived metric had a traceable connection back to GMV. You could always ask "how does this affect GMV?" and get an answer that wasn't hand-waving.

An outage wasn't "we were down for two hours." It was "we lost an estimated €X of GMV." A process failure wasn't "the deploy pipeline is flaky." It was "flaky deploys delayed Y releases last quarter, each worth roughly €Z of GMV." When your platform handles rides and deliveries across hundreds of cities, those numbers get large fast.
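To make that translation concrete, here's the shape of the napkin math. All the figures below are invented for illustration -- they are not Bolt's numbers -- and the even-spread assumption is a deliberate simplification:

```python
# Napkin math: translate an outage into estimated lost GMV.
# Figures are illustrative, not real; assumes transactions are
# spread evenly across the month, which real traffic never is.

def outage_gmv_loss(monthly_gmv: float, outage_hours: float) -> float:
    """Estimate GMV lost during a full outage of the platform."""
    hours_per_month = 30 * 24
    return monthly_gmv / hours_per_month * outage_hours

# A two-hour outage on a hypothetical €100M GMV/month platform:
loss = outage_gmv_loss(100_000_000, 2)
print(f"Estimated loss: €{loss:,.0f}")  # roughly €278k
```

The point isn't precision -- it's that "we were down for two hours" and "€278k" are the same fact, stated in two different languages, and only one of them means anything to the CFO.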

It also settled the small arguments. Engineers regularly pushed for the app to have a "dark mode." I'd have appreciated it myself for comfort. But would it have made me use the app any more or less? No. And dark mode isn't free -- it's more surfaces to test for every mobile change, more QA backpressure getting in the way of things that move the number. All work costs GMV (engineering time that could be spent elsewhere), but not all work gives it back. Even "opportunity cost" can be roughly quantified in GMV terms: the sprint spent on dark mode is a sprint not spent on the checkout flow improvement worth €X per quarter.

The database migration the CEO was sceptical about? "This migration reduces p99 latency by 200ms on the checkout flow. Based on our A/B data, that's a measurable increase in completed orders." The CEO doesn't need to understand database internals. He understands "more completed orders." The engineer doesn't need to understand EBITDA. She just gave the CEO the data to make the call.

The translation layer

The instinct is to treat this as a communication problem. "Engineers need to be better at explaining their work." "Leadership needs to understand technology better." Both probably true. But a CEO isn't going to develop intuition for functional reactive programming, and a senior engineer isn't going to develop intuition for capital allocation over a lunch-and-learn.

This conversation comes up often with non-technical founders: "How do I know if my engineers are doing useful work?" "Is the third rewrite this year actually necessary?" They're signing cheques for work they can't evaluate.

One answer: create a metric that makes engineering work legible to anyone. Instead of the PM, the EM, and the CTO arguing about why Rust is better than Go, engineers explain how their proposed work moves the number. The burden of proof lands with the people who have the expertise to carry it.

It's a shared interface -- the same pattern as a declarative config sitting between data science and data engineering. Teams on either side don't need to understand each other's work. They express what they need in terms of the shared number, and compete for resources on equal footing.

What a useful metric looks like

Not every number qualifies. Revenue is too lagging. Story points are too detached from outcomes. "Customer satisfaction" is too vague to do arithmetic with.

A good north-star metric needs to be simple enough that everyone gets it -- from the CEO to the junior SWE who started last Tuesday. If it needs a footnote, pick a different one. It needs to be visible -- on a dashboard, in the all-hands deck, not buried in a quarterly report that three people read. It needs to be easy to estimate with -- you should be able to do napkin math: "if we improve X by 10%, that's roughly Y more GMV." If you can't do back-of-envelope modelling with it, it's a vanity metric. And it needs to be allowed to evolve. At Bolt, GMV was always the north-star -- that never changed. But the derived metrics teams used shifted as the business matured. Darwinism applies to organisations just as much as to species.

The arguments it settles

Once you have the metric, the conversations I described at the top stop happening. Not because people agree more, but because disagreements become testable.

"Should we invest in reducing build times?" becomes "Build times cost us X engineer-hours per week. At our average output-per-engineer, that's €Y of GMV we're not capturing." Maybe the number is small and the answer is no. Maybe it's enormous and the answer is obviously yes. Either way, the decision comes from arithmetic, not from who argues louder in a meeting.

"Is technical debt real or are engineers gold-plating?" becomes answerable. Measure what percentage of engineering time goes to process overhead, fixing broken things, and working around limitations versus actually moving forward. If 60% of your team's week is spent on overhead and the Amdahl's math says that's costing you €X of GMV per quarter, the debt is real and it has a price tag. If the number is small, maybe the engineers are gold-plating. Either way, you're arguing about a spreadsheet, not about feelings.

When three teams compete for the same engineer's week, the metric is the tiebreaker. You don't need to understand what each team does -- you compare what each task is worth in the metric graph.

This works in the other direction too. A backlog is a diagnostic, but only if you can attach cost to what's in the queue. Ten tickets waiting isn't alarming. Ten tickets representing €50k of blocked monthly GMV is. A 10x engineer operating at 1x is a waste, but the Amdahl-style formula becomes actionable when you attach a euro figure to the "wasted" fraction. "Reduce operational overhead by 30%" stops being an abstract goal. It's a line item with a projected return.
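Pricing a backlog is the simplest version of all: estimate the blocked GMV per ticket (hypothetical figures below) and sum it.

```python
# Illustrative: a backlog is only a diagnostic once costs are attached.
# Estimated blocked monthly GMV per ticket, in euros -- made-up figures.
blocked_monthly_gmv = [
    12_000, 9_000, 8_000, 7_000, 6_000,
    4_000, 2_500, 1_500, 0, 0,
]

total = sum(blocked_monthly_gmv)
print(f"{len(blocked_monthly_gmv)} tickets blocking "
      f"€{total:,} of monthly GMV")
# → 10 tickets blocking €50,000 of monthly GMV
```

Note that two tickets cost nothing -- a queue where most of the value sits in a few items tells you exactly what to pick up next.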

What the metric doesn't do

It doesn't tell engineers what to build. It doesn't replace technical judgment. A senior engineer who says "this service needs a rewrite because the current architecture can't handle our reliability requirements" still needs to be right about the architecture. The metric just gives her a way to explain why being right matters in terms that justify the investment.

Engineers often resist metrics because they feel reductive -- "you can't capture what I do in a single number." They're right. You can't. But the alternative isn't that their work gets evaluated on its full complexity. The alternative is that it gets evaluated on vibes. On whether the CTO is persuasive in the boardroom. On whether the last demo looked impressive. A metric is less precise than full understanding, but infinitely more precise than "I trust you, I think."

The company I mentioned at the top -- the one that did two rewrites in a year -- didn't have a north-star metric. Management couldn't evaluate the work, engineers couldn't justify it, and trust eroded a little with each rewrite. A single number wouldn't have told them whether Go was the right choice. But it would have forced the question: "what does this rewrite give us, measured in the thing we all agree matters?"

Not all rewrites are unjustifiable. Bolt rewrote its backend from PHP to JavaScript to TypeScript over several years. Each transition had a measurable case: faster feature delivery, fewer production bugs from type safety, and a hiring pool that went from "people willing to write PHP" to "almost every web developer on the market." Those are real improvements in engineering velocity and quality, denominated in things the business cares about. The difference isn't that Bolt's rewrites were bigger or more ambitious -- it's that they could point to concrete outcomes that justified the investment.

Do one thing and do it well. Including measuring it.