
Measuring Things: Practices & Myths

Estimated reading time: 8 min

I recently had a great conversation with some fellow Agile Coaches about how we support our teams and about what usually gets measured – which got me thinking about how to measure things properly in my context as an Agile Coach.

When I am in my kitchen, following a recipe is simple: I can weigh ingredients, check temperatures, and adjust cooking times. But in a professional context – especially when working with teams or entire organizations – things are far less straightforward. Unfortunately, we can’t “weigh” collaboration or “set a timer” for psychological safety. Thoughtful measurement can still reveal progress or expose gaps, support empirical process control, and improve the quality of feedback. But a word of caution: measurement can also cause harm – especially when it is guided by common misconceptions or used without understanding its limitations.

One of the most persistent misconceptions in this area is the belief that numbers are inherently objective. We want to trust in that idea – after all, “numbers don’t lie”, right? But in reality, data can be ambiguous, context-dependent, and open to interpretation.

Another common misunderstanding is the assumption that accuracy and precision are always the top priorities when measuring. Of course, reliable data matters. But what matters even more is knowing when the data is good enough to support a decision – and when it is not. Over-optimizing for precision can create the illusion of certainty without improving the quality of our choices.

And then there’s the belief that more metrics automatically lead to better insights. In practice, piling on more and more measurements often results in noise, confusion, and distraction rather than clarity.

These misconceptions shape what organizations choose to measure – and how they use those measurements. A particularly striking example comes from a well‑known tool vendor that introduced two decimal places for story‑point estimates in Jira. Yes, story points with hundredths. It’s a perfect illustration of how the pursuit of false precision can lead us far, far away from meaningful insight.

But where am I going with this post?
I want to give you some grounding in how to think about measuring things and perhaps a useful starting point for you and your teams. Unfortunately, there aren’t many truly helpful books on measuring work in organizational or team settings. One standout exception is “Measuring & Managing Performance In Organizations” by Robert D. Austin. It remains one of the most insightful works on the topic. I still hold on to my old paper copy, and for reasons unknown, it is still not available as an e-book in Germany.

From the preface of Austin’s book we learn that organizations often use measurement inappropriately, thereby causing harm. Organizations often strive to replace corporate bureaucracies with autonomous teams or business units. This sounds promising and seems like a good thing to do. But to steer these units, we need to measure things. And this is where it gets tricky.

Many of you may be familiar with the quote, “What gets measured, gets managed”. You might even know the name usually attached to it: Peter Drucker. Now, I’m not a great admirer of Drucker’s work to begin with, but there is an even bigger issue here: Drucker never actually said it – see e.g. #3 in the links below. And beyond the misattribution, the statement itself is misleading. So that’s two issues with one famous quote.

Fortunately, there is a more useful perspective to turn to: Campbell’s Law, which highlights how relying too heavily on measurement indicators can distort the very behavior they’re meant to evaluate. It’s a far more honest – and far more helpful – lens for understanding what happens when we measure things in complex systems. It says:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressure and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

Or in short: “What gets measured, gets manipulated!”. If you’re interested in diving deeper into the idea of “measurement dysfunction”, the book dedicates an entire chapter to it – Chapter 2.

What we can learn from Austin’s book is that intent matters when measuring things. He identifies two main categories of intended use of measurements in organizations:

  1. Motivational Measurements:
    These are designed to influence the behavior of individuals being measured. The intent is to encourage individuals to put in more effort towards achieving organizational goals.
  2. Informational Measurements:
    These are primarily valued for the logistical, status and research information they provide. The intent is to help organizations make informed decisions.

Keeping Campbell’s Law in mind, it becomes clear that having a noble intention for a metric is never enough – sooner or later, people will find ways to game it. I won’t dive deeply into Austin’s “Model of Measurement & Dysfunction” in this already long post, but I do want to highlight one key idea: Every metric has a built‑in potential for misuse, and every measurement creates side effects, some of which can undermine the very goal it was meant to support. In simple terms, Austin’s model explains how measurement systems interact with human behavior. When people know what’s being measured, they naturally adapt their actions to optimize for that metric – sometimes in healthy ways, but often in ways that distort or damage the underlying system. In other words: Metrics don’t just observe reality; they shape it.

To illustrate what this can look like in practice, here are a few amusing examples I found online. Someone compiled a list of machine‑learning systems in which algorithms were given a specific metric to optimize – and then discovered clever, unexpected, and completely counterproductive ways to achieve it. My personal favorite is the robot that was trained to flip pancakes (the intent). The metric was “time the pancake spends off the ground”.
The robot quickly learned the optimal strategy: stop flipping and start launching. It threw the pancake as high as possible into the air, producing interesting results on the kitchen ceiling…
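The pancake story is really just proxy optimization gone wrong: the optimizer only ever sees the metric, never the actual goal. Here is a minimal, hypothetical sketch of that dynamic – the strategies and numbers are invented for illustration and have nothing to do with the real experiment:

```python
# Toy illustration of metric gaming: a naive "optimizer" that only sees a
# proxy metric (airtime) and never the real goal (the pancake landing back
# in the pan). All values are made up for illustration.

strategies = {
    "gentle flip":   {"airtime_s": 0.4, "pancake_back_in_pan": True},
    "launch upward": {"airtime_s": 3.0, "pancake_back_in_pan": False},
}

def optimize_for_airtime(strategies: dict) -> str:
    """Pick whichever strategy maximizes the proxy metric."""
    return max(strategies, key=lambda name: strategies[name]["airtime_s"])

best = optimize_for_airtime(strategies)
print(best)                                     # the proxy rewards launching...
print(strategies[best]["pancake_back_in_pan"])  # ...while the real goal fails
```

The point of the toy: nothing in the code is broken. The optimizer did exactly what the metric asked for – which is exactly the problem.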

The list only contains examples from machine learning and simulations, but a friend of mine, also an Agile Coach, recently published a few examples of measures being misused in organizations. I want to share his observations for two of them here:

  • # of issues (or bugs):
    When the number of issues grows so large that they can’t be resolved within a single iteration, teams usually need to track them digitally. Before long, this typically evolves into an elaborate system of categorization and assessment, since not all issues are equally important. That in turn creates more waste: reporting, meetings about progress, and even product owners prioritizing fixing issues over developing new features. My friend concluded that the deeper problem is that “having many issues” has been normalized so thoroughly that nobody questions the necessity of comprehensive issue‑tracking and management systems. He suggests practicing “stop and fix” instead, which over time usually makes the large list of issues irrelevant.
  • Developer productivity:
    The history of software development is full of productivity metrics for developers. My friend observed that lines of code was one such measure, sometimes replaced by the number of “tickets” a developer finishes. Both are equally bad. A bit later, organizations usually start to measure things like number of commits, code‑quality metrics, pull‑request size, etc. Although these metrics can be useful in the right context, it is genuinely harmful to use them to measure developer productivity. He suggests not measuring developer productivity at all, but rather focusing on the conditions for developers and on the teams’ delivery skills. Additionally, there needs to be support from the organization for a proper Definition of Done (DoD).

So, trying to sum things up:

Treat measures as a trigger or promise for a conversation and use them to reach informed decisions. Never make them a goal for anyone, and keep things simple and transparent. For your teams, you could consider introducing a “days since last” dashboard to trigger conversations. Measures on this dashboard could include things like (days since last) “someone worked on technical debt”, “completed a retro action item”, or even something as simple as “synced as a team”.
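Such a dashboard needs almost no machinery. Here is a minimal sketch, assuming the team simply records each measure as a label plus the date it last happened – the labels and dates are invented for illustration:

```python
# Minimal "days since last" dashboard sketch. Each entry maps a
# conversation-triggering event to the date it was last observed.
# Labels and dates are hypothetical examples.
from datetime import date

last_seen = {
    "someone worked on technical debt": date(2024, 5, 2),
    "completed a retro action item": date(2024, 5, 10),
    "synced as a team": date(2024, 5, 13),
}

def days_since(last: date, today: date) -> int:
    """Whole days elapsed since the event was last observed."""
    return (today - last).days

def render_dashboard(last_seen: dict, today: date) -> str:
    """One line per measure, oldest (most conversation-worthy) first."""
    rows = sorted(last_seen.items(), key=lambda item: item[1])
    return "\n".join(
        f"{days_since(last, today):>3} days since {label}"
        for label, last in rows
    )

print(render_dashboard(last_seen, today=date(2024, 5, 15)))
```

The only design choice that matters here is sorting by staleness, so the item most overdue for a conversation sits at the top – the dashboard stays an invitation to talk, not a target to optimize.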

And lastly, if you are interested in measures on an organizational level, have a look at Bas Vodde’s talk from the 2023 LeSS Conference, linked below.

That’s it. I hope there is some useful info here for you.

Have fun!

Curiosity Reading Material

  1. [TALK] LeSS Conference 2023: “Large-Scale Agile Health Metrics by Bas Vodde”
  2. [BOOK] Measuring & Managing Performance In Organizations by Robert D. Austin
  3. [BLOG] Drucker Institute on “Measurement Myopia”
  4. [BLOG] 9 reasons why targets make performance worse
  5. [LIST] Examples of Algorithms