Why Your KPIs Are Making You Dumber — And What BPE Does Instead
The problem with output-focused management is not that your people lack effort. It is not that your strategy is wrong. It is that the measurement architecture you built to improve performance is actively degrading the signal you need to run the business. Every time you set a KPI, you start a clock. The moment that metric becomes a target, it begins losing its value as a measure. This is not a theoretical concern. It is a governing law of performance systems, and it is operating in your organization right now whether you have named it or not.
Business Performance Engineering — BPE — is the discipline of working upstream of that problem. It does not start with what you are measuring. It starts with what conditions you are creating, because conditions are what actually produce outputs. The rest of this piece explains the mechanism, names the failure modes in conventional measurement, and describes what it looks like in practice when an organization stops chasing outputs and starts engineering conditions.
Goodhart's Law Is Not a Quirk — It Is a Governing Law of Your Performance System
Charles Goodhart is a British economist. In 1975, writing about monetary policy, he observed that a statistical regularity tends to collapse once pressure is placed on it for control purposes. The version most often quoted today is a later paraphrase: when a measure becomes a target, it ceases to be a good measure. That is Goodhart's Law, and the pattern has since been documented far beyond monetary policy. It is not a warning about bad actors. It does not require dishonesty. It operates on perfectly well-intentioned people doing rational things.
Here is the mechanism. You observe that call volume correlates with revenue. You make call volume a target. People optimize for call volume. Call quality drops. The relationship between call volume and revenue breaks down. But the metric keeps rising, so your dashboard looks healthy while your actual outcome degrades. You have created a system that reports improvement while producing decay.
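The call-volume mechanism can be made concrete with a toy model. What follows is a minimal sketch, not a calibrated forecast: the function name `expected_revenue`, the 480-minute working day, the $1,000 deal value, and the shape of the conversion curve are all assumptions invented for illustration. The only point is that once time per call is finite, pushing the targeted number up eventually pushes the real outcome down.

```python
def expected_revenue(calls_per_day: int,
                     minutes_available: float = 480.0,
                     deal_value: float = 1000.0) -> float:
    """Toy model of one rep's expected daily revenue (all numbers are assumptions)."""
    if calls_per_day <= 0:
        raise ValueError("calls_per_day must be positive")
    # A fixed working day means more calls leave fewer minutes per call.
    minutes_per_call = minutes_available / calls_per_day
    # Assumed quality curve: conversion approaches 20% on long calls and
    # falls off quickly once calls get shorter than about 15 minutes.
    conversion = 0.20 * minutes_per_call**2 / (minutes_per_call**2 + 15.0**2)
    return calls_per_day * conversion * deal_value


# The targeted metric (calls per day) rises on every row; revenue does not.
for calls in (16, 32, 64, 96):
    print(f"calls/day={calls:3d}  expected revenue={expected_revenue(calls):8.2f}")
```

Under these assumed numbers, revenue peaks at about 32 calls per day and declines beyond that, while a dashboard tracking call volume alone would keep reporting improvement.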
The same dynamic appears at every level. A hospital measures readmission rates, so discharge timing gets gamed. A school measures test scores, so instruction narrows to tested material. A sales team measures closed deals, so deal quality deteriorates and churn spikes six months later. In every case, no individual is acting irrationally. The measurement architecture made narrow optimization the rational choice.
MIT Sloan Management Review's Spring 2026 issue addressed this directly, with Balázs Kovács arguing that balanced scorecards and traditional KPI frameworks are structurally vulnerable to exactly this failure. The article proposes adapting techniques from AI model training, where Goodharting is a recognized design problem with explicit countermeasures, to human performance systems. The countermeasures are not complicated. They include pairing metrics so that gaming one damages another, rotating measures before the gaming behavior fully develops, and building second-order outcome checks that cannot be satisfied by narrow optimization alone.
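Metric pairing, the first of those countermeasures, can be sketched in a few lines. This is an illustrative assumption about how such a check might look, not the method described in the article: the function names, the 5% threshold, and the sample series are all hypothetical, and the sketch assumes higher is better for both metrics.

```python
def relative_change(series: list[float]) -> float:
    """Change from the first to the last observation, as a fraction of the first."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if series[0] == 0:
        raise ValueError("first observation must be non-zero")
    return (series[-1] - series[0]) / abs(series[0])


def gaming_signature(primary: list[float], paired: list[float],
                     threshold: float = 0.05) -> bool:
    """Flag a window where the targeted metric improved while its paired check worsened.

    Assumes higher is better for both series; the threshold is an arbitrary choice.
    """
    return (relative_change(primary) > threshold
            and relative_change(paired) < -threshold)


# Hypothetical monthly data: the target rises 30% while the paired check falls 25%.
call_volume = [40, 44, 48, 52]
conversion_rate = [0.10, 0.09, 0.08, 0.075]
print(gaming_signature(call_volume, conversion_rate))
```

A flag from a check like this is a prompt to investigate, not proof of gaming; the two metrics can diverge for legitimate reasons.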
The core insight is this: Goodhart's Law does not mean measurement is bad. It means static measurement of narrow outputs is structurally self-defeating. You do not escape it by adding more metrics. You escape it by understanding what you actually want to measure and designing the system to keep that signal clean.
Why Balanced Scorecards Fail the Same Way
The balanced scorecard was introduced in the early 1990s as a direct response to the problem of single-metric over-optimization. The idea was sound: measure across four perspectives — financial, customer, internal process, and learning and growth — so that gaming one dimension gets checked by the others. In theory, it addresses Goodhart's Law by expanding the measurement set.
In practice, it does not. And the reason is instructive.
Adding metrics to a scorecard does not change the optimization pressure — it just distributes it across more targets. Each metric on a balanced scorecard becomes its own Goodhart trap. People hit the number on customer satisfaction surveys without changing the customer experience. They report training hours without changing capability. They hit internal process metrics without improving actual throughput. The scorecard adds complexity, not signal integrity.
The second failure is static target-setting. Balanced scorecards typically get built at the start of a planning cycle and stay fixed for twelve months. But the organization is not static. The problems that justified the targets in January may not exist in October. The targets, however, remain. People continue optimizing for them regardless, because the incentive system is tied to hitting them. You get efficient pursuit of obsolete objectives.
The third failure is that balanced scorecards measure outputs across four categories. They do not measure the conditions that produce outputs. A scorecard can tell you that employee engagement dropped five points. It cannot tell you whether that drop was caused by span expansion, role ambiguity, a change in manager quality, or something else entirely. You can read every line of a balanced scorecard and still have no idea what to actually change.
That is the gap BPE is designed to fill.
What BPE Means Structurally
Business Performance Engineering is the practice of identifying, measuring, and modifying the upstream conditions that produce business outcomes. The outputs — revenue, margin, engagement, project success rate — are not managed directly. They are produced by conditions. BPE makes those conditions the primary object of attention.
What are conditions? They are the structural and contextual variables that determine how people perform work. Span of control is a condition. When Gallup's 2026 data shows manager engagement dropped from 27% to 22% in a single year — the largest decline on record — the BPE lens does not ask how to re-engage those managers. It asks what structural changes created an overload condition. Organizational flattening expanded team spans. A manager whose span grew 40% in three years without any reduction in role scope is not struggling with motivation. The system produced an overload condition. That is what needs to be engineered.
Role clarity is a condition. Fewer than 47% of employees strongly agree they know what is expected of them, according to Gallup 2025. That is not an engagement problem. It is a goal-communication and role-definition problem with a direct structural fix. You do not solve it with a culture campaign. You solve it by rewriting role definitions and aligning manager communication cadence to expectation-setting, then measuring whether clarity scores move.
Decision context is a condition. The HBR 4T model published in January 2026 makes this explicit: traditional training campaigns fail because they operate outside the actual decision context. Behavior changes when a nudge appears at the moment of choice — not because someone attended a session three weeks ago. The decision environment is a condition. BPE engineers it.
Resource alignment is a condition. PMI's 2025 project data shows only 50% of projects deliver value exceeding their cost, and the top barrier is the disconnect between planning and execution. That disconnect is a resource and alignment condition, not a strategy quality problem. When project teams do not have the business judgment to connect daily decisions back to strategic rationale, execution drifts. Fixing it means reallocating training toward business acumen — currently only 25% of training spend versus 46% on technical skills — not running more project management certification programs.
These are the variables BPE tracks. Not the outputs. The conditions upstream of the outputs.
The Causal Chain — Why Fixing Conditions Produces Outputs, But Chasing Outputs Does Not Fix Conditions
This is the structural logic at the center of BPE, and it is worth being precise about it.
Conditions produce outputs. That is the direction of causality. Structural overload on a management layer produces reduced team engagement. Ambiguous role definitions produce lower individual productivity. Poor decision-context design produces behavior that does not match training investment. These are causal relationships, not correlations.
When you manage outputs directly, you apply pressure to the end of the causal chain without changing the conditions that produced the outcome. You tell a manager to improve their team's engagement score. The manager runs a team lunch. The score moves one point. The structural overload condition that produced the low score is untouched. Three months later, the score drops again. You have created a cycle of intervention with no cumulative improvement — and you have consumed the manager's remaining capacity doing it.
This is why DDI's 2025 Global Leadership Forecast shows trust in immediate managers falling 37% in three years. That decline did not happen because managers became worse people. It happened because the conditions around the management role degraded — spans expanded, administrative load increased, strategic demands grew — and the system responded by measuring and pressuring the output (trust scores) without touching the conditions. The trust score is a lagging indicator of structural decisions made 18 to 36 months earlier.
When you fix conditions instead, the causal chain runs forward. Reduce span overload, and managers have relational capacity to invest in their teams. Clarify role expectations, and people know what to optimize for. Align resources to actual project requirements, and execution success rates rise. You are not managing the output. You are engineering the conditions that produce it, and the outputs follow.
Output-Management Organizations vs. Condition-Engineering Organizations
The difference between these two types of organizations is visible in how they respond to the same problem.
An output-management organization sees declining project success rates (say, 50% of projects failing to deliver value, the PMI 2025 figure) and responds by adding governance checkpoints, requiring more status reporting, and running project management training. The measurement gets more granular. The reporting burden increases. Success rates do not improve, because the conditions that produced the failure (weak strategic alignment checks, training biased toward technical skills, no mechanism for reconnecting execution to the original problem statement) are untouched.
A condition-engineering organization looks at the same data and asks: what structural conditions produced a 50% failure rate? It finds that strategic alignment is not a live variable in execution — it was checked at kickoff and never revisited. It finds that business judgment is underdeveloped relative to technical capability. It adds a single strategic alignment check to every governance meeting — does this project still address the problem it was scoped to solve? It shifts training spend toward business acumen. It measures whether the conditions changed, then whether the success rate follows.
An output-management organization sees 71% of leaders reporting elevated stress and 40% considering leaving — the DDI 2025 numbers — and builds a wellbeing program. A condition-engineering organization audits what structural changes created the stress conditions, restores the management capacity that was stripped during flattening, and then measures whether burnout indicators move.
The practical difference is where attention and resources go. Output-management organizations spend on symptoms. Condition-engineering organizations spend on causes.
What Shifts When You Adopt This Lens
The first thing that shifts is where your diagnostic questions point. Instead of asking why engagement is low, you ask what structural conditions produced this engagement level. Instead of asking why projects are failing, you ask where the planning-to-execution disconnect occurs and what conditions create it. The question changes the answer set.
The second shift is in your measurement design. You stop tracking only output metrics and start tracking the condition variables that causally precede them. You pair primary KPIs with second-order checks that cannot be satisfied by gaming the primary metric. You build in rotation before metrics calcify into narrow optimization targets. You audit your top performance metrics for gaming signatures on a fixed cycle.
The third shift is in where interventions get targeted. Training does not go to people who need to feel differently about their work. It goes to the specific decision moments where judgment gaps produce execution failure. Structural fixes happen before survey campaigns. Resource alignment gets audited before capability gets blamed.
The fourth shift is in your timeline. Conditions are upstream. If you fix them, the output improvements follow with a lag — typically 18 to 36 months for leadership and structural changes to show up in engagement and trust metrics. That means you need leading indicators of condition improvement, not just trailing output measures. You are managing a system with known latency, and your measurement protocol has to reflect that.
Goodhart's Law does not go away when you adopt a BPE approach. It is still running in your performance system. What changes is that you have designed around it rather than pretending it does not apply to your organization. You measure conditions, pair your metrics, rotate before gaming calcifies, and keep the signal clean. The outputs follow because the conditions that produce them have been deliberately engineered rather than accidentally inherited.
That is the discipline. That is what BPE does.
