Single metrics obfuscate the nuances and dynamics of processes

25 January, 2024 14 min read
production, manufacturing, process, Lean, management, metrics, monitoring

The website oee.com portrays OEE, the “Overall Equipment Effectiveness”, as “the gold standard for measuring manufacturing productivity” [sic]. OEE is a number between 0 and 1 (0% and 100%) that results from multiplying three different metrics: Availability, Performance, and Quality.

  • Availability indicates the degree to which the process is running (when it’s supposed to be running).
  • Performance indicates the speed of the process (when it’s supposed to be running) compared to its theoretical optimal speed.
  • Quality, the simplest metric, is basically the yield of the process in terms of good parts produced compared to total parts produced.

More about this on the “Calculating OEE” page of oee.com.

So you measure each one of those and multiply them, and you have a single metric: OEE. How simple!
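The arithmetic really is that simple. A minimal sketch (the function name and the example values are mine, not from oee.com):

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness: the product of the three factors."""
    return availability * performance * quality

# e.g. a line that runs 90% of its scheduled time, at 80% of its
# theoretical speed, with a 50% yield of good parts:
print(round(oee(0.9, 0.8, 0.5), 3))  # 0.36, i.e. an OEE of 36%
```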

How to lose information and prevent nuanced insights

Now, each one of the three metrics is useful by itself. However, collapsing them into a single number reduces the dimensionality of the underlying system being described with those metrics and strips away the possibility of taking a nuanced view, even if you arbitrarily call this (very bad) single-objective function something that sounds official and imposing, like “OEE”.

What exactly are you evaluating when you multiply those numbers?

Let’s assume that every {A, P, Q} triplet describes the state of the system being investigated.

Even if each of those metrics were quantized to a single decimal (including 0.0), you would have (10+1)³, i.e. 1331 different possible combinations of A/P/Q values, i.e. states of the system in terms of its operating condition.

Let’s be charitable and assume that none of those metrics gets worse than 0.5; this still leaves you with (5+1)³, i.e. 216 different combinations of those values.
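A quick sanity check of those counts (a sketch; the variable names are mine):

```python
from itertools import product

full = [x / 10 for x in range(0, 10 + 1)]   # 0.0, 0.1, …, 1.0 — eleven levels
half = [x / 10 for x in range(5, 10 + 1)]   # 0.5, 0.6, …, 1.0 — six levels

print(len(list(product(full, repeat=3))))   # 1331 possible {A, P, Q} states
print(len(list(product(half, repeat=3))))   # 216 states with nothing below 0.5
```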

Note that we are using the numbers 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 merely to make an argument; I am not claiming that e.g. a 50% scrap rate is something that just casually occurs and gets left untreated.

Still, 216 is a lot of different states of the system to “collapse” to a single value through the OEE multiplication, especially when you take into account how the values of this multiplication are distributed. Obviously, the distribution of the resulting values of A × P × Q skews towards smaller numbers, as we are multiplying fractions, which can make interpretability difficult; an OEE close to 1.0 is impressive, and must surely come from very high A/P/Q values. However, a single factor of 0.5 caps the OEE at 0.5, even if the other two are equal to 1.0.
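The skew is easy to quantify for the six chosen levels (a sketch; the variable names are mine): each factor averages 0.75, yet the average over all 216 products is 0.75³ ≈ 0.422, well below what any single “typical” factor would suggest.

```python
from itertools import product
from statistics import mean

levels = [x / 10 for x in range(5, 10 + 1)]        # 0.5 … 1.0
products = [a * p * q for a, p, q in product(levels, repeat=3)]

print(mean(levels))                  # 0.75 — the average of each factor
print(round(mean(products), 6))      # 0.421875 — i.e. 0.75 ** 3, pulled down by multiplication
```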

In any case, the OEE lacks discrimination power, as it collapses a three-dimensional space into a single dimension. How many different OEE values result from the 216 combinations of A/P/Q values? Let’s multiply in Python:

>>> from itertools import product
>>> r = [p/10 for p in range(5, 10+1)]
>>> result = [round(a * p * q, 3) for a, p, q in product(r, repeat=3)]
>>> unique = sorted(list(set(result)))
>>> len(unique)
55

(I rounded the multiplication result to three decimal digits; would anyone bat an eyelid at a change of the OEE from 0.432 to 0.420, from 0.252 to 0.270, or from 0.720 to 0.729? These, by the way, are all neighboring values in the result list calculated above.)

In this toy example with the chosen numbers (actually, levels), there are 216 different system states modeled by {A, P, Q} triplets, but the OEE (with three decimal digits) takes only 55 different values, which means that the majority of system states share their OEE value with other states of the system. I repeat: 161 states of the system, i.e. a whopping 74.5% of the total, simply “hide” behind the same OEE value as other states.

This is a serious loss of information through aliasing for a method that is supposedly meant to provide insights into how a process is doing; multiple (actually, most) states of the system map to the same OEE value, making it impossible to look at a calculated OEE value and pinpoint what in the system is responsible for that value (or for a change from a previous one) without looking into the OEE’s three factors – which you had to have available anyway in order to calculate the OEE in the first place.
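The aliasing becomes explicit if you group the 216 states by their OEE value (a sketch; the variable names are mine):

```python
from itertools import product
from collections import Counter

levels = [x / 10 for x in range(5, 10 + 1)]
oee_counts = Counter(
    round(a * p * q, 3) for a, p, q in product(levels, repeat=3)
)

print(len(oee_counts))        # 55 distinct OEE values for 216 states
print(216 - len(oee_counts))  # 161 states "hidden" behind a value shared with others
```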

How to let yourself be misled by a single number

And here’s where it gets worse!

Let’s say that yesterday’s OEE was 0.360 and today’s is 0.378, and you do bat an eyelid at this apparently small change, and you are happy – OEE has improved by 5% since yesterday!

But wait… where are those values potentially coming from?

Here is one such possibility:

Day | Availability A | Performance P | Quality Q | OEE (A × P × Q)
----|----------------|---------------|-----------|----------------
1   | 90%            | 80%           | 50%       | 36.0%
2   | 60% ↓↓         | 70% ↓         | 90% ↑↑    | 37.8% ↑

Oops – in reality, if you were to look at the three factors separately, you would realize that it doesn’t look that good, despite the marginal increase in the OEE value… Yes, the single metric of the OEE improved, but only because one of the three factors (Q) improved more since yesterday than the other two (A and P) deteriorated. The multiplication masks what really happened.
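Day 1 and day 2 from the table above, in code (a sketch; the function name is mine):

```python
def oee(a: float, p: float, q: float) -> float:
    return round(a * p * q, 3)

day1 = oee(0.9, 0.8, 0.5)   # A and P healthy, Q terrible
day2 = oee(0.6, 0.7, 0.9)   # A and P dropped sharply, Q jumped

print(day1, day2)           # 0.36 0.378 — the headline number "improved"
```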

But wait – it gets worse.

Let’s imagine that you are one of those who love to “manage by spreadsheet” or by looking at some dashboard on your laptop or tablet, enamored with KPIs and metrics, far away from the shop floor. You have been sold the concept of OEE and this is the first number you look at every morning, perhaps on nice screens suspended from the factory ceiling that you can see from your office’s window overlooking the shop floor.

Imagine that you just looked at the OEE and you are happy that it went up, without looking into the three factors. After all, OEE was part of your training on Six Sigma, or Lean, or whatever is being peddled these days as the way to run operations, and of course you’re going to put into practice every tool at your disposal.

And what’s more alluring than a single “metric”? Very little.

So, the next day, the OEE reads an exciting “38.4%” – awesome! Without looking into the A/P/Q values, you glance at the scoreboard, smile and go about your day.

In reality:

Day | Availability A | Performance P | Quality Q | OEE (A × P × Q)
----|----------------|---------------|-----------|----------------
1   | 90%            | 80%           | 50%       | 36.0%
2   | 60% ↓↓         | 70% ↓         | 90% ↑↑    | 37.8% ↑
3   | 60% •          | 80% ↑         | 80% ↓     | 38.4% ↑

The next day, the OEE reads an even more exciting “39.2%”! The number keeps going up! Things are going well! You smile even wider and go about your day with a spring in your step.

In reality:

Day | Availability A | Performance P | Quality Q | OEE (A × P × Q)
----|----------------|---------------|-----------|----------------
1   | 90%            | 80%           | 50%       | 36.0%
2   | 60% ↓↓         | 70% ↓         | 90% ↑↑    | 37.8% ↑
3   | 60% •          | 80% ↑         | 80% ↓     | 38.4% ↑
4   | 80% ↑↑         | 70% ↓         | 70% ↓     | 39.2% ↑

The next day, the OEE reads an impressive “43.2%”! “What? How? This is phenomenal!” you think to yourself. The good news keeps coming, and the continued rise of the OEE number allays any critical tendencies you might have to look closer into what is driving the OEE higher. So, you are positively radiant about how this process is running – after all, the OEE’s rise proves it!

In reality:

Day | Availability A | Performance P | Quality Q | OEE (A × P × Q)
----|----------------|---------------|-----------|----------------
1   | 90%            | 80%           | 50%       | 36.0%
2   | 60% ↓↓         | 70% ↓         | 90% ↑↑    | 37.8% ↑
3   | 60% •          | 80% ↑         | 80% ↓     | 38.4% ↑
4   | 80% ↑↑         | 70% ↓         | 70% ↓     | 39.2% ↑
5   | 90% ↑          | 80% ↑         | 60% ↓     | 43.2% ↑

The next day, the OEE reads “44.8%”. “OMG!”, you tell yourself, “under my leadership this production process has made a complete turnaround!”, beaming with pride. Compared to day 1, this looks like further good news, at least on the surface. If you were to look into the A/P/Q values instead of at the OEE in isolation, you would recognize that Quality was all over the place but improved overall, Performance was more or less constant with minor deviations, and Availability was volatile and ultimately ended lower than it started. Despite the process clearly being all over the place, the OEE improved by almost 9 percentage points!

In reality:

Day | Availability A | Performance P | Quality Q | OEE (A × P × Q)
----|----------------|---------------|-----------|----------------
1   | 90%            | 80%           | 50%       | 36.0%
2   | 60% ↓↓         | 70% ↓         | 90% ↑↑    | 37.8% ↑
3   | 60% •          | 80% ↑         | 80% ↓     | 38.4% ↑
4   | 80% ↑↑         | 70% ↓         | 70% ↓     | 39.2% ↑
5   | 90% ↑          | 80% ↑         | 60% ↓     | 43.2% ↑
6   | 70% ↓↓         | 80% •         | 80% ↑↑    | 44.8% ↑

And now for the opposite direction…

The next day, the OEE reads “44.1%”.

“Eh, what’s a slight drop after a week of constant improvement? 44.1, 44.8… it’s a wash!” you tell yourself as you shrug it off and move on to making your slide deck for this week’s status update. “Constant rise of OEE indicates continuous improvement of process X” is the title of the most important slide in your deck.

In reality:

Day | Availability A | Performance P | Quality Q | OEE (A × P × Q)
----|----------------|---------------|-----------|----------------
1   | 90%            | 80%           | 50%       | 36.0%
2   | 60% ↓↓         | 70% ↓         | 90% ↑↑    | 37.8% ↑
3   | 60% •          | 80% ↑         | 80% ↓     | 38.4% ↑
4   | 80% ↑↑         | 70% ↓         | 70% ↓     | 39.2% ↑
5   | 90% ↑          | 80% ↑         | 60% ↓     | 43.2% ↑
6   | 70% ↓↓         | 80% •         | 80% ↑↑    | 44.8% ↑
7   | 70% •          | 70% ↓         | 90% ↑     | 44.1% ~

Why multiplication, anyway?

Not to belabor the point, but I think it’s obvious that the multiplicative definition of the OEE creates a non-linearity that can mask the true nature of a situation – of course, only as long as you take the OEE at face value and don’t investigate where it’s coming from.

It does not matter whether we play with fictional A/P/Q values quantized to one decimal between 0.5 and 1.0, or with actual measured values; the math is the same.

“OK, if the multiplication is the source of this masking non-linearity, then why don’t we define OEE as the average of the three values?”

Sure, why not? For example, let’s define “OEE prime” like so:

OEE’ = (A + P + Q)/3

Well… you aren’t wrong. This would definitely prevent a strong rise in one factor from masking, in an exaggerated manner, the combined slight decline of the other two, but it is still a single-objective function in which the three components (A, P, and Q) are equally weighted.

This implies that you would be happy with a constant OEE’ value that results from Q dropping by 20% while A and P each increase by 10%.
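A quick sketch of that failure mode (the example values and the function name are mine):

```python
def oee_prime(a: float, p: float, q: float) -> float:
    """Hypothetical averaged variant of the OEE."""
    return round((a + p + q) / 3, 3)

before = oee_prime(0.8, 0.8, 0.8)   # a balanced process
after = oee_prime(0.9, 0.9, 0.6)    # A and P up 10 points each, Q down 20

print(before, after)                # 0.8 0.8 — the average cannot tell them apart
```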

“OK, then let’s make it a weighted average!”

Should you be happy with 1/3, 1/3, 1/3 weights? Should you go ahead and create some weighted-average function that skews towards A, P, or Q?

Of course not; this would be a totally arbitrary “setting”, and your new “objective function” would still suffer from the same aliasing problem, i.e. multiple states of the system being mapped to fewer possible values of this fictional OEE’ “metric”.

Plus, here’s my cynical take on why the OEE is multiplicative:

  1. A simple average is too simple to be taken seriously.
  2. There’s a certain “marketing benefit” to the multiplicative definition of the OEE. It enables the pretense that it is some kind of a physics-inspired attribute of the system (the process), like the efficiency of a vehicle’s powertrain, or like a total loss factor between the perfection (OEE equals 1) and whatever it is that’s happening.

System dynamics make the OEE more debatable

“Ah yes, but when we multiply three percentages, this is indeed like the total efficiency of a chain of subsystems in series!”

No – because the three factors that go into the OEE calculation are not independent; they are not subsystems of the process wired in series. They are surface-level measurements of what goes on within the process, and what goes on there is impacted by many factors that are interconnected.

Thus the A/P/Q metrics are not only not independent of each other; they are even dependent upon each other with a time lag! A, P and Q are linked through a whole network of factors that combine into the dynamic system that is the process. Think “Causal Loop Diagram” and “Stocks and Flows” (System Dynamics).

Think about it; could it be that P is high this time around because Q was allowed to slip due to underlying human factors that also caused A to drop? Then, in the next round of measurement, Q improved and P dropped because of a higher A? And so on and so forth.
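This kind of lagged coupling is easy to caricature in code. The following is a purely illustrative toy model – the coupling terms, coefficients and set-points are all invented for this sketch, not derived from any real process:

```python
def step(a: float, p: float, q: float) -> tuple[float, float, float]:
    # Invented lagged couplings: good quality keeps the line available next
    # period; low availability pressures the crew to run faster (raising P);
    # and a pushed pace erodes next period's quality.
    clamp = lambda x: min(1.0, max(0.0, x))
    a_next = clamp(a + 0.2 * (q - 0.8))
    p_next = clamp(p + 0.2 * (0.8 - a))
    q_next = clamp(q - 0.2 * (p - 0.7))
    return round(a_next, 2), round(p_next, 2), round(q_next, 2)

state = (0.9, 0.8, 0.5)
for day in range(1, 6):
    state = step(*state)
    print(day, state)  # A, P and Q chase each other instead of moving independently
```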

A production process is (usually, when worthy of monitoring) a complex system with both tangible and intangible inputs. Therefore, not only does this naive calculation reduce the dimensionality of the system being allegedly described by a single metric, but it also entirely obfuscates the dynamics of the (very complex) system that gives rise to different values of A, P, and Q in sequence over multiple measurement periods.

In other words, there are actual underlying factors and root causes that make A, P, and Q change not entirely independently from each other, and even with some delay.

So, I ask you again: what exactly is the OEE number capturing, beyond some aliased, surface-level mess of interactions between variables that are not entirely visible, and that can mostly be impacted by working on the system itself instead of by looking at surface-level metrics and pretending to know what’s going on?

I don’t know.

But it sure looks alluring to have a single “metric” (actually not a metric, because it measures nothing) to report!

It’s a misleading oversimplification

Allegedly, the OEE is meant to simplify communication and decision-making about a process at a higher level, so that those looking at the OEE don’t have to look into the particulars. It’s alluring, because you only have to look at a single number to supposedly see how the process is doing. It’s a quick, easy-to-digest value that people don’t have to think too much about, and it comes with the added benefit that it kinda, somehow resembles calculating the efficiency of a system (though the last E stands for Effectiveness, probably because the word “efficiency” has negative connotations with redundancies, etc.).

But wait… Production systems and processes are usually inherently complex; should we pretend that they are simple, or (worse) oversimplify them through the introduction of oversimplifying “metrics”, especially when those have well-known issues? My view is that we should not. The use of OEE does not balance simplicity and practicality in any way. It oversimplifies and obscures, preventing nuanced views into the things that actually matter.

Heck, in reality even the measurement and calculation of the A/P/Q metrics in isolation, without looking at the underlying system through “Kaizen”, is yet another fixation on numbers and metrics that can prevent people from improving things in significant, far-reaching ways.

When you compare an OEE of 0.51 to an OEE of 0.52 what does this tell you? As we saw earlier, it doesn’t tell you much and can even blind you to the reality of the process.

It only tells you that at least one of the A/P/Q numbers must have changed. But which one(s), to what extent, and in which direction?

Unpack the OEE into its components and find out. Now you’re back to the actual metrics, so you could just as well track each of the three metrics separately and instantly know where issues might lie without pretending to describe complex, dynamic processes and systems with a single obfuscating number.

Focusing on single metrics like “OEE” is yet another type of what I call Deft.

Here’s the original repost that inspired this writing, and the original post that drove me to respond.


For the geeks among us

Here’s some Python code I wrote to explore this; it finds the worst-case scenarios, where small differences between adjacent OEE values arise from very different A/P/Q combinations:

from itertools import product, chain
from math import sqrt

# The six quantized factor levels: 0.5, 0.6, ..., 1.0
r = [q/10 for q in range(5, 10+1)]

def encode_key(q):
    # Encode an [A, P, Q] triplet as a hashable string, e.g. [0.5, 0.8, 1.0] -> "0p5x0p8x1p0"
    return "x".join([str(qq).replace('.', 'p') for qq in q])

def decode_key(k):
    # Inverse of encode_key: "0p5x0p8x1p0" -> [0.5, 0.8, 1.0]
    return [float(kk.replace('p', '.')) for kk in k.split("x")]

def distance(a, b):
    # Euclidean distance between two encoded triplets in A/P/Q space
    aa, bb = [decode_key(x) for x in [a, b]]
    return sqrt(sum([ (aa[k] - bb[k])**2 for k in range(len(aa)) ]))

def by_value(m, v):
    # All [A, P, Q] triplets whose OEE equals v
    return [decode_key(k) for k in m.keys() if m[k] == v]

def worst_case(m, v1, v2):
    # Among all pairs of states with OEE v1 and OEE v2, return the pair
    # that is farthest apart in A/P/Q space, together with that distance
    q1, q2 = [by_value(m, v) for v in [v1, v2]]
    pd = [[*p, distance(*[encode_key(q) for q in p])] for p in product(q1, q2)]
    maxdist = max([ p[-1] for p in pd ])
    candidates = [ p for p in pd if p[-1] == maxdist]
    return candidates[-1]

# Map every {A, P, Q} state to its (rounded) OEE value
rd = { encode_key([a,p,q]): round(a * p * q, 3) for a, p, q in product(r, repeat=3)}

# The 55 distinct OEE values, in ascending order
vals = sorted(set(rd.values()))

# For each pair of *adjacent* OEE values, find the two most dissimilar states,
# then rank those worst cases by distance (descending), dropping duplicates
rkd = [ worst_case(rd, vals[k], vals[k+1]) for k in range(len(vals)-1)]
scores = sorted([p[-1] for p in rkd], reverse=True)
ranked = [ [p for p in rkd if p[-1] == s] for s in scores]
ranked = list(chain.from_iterable(ranked))

filtered = []
for p in ranked:
    if p not in filtered:
        filtered.append(p)

print("[ A ,  P ,  Q ]_1", "\t", "[ A ,  P ,  Q ]_2", "\t", "Euclidean distance", "\t", "OEE_1", "\t", "OEE_2")
print("--------------------------------------------------------------------------------------")
for p in filtered:
    print(p[0], "\t", p[1], "\t", p[-1], "\t", rd[encode_key(p[0])], "\t", rd[encode_key(p[1])])