Using MTTF vs. MTBF for Your Operator Driven Reliability Strategy

July 5, 2018

Nobody wants to fail, and nobody wants to be failed—by their team, by their information, or by their equipment. That’s why, as a Plant Operations Manager, it’s your job to predict failure, and then prevent it.

When a large asset goes down, it can cost your plant hundreds of thousands of dollars an hour. To detect when an asset may be nearing failure, and to take steps to prevent it, you need to rely on your Field Operators and the data they collect on their rounds. That data, together with an understanding of each asset’s mean time to failure (MTTF) and mean time between failures (MTBF), is the key to developing a sound strategy for plant asset reliability management.

How Understanding MTTF vs. MTBF Informs Your Reliability Strategy

When your equipment fails, it costs your plant big money. Get proactive about knowing when your equipment is expected to fail: look at its mean time to failure (MTTF) and mean time between failures (MTBF). Knowing these figures will allow you to plan your maintenance and repairs effectively, and to prolong the working life of your high-dollar assets.

Repairable Failure vs. Total Failure

There are two types of asset failure to be concerned with: repairable failure and total failure. Repairable failure describes an equipment breakdown which can be corrected with an acceptable amount of maintenance time and resources. Total failure describes an equipment breakdown which cannot be repaired within an acceptable timeframe or budget—i.e. the asset’s point of no return.

Two key figures quantify these types of failure. Mean time between failures (MTBF) measures how long, on average, your equipment will function between repairable failures. Mean time to failure (MTTF) relates to total failure, measuring how long, on average, your equipment will function until it breaks down and cannot be repaired. In other words, MTBF is the average operating time between repairable failures, while MTTF is the average operating time before total failure.
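As a rough illustration of how these two figures might be estimated from maintenance records, here is a minimal Python sketch. The function names and the sample numbers are hypothetical, and it assumes you log operating hours, the count of repairable failures, and the service life of units that have already reached total failure.

```python
from statistics import mean
from typing import Sequence


def mtbf(total_operating_hours: float, repairable_failures: int) -> float:
    """Average operating hours between repairable failures."""
    if repairable_failures <= 0:
        raise ValueError("need at least one recorded failure to estimate MTBF")
    return total_operating_hours / repairable_failures


def mttf(lifetimes_hours: Sequence[float]) -> float:
    """Average service life (hours) of units that ran until total failure."""
    if not lifetimes_hours:
        raise ValueError("need at least one retired unit to estimate MTTF")
    return mean(lifetimes_hours)


# Hypothetical pump: 8,000 operating hours logged, 4 repairable failures.
pump_mtbf = mtbf(8000, 4)                  # 2000.0 hours

# Hypothetical service lives of three retired pumps of the same model.
pump_mttf = mttf([40000, 50000, 60000])    # 50000 hours
```

With only a handful of recorded failures, both estimates will be noisy, so treat them as starting points rather than precise predictions.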

These figures are dynamic, and respond to actions you can take. Your MTBF and MTTF can each be lengthened by performing timely maintenance. Conversely, they can be shortened by improper equipment usage or faulty maintenance.

The Separate Costs of Repairable and Total Failure

As an Operations Manager, your goal is to prevent equipment failure in order to reduce your plant’s operating costs. As your operating costs fall, the share of your revenue that you keep as profit rises. Knowing the costs of equipment failure allows you to determine how many resources you should dedicate to prevention, repair, or replacement.

It is crucial to keep in mind that repairable and total failure have separate costs. The cost of repairable failure, as predicted by your MTBF, is the cost of downtime (which will vary based on duration), plus the cost of repair.

Cost of Repairable Failure = Downtime + Repair

The cost of total failure, as predicted by your MTTF, is downtime plus the cost of replacement. Total failure will typically be more expensive than repairable failure. Downtime will likely be longer, and thus more expensive, and replacement will likely also cost more than repair.

Cost of Total Failure = Downtime + Replacement

While replacement costs are often significantly more than a repair, sometimes it is actually less expensive to replace an asset than to repair one—especially if the replacement equipment would be more efficient, yielding lower Total Cost of Ownership (TCO).

If Cost of Repairable Failure < Cost of Total Failure: Repair

If Cost of Total Failure < Cost of Repairable Failure: Replace
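The comparison above can be sketched in a few lines of Python. This is only an illustration: the function names and dollar figures are hypothetical, downtime is assumed to be priced at a flat hourly rate, and ties are assumed to go to repair, which the rule above does not specify.

```python
def failure_cost(downtime_hours, downtime_cost_per_hour, fix_cost):
    """Downtime cost plus the cost of the repair or the replacement."""
    return downtime_hours * downtime_cost_per_hour + fix_cost


def repair_or_replace(repair_total, replace_total):
    """Pick the cheaper option; a tie goes to repair (an assumption)."""
    return "repair" if repair_total <= replace_total else "replace"


# Hypothetical figures for one asset, with downtime at $100,000 per hour.
repair_total = failure_cost(6, 100_000, 50_000)       # 650000
replace_total = failure_cost(24, 100_000, 400_000)    # 2800000

decision = repair_or_replace(repair_total, replace_total)   # "repair"
```

This sketch compares only the immediate costs. To reflect a more efficient replacement, you could subtract its expected operating savings from replace_total before comparing.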

Maintaining Large Assets Using MTTF and MTBF

Large assets fail a part at a time. Each of your assets is a combination of multiple smaller parts and systems. If you know the MTBF and MTTF of each of these smaller systems, you can plan your asset performance management process around them.

Your maintenance teams should examine, refurbish, and replace the smaller parts and systems in your equipment prior to their expected failure date. Doing so will “reset” the failure date of your asset, pushing it back by months or even years, toward the expected failure date of the asset as a whole.
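One way to turn component-level MTBF figures into a service order is sketched below in Python. The component names, hours, and the 80% safety margin are all hypothetical; the idea is simply to service each part at some fraction of its MTBF rather than waiting for the full interval to elapse.

```python
def hours_until_service(mtbf_hours, hours_since_service, safety_fraction=0.8):
    """Operating hours left before a part should be serviced.

    The part is considered due once it has run for `safety_fraction`
    of its MTBF. Returns 0.0 when it is already due or overdue.
    """
    due_at = mtbf_hours * safety_fraction
    return max(0.0, due_at - hours_since_service)


# Hypothetical components: name -> (MTBF hours, hours since last service).
components = {
    "bearing": (10000, 7500),
    "seal": (4000, 3500),
    "coupling": (20000, 2000),
}

# Most urgent first.
schedule = sorted(
    ((name, hours_until_service(m, h)) for name, (m, h) in components.items()),
    key=lambda item: item[1],
)

for name, hours_left in schedule:
    print(f"{name}: service within {hours_left:.0f} operating hours")
```

Here the seal is already past its service point, the bearing is next, and the coupling has the most time remaining.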

The Ship of Theseus and Asset Reliability Management

Towards the end of their expected lifespan, your assets may resemble the fabled “Ship of Theseus.” Imagine a ship, belonging to the mythical Greek character Theseus, that is maintained in a museum for centuries after his death. As components of the wooden ship rot, bit by bit, those components are replaced by museum caretakers until no original components remain. The ship remains in good condition, and presumably even seaworthy, but is it still the ship of Theseus?

For Plant Operations Managers, that’s a moot point. What matters is that the asset is still intact and functioning as expected. Even if it has a short MTBF, leading to repairable failure, it may have an infinite MTTF, meaning it has never totally failed. If you’re able to prolong the life expectancy of a large asset by replacing each part before it fails, that’s a victory. If Theseus came back and saw his ship, would he care if it was built out of the original wood, or would he simply care that it floats and sails?

How Operators and Information Extend Your Assets’ MTBF and MTTF

Leaving Theseus and infinite MTTF behind, we can use games of chance to explain how to extend both MTBF and MTTF. When you depend on the mean, or average, time to failure, you may find yourself unpleasantly surprised by when your equipment actually fails. It’s like Blackjack: sometimes when you hit on sixteen, you get lucky; sometimes you bust. Either way, you’ll have to be ready to cover the bet when the card flips.

Knowing your MTBF and MTTF is a bit like counting cards: you’ll know when trouble’s more likely to arrive. Likewise, having mobile workers making rounds is like having x-ray specs. You’ll be able to see the cards—upcoming problems with your assets—while they’re still in the deck, and take appropriate action accordingly.
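To put a number on how loosely the mean predicts any single failure, here is a small Python sketch. It rests on a modeling assumption the article does not make: a constant failure rate, under which the time between failures follows an exponential distribution. Real assets may wear out differently, and the function name and 2,000-hour MTBF are hypothetical.

```python
import math


def survival_probability(hours, mtbf_hours):
    """Chance of running `hours` without a failure.

    Assumes a constant failure rate (exponential model), which is an
    illustrative simplification and not a property of every asset.
    """
    return math.exp(-hours / mtbf_hours)


# Hypothetical asset with an MTBF of 2,000 hours.
p_at_half = survival_probability(1000, 2000)   # about 0.61
p_at_mtbf = survival_probability(2000, 2000)   # about 0.37
```

Under this assumption, an asset has only about a 37% chance of running failure-free all the way to its MTBF, which is why the mean alone makes a poor alarm clock.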

Operator Driven Reliability Management

Many subtle warning signs of asset failure can only be picked up by the eyes, ears, and nose of an observant Plant Operator. Relying on MTTF and MTBF alone will not accurately predict specific failures. Using a combination of instrumentation and mobile workers making rounds of your assets at routine, scheduled intervals is the best way to keep a watchful eye on your equipment.

While the instruments pick up quantifiable data, like temperature, oil pressure, and RPM, to help you detect deviations from the norm, your Field Operators detect anything that can’t be easily instrumented, such as corrosion, leaks, grime, cracks, dents, animal intrusion, grinding sounds, or unusual smells.

If an Operator detects an issue while they’re out on Rounds, they will be able to inform their supervisor. Then, you’ll be able to determine what the appropriate response is, whether it’s continued monitoring, scheduled repair, emergency repair, or evacuation.

Having your equipment regularly observed by trained Operators increases your ability to proactively schedule repairs in the most time- and cost-effective manner possible, and decreases your need for emergency repairs or more drastic measures, like replacement. It’s an essential complement to automated instrumentation.

Preventing Operator and Information Failure

MTTF and MTBF are only useful when supported by vigilant Operator Rounds and accurate information. To ensure that your Field Operators are making their rounds correctly and in a timely manner, it’s important to train them properly and to treat them in a way that keeps them motivated. Particularly for critical functions, it can also be helpful to implement procedures which ensure round accountability. For instance, if you use a mobile asset round sheet, you can require your Operators to scan a QR code, barcode, NFC tag, or RFID tag on each essential asset they visit. This will give you an accurate record of your Operators’ time and movement.
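A scan-based accountability check of this kind could look something like the following Python sketch. The asset IDs, timestamps, and record layout are hypothetical and do not reflect any particular round sheet product; the point is only that comparing the scans against the required list shows which assets a round skipped.

```python
def missed_assets(required_asset_ids, scan_log):
    """Return the required assets that have no scan in the log.

    `scan_log` is a list of (asset_id, timestamp) pairs, a hypothetical
    record layout chosen for this example.
    """
    scanned = {asset_id for asset_id, _timestamp in scan_log}
    return sorted(set(required_asset_ids) - scanned)


# Hypothetical round: three essential assets, two scans recorded.
required = ["PUMP-101", "PUMP-102", "COMP-201"]
scan_log = [
    ("PUMP-101", "2018-07-05T08:02"),
    ("COMP-201", "2018-07-05T08:19"),
]

missed = missed_assets(required, scan_log)   # ["PUMP-102"]
```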

Mobile asset round sheets are also useful for preventing information failure. Even if your Operators dutifully visit your assets, inspecting and recording observations on paper, you can still be surprised by unreported conditions which lead to equipment failure. While they may have successfully collected actionable information, unless you manually enter the data or review paper log sheets continuously, it hasn’t been successfully passed on.

In contrast, mobile round sheets automatically send data to your central IT system, where it can be compiled, recorded in equipment maintenance logs, archived in your Plant Historian, and used to trigger work orders in your CMMS, with any anomalies or action items highlighted. It’s like walking the entire plant yourself, at all times.

The foundation that plant reliability rests on is predicting and preventing equipment failure. Predicting failure is accomplished by obtaining the MTBF and MTTF for your assets and their internal components. You can prevent failure by repairing or replacing the components within your assets before they break down.

MTBF and MTTF will help you establish a maintenance calendar. But remember, because failure doesn’t follow a calendar, you should also have Operators making rounds to detect any potential unscheduled failures before they happen. Do all this, and your plant may become a ship of Theseus itself—always being repaired, never truly failing.

Plant Operations Managers depend on timely information to keep their plant up and running. We developed GoPlant to help. GoPlant, an asset-centric mobile round sheet application for Field Operators, gets Plant Operations Managers the actionable data they need to make smart, timely decisions. To see GoPlant in action, request a demo or contact our team today.