MTBF vs. MTTF vs. MTTR: Defining failure for IT and data center environments



In the IT management world, there’s long been confusion about the terms mean time between failures (MTBF), mean time to failure (MTTF), and mean time to repair (MTTR), and about how they relate to product reliability. Let’s take a few minutes to look at these terms and at how you would use them for data center management in an IT service management (ITSM) environment.

Here are the basic definitions for each term and how they are derived.

Mean time between failures (MTBF) is a prediction, based on prior observations, of a product’s average time between failures. MTBF can be calculated with the following equation:

        MTBF = total operational uptime between failures / number of failures

To illustrate, let’s assume a manufacturer has recorded the following data points between product failures for one of its copier models:

Failure number    Recorded operational uptime before failure (in hours)
1                 10,000
2                  9,500
3                 11,000
4                  9,000
Total             39,500

The MTBF would be 9,875 hours ((10,000 + 9,500 + 11,000 + 9,000) / 4), meaning that on average, this copier can run for 9,875 hours before experiencing a failure.
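Here is the same calculation as a short Python sketch. The function name `mtbf` is our own, and it assumes each list entry is the operational uptime recorded before one failure, exactly as in the copier table.

```python
def mtbf(uptimes_hours):
    """Mean time between failures: total operational uptime / number of failures."""
    if not uptimes_hours:
        raise ValueError("need at least one recorded failure")
    return sum(uptimes_hours) / len(uptimes_hours)

# Uptime recorded before each of the four copier failures in the table.
copier_uptimes = [10_000, 9_500, 11_000, 9_000]
print(mtbf(copier_uptimes))  # 9875.0
```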

Meaningful MTBF calculations require many more data points than this. The more failures observed for a product, the more reliable its MTBF estimate. A product with many subsystems (such as a server with disk drives, fans, a motherboard, etc.) also requires a much more complicated MTBF calculation. For our purposes, though, this simple example illustrates the idea behind MTBF, which is:

        MTBF predicts the average time between product failures
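To give a sense of why multi-subsystem products are harder, here is the simplest textbook case as a Python sketch. It assumes every component has a constant failure rate (exponentially distributed lifetimes), that failures are independent, and that the components are in series with no redundancy, so any single failure takes the whole product down. Under those assumptions the failure rates (1/MTBF) add, and the system MTBF is the reciprocal of their sum. The component names and MTBF figures are hypothetical.

```python
def series_mtbf(component_mtbfs_hours):
    """Estimate system MTBF for components in series.

    Assumes constant, independent failure rates and no redundancy,
    so the system failure rate is the sum of the component rates.
    """
    total_rate = sum(1.0 / m for m in component_mtbfs_hours)  # failures per hour
    return 1.0 / total_rate

# Hypothetical component MTBFs for a server, in hours (illustrative only).
components = {
    "power supply": 100_000,
    "fan": 50_000,
    "motherboard": 200_000,
}
system = series_mtbf(components.values())
print(round(system))  # 28571
```

The result is lower than the MTBF of the weakest component. Redundant parts (such as mirrored disks or dual power supplies) need different formulas that this sketch does not cover.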

MTBF is commonly used to designate failure rates for both repairable and replaceable (non-repairable) products.

Mean time to failure (MTTF) also predicts how long a product will run before failing. Unlike MTBF, however, MTTF is used only for replaceable (non-repairable) products, such as keyboards, mice, batteries, desk telephones, and motherboards. MTTF is generally calculated with the same equation as MTBF, but it records only one data point for each failed item, because each item fails only once.
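A minimal sketch of that MTTF calculation, using a hypothetical list of keyboard lifetimes in hours. The arithmetic matches the MTBF formula; the difference is that each entry is one unit’s single time to failure. This naive version counts only units that have already failed, so it ignores units still in service.

```python
def mttf(hours_to_failure):
    """Mean time to failure for non-repairable items.

    Each entry is one item's total operating time until its first (and only)
    failure, so every failed unit contributes exactly one data point.
    """
    if not hours_to_failure:
        raise ValueError("need at least one failed item")
    return sum(hours_to_failure) / len(hours_to_failure)

# Hypothetical lifetimes (hours) of five keyboards that were replaced.
keyboard_lifetimes = [8_000, 12_000, 10_000, 9_000, 11_000]
print(mttf(keyboard_lifetimes))  # 10000.0
```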

MTTF measurements can refer to two specific types of replaceable products:

  1. Replaceable products that can’t be repaired. The product’s first failure is its only failure and it must be replaced.

    A simple example of a replaceable MTTF item is a computer mouse, which is always replaced and never repaired. The MTTF for a replaceable item gives you an idea of how many replacements to keep in stock and how much turnover to expect for that item. The same holds true for other replaceable items, such as keyboards and desk telephones. Some network appliances are also replaceable rather than repairable: certain firewalls, switches, modems, and other key networking equipment are sealed units that run for years and are replaced, not repaired, when they fail.
  2. MTTF may also refer to the time to first failure of a replaceable subsystem inside a repairable product. In this scenario, the failure occurs inside another product, and the repair consists of replacing the failed subsystem.

    If you have a UPS in your data center, for example, its key replaceable components are its batteries. The UPS unit itself may have an exceptionally long MTBF of up to ten years, while each UPS battery may have an MTTF of only three years. Because MTTF is an average, many batteries will fail before reaching it, so you should budget to replace all UPS batteries ahead of their MTTF (say, every 2.5 years). Here, the MTTF serves a budgeting function. A similar situation exists for data center air conditioners and generators, which also have replaceable components that can fail.


    The most common replaceable computer items are hard drives, which wear out with age. If you know the MTTF of your disk drives, you can determine how many replacement drives to keep on hand in case of failure.
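The stocking and budgeting ideas in both list items can be sketched in Python. The helper names and figures are hypothetical, and the failure estimate assumes a constant failure rate of 1/MTTF per unit, which real drives and batteries do not have once they start to wear out. Treat the output as a rough starting point, not a forecast.

```python
import math

def expected_failures(fleet_size, mttf_hours, period_hours):
    """Rough expected failure count for a fleet over a period.

    Assumes a constant failure rate of 1 / MTTF per unit (exponential
    lifetimes, failed units replaced promptly), so the expectation is
    fleet_size * period / MTTF.
    """
    return fleet_size * period_hours / mttf_hours

def spares_to_stock(fleet_size, mttf_hours, period_hours):
    """Round the expected failure count up to whole spare units."""
    return math.ceil(expected_failures(fleet_size, mttf_hours, period_hours))

def replacement_points(horizon_years, interval_years):
    """Ages (in years) at which planned replacements fall due within the horizon."""
    count = int(horizon_years // interval_years)
    return [interval_years * k for k in range(1, count + 1)]

# Hypothetical: 120 drives with a datasheet MTTF of 500,000 hours, one-year plan.
print(spares_to_stock(120, 500_000, 8_760))  # 3 (about 2.1 expected failures)

# Hypothetical: UPS batteries replaced every 2.5 years over a 10-year UPS life.
print(replacement_points(10, 2.5))  # [2.5, 5.0, 7.5, 10.0]
```

Rounding up covers only the expected count; many shops keep an extra buffer because actual failures scatter around that average.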

 

Mean time to repair (MTTR) represents the average time to repair or replace a failed product or subsystem of a product.

MTBF items are generally repairable products. Repair happens as quickly as possible, sometimes by replacing a subsystem component. An MTBF product’s MTTR is a measurable value: how long it takes, on average, to get the product running again.
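As a sketch, MTTR is the total repair time divided by the number of repairs. The repair records below are hypothetical, and each one is assumed to be a (failed, restored) pair of timestamps.

```python
from datetime import datetime

def mttr_hours(repairs):
    """Mean time to repair: total repair time / number of repairs.

    Each repair is a (failed_at, restored_at) pair of datetimes.
    """
    if not repairs:
        raise ValueError("need at least one repair")
    total_seconds = sum((restored - failed).total_seconds() for failed, restored in repairs)
    return total_seconds / len(repairs) / 3600  # convert seconds to hours

# Hypothetical repair records for one server.
server_repairs = [
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 13, 0)),    # 4 hours
    (datetime(2017, 6, 12, 22, 0), datetime(2017, 6, 13, 0, 0)),  # 2 hours
    (datetime(2017, 9, 5, 8, 0), datetime(2017, 9, 5, 14, 0)),    # 6 hours
]
print(mttr_hours(server_repairs))  # 4.0
```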

Disk drives, fans, motherboards, and other replaceable items that reside in a computing device are MTTF items, which are replaced rather than repaired. The same goes for stand-alone replaceable products such as mice, keyboards, batteries, telephones, and sealed network equipment. MTTF products have no MTTR of their own because they cannot be fixed, only replaced; when such an item sits inside a larger product, the time it takes to swap it out counts toward that product’s MTTR.

Practical MTBF and MTTF

Even though MTBF can be used for repairable or replaceable products, it’s best to designate only repairable products as MTBF items. MTBF items are longer-lived products such as servers, computers, air conditioners, UPS systems, and generators. They are usually capital items that must be budgeted for when you are ready to upgrade or replace them.

MTTF products are generally parts of MTBF items (such as batteries, disk drives, and motherboards) as well as inexpensive, lower-end stand-alone components (such as mice and keyboards). They are usually expensed items. MTTF values can help you decide how many replacements of each product to stock as items wear out.

Measuring your MTBF, MTTF, and MTTR

Purchasing software that tracks MTBF, MTTF, and MTTR history for each individual product in your data center can help improve data center and service desk performance. Many packages offer reports that detail the failure rates and repair cycles of individual MTBF and MTTF products. These reports can help in several areas, including creating product baselines to measure improvement, predicting failures before they occur, reducing MTTR, and increasing MTBF and MTTF. Please feel free to contact us at BMC for more information on using MTBF, MTTF, and MTTR to improve performance in your own environment.
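To show what such tracking boils down to, here is a small Python sketch that computes per-asset MTBF and MTTR from an incident log. The log format (asset name, service hour at failure, service hour when restored) and the asset names are hypothetical; commercial packages use their own schemas. Uptime is counted from hour 0 to the first failure and from each restoration to the next failure, which matches the MTBF formula given earlier; time after the last repair is ignored.

```python
from collections import defaultdict

def per_asset_metrics(log):
    """Return {asset: {"failures": n, "mtbf": hours, "mttr": hours}}."""
    by_asset = defaultdict(list)
    for asset, failed_at, restored_at in log:
        by_asset[asset].append((failed_at, restored_at))

    report = {}
    for asset, events in by_asset.items():
        events.sort()  # process failures in chronological order
        uptime = downtime = 0.0
        back_up = 0.0  # each asset is assumed to start running at service hour 0
        for failed_at, restored_at in events:
            uptime += failed_at - back_up        # running time before this failure
            downtime += restored_at - failed_at  # time spent repairing it
            back_up = restored_at
        n = len(events)
        report[asset] = {"failures": n, "mtbf": uptime / n, "mttr": downtime / n}
    return report

# Hypothetical log: (asset, service hour at failure, service hour restored).
incidents = [
    ("srv-01", 2_000, 2_004),
    ("srv-01", 5_004, 5_006),
    ("srv-02", 3_000, 3_010),
]
report = per_asset_metrics(incidents)
for asset, metrics in report.items():
    print(asset, metrics)
```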


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.



Joe Hertvik


Joe Hertvik works in the tech industry as a business owner and an IT Director, specializing in Data Center infrastructure management and IBM i management. Joe owns Hertvik Business Services, a content strategy business that produces white papers, case studies, and other content for the tech industry. Joe has produced over 1,000 articles and other IT-related content for various publications and tech companies over the last 15 years. Joe also provides consulting services for IBM i shops, Data Centers, and Help Desks. Joe can be reached via email at joe@joehertvik.com, or on his web site at joehertvik.com.