In the IT management world, there’s long been confusion about the terms mean time between failures (MTBF), mean time to failure (MTTF), and mean time to repair (MTTR), and how they relate to product reliability. Let’s take a few minutes today to look at these terms and how you would use them for data center management in an IT service management (ITSM) environment.
Here are the basic definitions for each term and how they are derived.
Mean time between failures (MTBF) is a prediction based on prior observations, designating a product’s average time between failures. An MTBF value can be defined by the following equation:
MTBF = total operational uptime between failures / number of failures
To illustrate, let’s assume a manufacturer has recorded the following data points between product failures for one of its copier models:
| Failure number | Recorded operational uptime before failure (hours) |
|---|---|
| 1 | 10,000 |
| 2 | 9,500 |
| 3 | 11,000 |
| 4 | 9,000 |
The MTBF would be 9,875 hours ((10,000 + 9,500 + 11,000 + 9,000) / 4), meaning that on average, this copier can run for 9,875 hours before experiencing a failure.
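The arithmetic above can be expressed as a small function. This is a minimal sketch of the simple MTBF equation only (total uptime divided by number of failures); the function name `mtbf` is our own, and the input is the copier's four recorded uptimes.

```python
def mtbf(uptimes_hours):
    """Mean time between failures: total operational uptime / number of failures."""
    if not uptimes_hours:
        raise ValueError("need at least one recorded failure")
    return sum(uptimes_hours) / len(uptimes_hours)

# Uptime (in hours) recorded before each of the copier's four failures
copier_uptimes = [10_000, 9_500, 11_000, 9_000]
print(mtbf(copier_uptimes))  # 9875.0
```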
Meaningful MTBF calculations require many more observed data points: the more failures observed for a product, the more accurate its MTBF. Calculating the MTBF for a product that has many subsystems (such as a server with disk drives, fans, a motherboard, etc.) is also much more complicated. But for our purposes, this simple example illustrates the idea behind an MTBF, which is:
MTBF predicts the average time between product failures
MTBF is commonly used to designate failure rates for both repairable and replaceable (non-repairable) products.
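To give a sense of why multi-subsystem products complicate the calculation, here is a sketch of one common textbook simplification, offered as an assumption rather than as the method any manufacturer uses: treat the product as a series system in which any subsystem failure stops the whole product, and assume constant failure rates so that the rates (1 / MTBF) simply add. The subsystem MTBFs below are hypothetical.

```python
def series_mtbf(subsystem_mtbfs_hours):
    """Approximate product MTBF for a series system.

    Assumes constant (exponential) failure rates and that any single
    subsystem failure takes the whole product down, so the product's
    failure rate is the sum of the subsystem rates (1 / MTBF each).
    """
    total_rate = sum(1.0 / m for m in subsystem_mtbfs_hours)
    return 1.0 / total_rate

# Hypothetical MTBFs (hours) for a server's disk drive, fan, and motherboard
server_parts = [100_000, 50_000, 200_000]
print(round(series_mtbf(server_parts)))  # 28571
```

Under this model the product's MTBF is always lower than that of its least reliable subsystem, which is one reason real multi-subsystem calculations need more care than the single-product average shown earlier.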
Mean Time to Failure (MTTF) also predicts failure rates for a product. Unlike MTBF, MTTF is used only to designate failure rates for replaceable (non-repairable) products, such as keyboards, mice, batteries, desk telephones, and motherboards. MTTF generally uses the same equation as MTBF, but each failed item contributes only one data point, because its first failure is also its last.
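A minimal sketch of the MTTF average, assuming we have the operating hours of several units that each failed once and were replaced. The formula mirrors the MTBF one, but each entry is a different unit rather than successive failures of the same unit. The mouse lifetimes are hypothetical values for illustration.

```python
def mttf(lifetimes_hours):
    """Mean time to failure: average hours each unit ran before its first (and only) failure."""
    if not lifetimes_hours:
        raise ValueError("need at least one failed unit")
    return sum(lifetimes_hours) / len(lifetimes_hours)

# Hypothetical lifetimes of five mice, one data point per failed (and replaced) mouse
mouse_lifetimes = [8_000, 12_000, 10_000, 9_000, 11_000]
print(mttf(mouse_lifetimes))  # 10000.0
```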
MTTF measurements can refer to two specific types of replaceable products:
- Replaceable products that can’t be repaired. The product’s first failure is its only failure and it must be replaced.
A simple example of a replaceable MTTF item is a computer mouse, which is always replaced and never repaired. MTTFs for replaceable items give you an idea of how many replacements should be in stock and the turnover you might expect for that item. The same holds true for other replaceable items, such as keyboards and desk telephones. It’s also worth noting that some network appliances are replaceable rather than repairable. Certain firewalls, switches, modems, and other key networking equipment are sealed units that run for years and are replaced rather than repaired when they fail.
- MTTF may also refer to the first failure rate for a replaceable subsystem inside a repairable product. In this scenario, the failure occurs inside another product, and the repair is to replace the failed subsystem.
If you have a UPS in your data center, for example, its key replaceable components are its batteries. The UPS unit itself may have an exceptionally long MTBF of up to ten years, but each UPS battery may have an MTTF of only three years. Given this, you should budget to replace all UPS batteries before they reach their MTTF (say, every 2.5 years). Here, the MTTF serves a budgeting function. A similar situation exists for data center air conditioners and generators, which also have replaceable components that can fail.
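The budgeting step can be sketched as a simple count. The 2.5-year replacement interval comes from the example above; the 10-year planning horizon and four batteries per UPS are hypothetical assumptions, not vendor figures, and the helper names are our own.

```python
def scheduled_replacements(interval_years, horizon_years):
    """Count planned replacements falling strictly before the end of the horizon.

    A replacement exactly at the horizon is not counted, assuming the UPS
    is retired or re-planned at that point.
    """
    if interval_years <= 0:
        raise ValueError("interval must be positive")
    count = 0
    t = interval_years
    while t < horizon_years:
        count += 1
        t += interval_years
    return count

def batteries_to_budget(interval_years, horizon_years, batteries_per_ups):
    """Total replacement batteries to budget for one UPS over the horizon."""
    return scheduled_replacements(interval_years, horizon_years) * batteries_per_ups

# Replace 3-year-MTTF batteries early, every 2.5 years.
# Hypothetical: 10-year planning horizon, 4 batteries per UPS.
print(scheduled_replacements(2.5, 10))  # 3  (at years 2.5, 5, and 7.5)
print(batteries_to_budget(2.5, 10, 4))  # 12
```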
The most common replaceable computer items are hard drives, which wear out with age. If you know the MTTF of your disk drives, you can determine how many replacement drives to keep on hand in case of failure.
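One rough way to turn an MTTF into a spares count is sketched below. It assumes a constant failure rate of 1 / MTTF per drive, which ignores the wear-out with age mentioned above, so treat the result as a starting point rather than a firm stocking level. The fleet size and MTTF are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # ignores leap years

def expected_failures(fleet_size, mttf_hours, period_hours=HOURS_PER_YEAR):
    """Expected failures in a period, assuming each unit fails at a constant rate of 1 / MTTF."""
    return fleet_size * period_hours / mttf_hours

def spares_to_stock(fleet_size, mttf_hours, period_hours=HOURS_PER_YEAR):
    """Round the expected failure count up to a whole number of spare units."""
    return math.ceil(expected_failures(fleet_size, mttf_hours, period_hours))

# Hypothetical: 200 drives running 24x7 for a year, each with a 1,000,000-hour MTTF
print(expected_failures(200, 1_000_000))  # 1.752
print(spares_to_stock(200, 1_000_000))    # 2
```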
Mean time to repair (MTTR) represents the average time to repair or replace a failed product or subsystem of a product.
MTBF items are generally repairable products. Repair occurs as quickly as possible, possibly by replacing a subsystem component. An MTBF product’s MTTR value is a discrete measurable number for how long it will take to get the product running again.
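A minimal MTTR sketch, assuming each repair is logged as a pair of timestamps: when the product failed and when it was running again. The repair records below are hypothetical.

```python
from datetime import datetime, timedelta

def mttr(repair_windows):
    """Mean time to repair: average of (back-in-service time - failure time)."""
    if not repair_windows:
        raise ValueError("need at least one repair")
    durations = [end - start for start, end in repair_windows]
    return sum(durations, timedelta()) / len(durations)

# Hypothetical repair log for one server: (failed at, running again at)
repairs = [
    (datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 12, 0)),   # 4 hours
    (datetime(2024, 2, 10, 14, 0), datetime(2024, 2, 10, 16, 0)),  # 2 hours
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 15, 0)),   # 6 hours
]
print(mttr(repairs))  # 4:00:00
```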
Disk drives, fans, motherboards, and other replaceable items that reside in a computing device are MTTF items that are always replaced and seldom repaired. The same goes for stand-alone replaceable products such as mice, keyboards, batteries, telephones, and sealed network equipment. MTTF products have no MTTR because they cannot be fixed, only replaced.
Practical MTBF and MTTF
Even though an MTBF value can be used for repairable or replaceable products, it’s best to designate only repairable products as MTBF items. MTBF items are longer-lived products such as servers, computers, air conditioners, UPS systems, generators, etc. MTBF products are usually capital items that must be budgeted for when you are ready to upgrade or replace them.
MTTF products are generally parts for MTBF items as well as inexpensive lower-end components, such as mice, keyboards, batteries, motherboards, etc. They are usually expensed items. Using MTTF values can help you decide how many replacement items of each product you need to stock as they wear out.
Measuring your MTBF, MTTF, and MTTR
Purchasing software that tracks MTBF, MTTF, and MTTR history by individual product in your data center can help improve your data center and service desk performance. Many packages offer reports that detail the individual failure rates and repair cycles of your MTBF and MTTF products. These reports can help improve performance in several areas, including creating product baselines to measure improvement, predicting failures before they occur, reducing MTTR, and increasing MTBF and MTTF. Please feel free to contact us at BMC for more information on using MTBF, MTTF, and MTTR to improve performance in your own environment.
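To show what such per-product reports compute, here is a sketch that groups a failure log by product and derives MTBF and MTTR for each. The log format (product, uptime hours before failure, repair hours) and its contents are hypothetical; this is not how any particular tracking package stores or calculates its data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical failure log: (product, uptime hours before failure, repair hours)
failure_log = [
    ("copier-A", 10_000, 3.0),
    ("copier-A", 9_500, 5.0),
    ("server-1", 20_000, 2.0),
    ("server-1", 22_000, 4.0),
]

def per_product_report(log):
    """Group failures by product and compute MTBF and MTTR (both in hours) for each."""
    by_product = defaultdict(list)
    for product, uptime_hours, repair_hours in log:
        by_product[product].append((uptime_hours, repair_hours))
    report = {}
    for product, records in by_product.items():
        report[product] = {
            "failures": len(records),
            "mtbf_hours": mean(u for u, _ in records),
            "mttr_hours": mean(r for _, r in records),
        }
    return report

for product, stats in per_product_report(failure_log).items():
    print(product, stats)
```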