Mean Time Between Failure (MTBF) is a reliability metric that measures the average time between failures, which helps inform an asset’s reliability. 

This section of Reliability 101 covers MTBF, including how to calculate this metric, improve it, and use this data to develop KPIs. 

Key Takeaways

  • Mean Time Between Failure (MTBF) measures the average operating time between failures of a piece of equipment or a component.
  • A high MTBF can mean fewer problems and costs for your equipment; a lower one could mean more frequent failures and more expenses.
  • MTBF metrics can be improved through process improvements, data standardization, and identifying the root cause of failures.
  • MTBF can inform Key Performance Indicators, including budgeting and CapEx investment decisions.
  • There are challenges in capturing a clean MTBF calculation, including how a “failure” is defined.

Reliability Metrics in context

Reliability metrics provide operations management with valuable information about the performance of various aspects of an operation. They are a means for comparing current site practices against industry standards and help to find areas where an organization can improve processes and operational efficiency.

A complete picture of an asset’s reliability and potential failure modes is conducive to making decisions that contribute to your business strategy, capital investment projects, product development plans, and operational policies.

Asset reliability is commonly measured with failure metrics, which include Mean Time Between Failure (MTBF), Mean Time to Failure (MTTF), and Mean Time to Repair (MTTR).

This article provides an overview of MTBF, one of the most commonly used reliability metrics. It outlines how to calculate MTBF, improve it, and use it to build KPIs.

What is Mean Time Between Failure (MTBF)?

Mean Time Between Failure (MTBF) measures the average operating time between failures of a piece of equipment or a component, which allows one to quantify its reliability.

Figure: Visualizing failure metrics (MTTF, MTBF, and MTTR)

For many organizations, knowing MTBF metrics helps assess the reliability of the systems that support your business operations. Many companies aim to maximize output and minimize downtime during regular operating hours.

How to calculate MTBF

The following formula is used to calculate MTBF:

Total Operational Time / Total Number of Breakdowns = MTBF (hrs.)

Total operational time is the total time your equipment is actually running, typically measured in operating hours. It excludes downtime, whether for planned maintenance tasks or unplanned repairs.

Total breakdowns is the number of times the equipment has failed while running. It may include any type of failure: mechanical, electrical, software, or human error.
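As a minimal sketch of the formula above, the following Python snippet divides operating hours by the number of breakdowns. The function name `mtbf_hours` and the pump figures are hypothetical, chosen only for illustration.

```python
def mtbf_hours(operating_hours: float, breakdowns: int) -> float:
    """Return MTBF in hours: total operational time / total breakdowns."""
    if breakdowns <= 0:
        # With no recorded breakdowns, the ratio is undefined for this period.
        raise ValueError("MTBF needs at least one recorded breakdown")
    return operating_hours / breakdowns


# Hypothetical example: a pump ran for 1,200 operating hours and broke down 4 times.
example_mtbf = mtbf_hours(1200, 4)
print(example_mtbf)  # 300.0
```

In this example the pump runs, on average, 300 hours between failures.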

What the results mean 

A high MTBF output means fewer problems with your equipment will occur over its lifetime. This translates into lower costs associated with repairs and unplanned downtime.

A lower MTBF output means you will likely experience more frequent failures. It helps to plan around this so that when a failure does happen, you can respond with the correct asset management strategy.

The goal should be to have a high average time between failures, indicating good health. The appropriate number should be considered case-by-case and varies depending on the asset, use case, environment, and the maintenance program that is in place.

How to Improve MTBF

Many factors affect your MTBF, such as asset age, operating conditions, and usage patterns. However, the result can be improved by reviewing your systems and making the appropriate changes.

Process Improvements

One method to improve MTBF is through process improvements. Processes like continuous monitoring, preventive maintenance, and regular testing can ensure that an asset remains reliable throughout its lifecycle.

Preventive Maintenance

Preventive maintenance programs can help to avoid costly downtime by reducing the risk of failure and increasing the reliability of an asset, thereby increasing the average time between failures.

Reliability Centered Maintenance, or RCM, is a reliability methodology that develops asset management strategies that help ensure equipment is available when needed, especially critical equipment. It includes examining how a piece of equipment operates and how the equipment is designed. This knowledge helps to understand why equipment fails and how to optimize the repair process.

RCM also gathers information about critical assets and which ones may require more attention than others to create a plan of action.

Maintenance Data Standardization

Standardizing maintenance data provides reliable, accurate, and timely information on an asset’s performance to improve operational and strategic decision-making. This is critical for an asset care strategy. Data standardization allows you to compare assets across multiple sites, locations, and periods. This enables organizations to make more informed decisions about their overall asset portfolio.

Data standardization also helps improve the communication between different departments within an organization. If all employees use standardized data, they understand how their work relates to others’ efforts, and they can easily see where gaps exist and collaborate to close them.
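To illustrate why standardized data matters for cross-site comparison, here is a small sketch in which two sites log runtime in different units. The record layout, field names, and figures are all hypothetical assumptions; a real CMMS export will look different.

```python
MINUTES_PER_HOUR = 60


def to_hours(value: float, unit: str) -> float:
    """Convert a runtime value to hours so that sites can be compared."""
    if unit == "hours":
        return value
    if unit == "minutes":
        return value / MINUTES_PER_HOUR
    raise ValueError(f"unknown unit: {unit}")


# Hypothetical records: site A logs hours, site B logs minutes.
site_records = [
    {"site": "A", "runtime": 1200, "unit": "hours", "breakdowns": 4},
    {"site": "B", "runtime": 54000, "unit": "minutes", "breakdowns": 3},
]

# MTBF per site, computed only after runtime is expressed in the same unit.
mtbf_by_site = {
    record["site"]: to_hours(record["runtime"], record["unit"]) / record["breakdowns"]
    for record in site_records
}
print(mtbf_by_site)  # {'A': 300.0, 'B': 300.0}
```

Without the unit conversion, site B would appear to have an MTBF of 18,000 and the two sites could not be compared meaningfully.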

Identify the cause of failures

When a machine does experience failure, a Root Cause Analysis (RCA) study can help find the root of failures and develop solutions to prevent them from happening again. The Five Whys is an effective method for discovering the root cause of problems and developing long-term solutions to prevent them from recurring.

How MTBF informs KPIs

Since MTBF is a measure of a system’s reliability, it can inform various important business decisions and KPIs, which show how well your company is doing and how to improve future performance.

Budgeting

MTBF metrics allow maintenance teams to make informed decisions by giving a quantitative estimate of how often an asset is expected to fail. This helps with budgeting for a replacement or upgrade, as it helps estimate when the asset will reach its end of life and need replacing.
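As a rough sketch of how MTBF can feed a budget estimate, the snippet below projects the expected number of failures over a planning period and multiplies it by an average cost per failure. It assumes failures occur at a roughly constant rate over the period, which will not hold for every asset; the function names and all figures are hypothetical.

```python
def expected_failures(planned_hours: float, mtbf: float) -> float:
    """Estimate failures over a period, assuming a constant failure rate."""
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    return planned_hours / mtbf


def repair_budget(planned_hours: float, mtbf: float, cost_per_failure: float) -> float:
    """Estimate the repair budget as expected failures times average cost."""
    return expected_failures(planned_hours, mtbf) * cost_per_failure


# Hypothetical asset: 6,000 planned operating hours, MTBF of 300 hours,
# and an average cost of 1,500 per failure.
print(expected_failures(6000, 300))    # 20.0
print(repair_budget(6000, 300, 1500))  # 30000.0
```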

Prioritizing Maintenance Activities

MTBF metrics can help decide when to schedule equipment downtime for maintenance activities. They can also help to set benchmarks against which to measure progress. Combined with a proactive maintenance strategy, MTBF metrics are used to prioritize activities based on criticality.

For example, if the MTBF of an asset has improved due to optimized maintenance activities, it provides a measurable progress milestone.
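One simple way to express such a milestone is the percentage change in MTBF between two periods. The sketch below assumes that approach; the function name `mtbf_change_percent` and the before/after values are hypothetical.

```python
def mtbf_change_percent(mtbf_before: float, mtbf_after: float) -> float:
    """Return the percentage change in MTBF relative to the baseline period."""
    if mtbf_before <= 0:
        raise ValueError("baseline MTBF must be positive")
    return (mtbf_after - mtbf_before) * 100 / mtbf_before


# Hypothetical asset: MTBF rose from 250 hours to 300 hours
# after maintenance activities were optimized.
improvement = mtbf_change_percent(250, 300)
print(improvement)  # 20.0
```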

Capex Investment Decisions

Capex investments are made based on expected future revenue streams. Knowing a new asset’s expected failure rate will help get a clearer picture of its total cost of ownership and return on investment.

This knowledge helps determine whether the investment is worth it because there could be another option that provides a more significant ROI. Further, knowing the cost per hour of operation helps inform decisions regarding whether to replace aging equipment.

Quality Assurance in Manufacturing

MTBF is used as a measure of quality assurance in manufacturing processes.

For example, if a component fails after only three months, this may indicate inadequate quality control during manufacturing. It could mean that something was wrong with the design process, materials were not up to spec, or some other problem occurred at the factory.

The same thing applies to components that have been operating well but suddenly start failing.

Challenges in capturing MTBF

To find the correct calculation for MTBF, you must account for variances that can affect the data quality and potentially compromise the validity of the information.

Variances in data collection

MTBF values vary depending on the equipment being measured and on the environment in which that equipment operates.

Poor Data Tracking

To measure the effectiveness of your maintenance strategies, you need reliable data on every failure. This requires tracking all breakdowns, not just those caused by hardware issues such as broken parts or worn-out bearings.

It also includes incidents involving people making mistakes like forgetting to turn off machinery or not following safety procedures.

These errors may seem minor, but they add up over time if you do not keep careful records.

Incomplete maintenance records

If a company does not keep track of its maintenance history, it cannot determine whether a particular part has had any issues.

The current procedures may only keep track of significant events such as repairs or replacements, so the actual number of incidents occurring over a period cannot be accurately determined.

Varying definitions of “failure”

The definition of failure is often open to interpretation. Some organizations define asset failure as any incident that results in lost production, while others consider anything less than 100% uptime a failure.

In addition, many manufacturers exclude specific categories of events from being considered part of the total number of failures because they believe they do not affect the product’s reliability.

To calculate an asset’s true MTBF, you must include every type of event that affects its availability.
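The effect of the failure definition can be shown with a small sketch that computes MTBF for the same asset twice: once counting every logged event, and once counting only mechanical events. The event categories, the helper `mtbf_for`, and the figures are hypothetical.

```python
# Hypothetical event log for one asset over 1,000 operating hours.
OPERATING_HOURS = 1000
events = ["mechanical", "electrical", "operator_error", "mechanical", "software"]


def mtbf_for(operating_hours, logged_events, counted_categories=None):
    """Return MTBF, counting all events or only the given categories."""
    if counted_categories is None:
        failures = len(logged_events)
    else:
        failures = sum(1 for event in logged_events if event in counted_categories)
    if failures == 0:
        raise ValueError("no events match the chosen failure definition")
    return operating_hours / failures


# Every event counts as a failure.
print(mtbf_for(OPERATING_HOURS, events))                  # 200.0
# Only mechanical events count as failures.
print(mtbf_for(OPERATING_HOURS, events, {"mechanical"}))  # 500.0
```

The narrower definition makes the same asset look two and a half times more reliable, which is why the definition should be agreed on before MTBF values are compared.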

Going further than reliability metrics

With the many advantages of knowing the MTBF of your assets also come challenges. One common challenge is knowing that the data within your organization is clean and can drive the entire organization to make the right decisions that can impact your bottom line.

MaxGrip helps asset-intensive organizations improve their maintenance and reliability practices by providing them with the blueprint to connect the dots between maintenance metrics and tangible business outcomes.

What does that produce? More effective decision-making, more productivity, and, most of all, more profitability.

Learn more about our data-driven approach to boosting your asset performance here.

 
