MTBF > MTBM

EP Editorial Staff | September 3, 2014

What happens when the maintenance-manager turnaround period is substantially less than the overall departmental mean time between failures?

By Randall K. Noon, P.E.

Most, if not all, readers of this publication will probably recognize the term “MTBF” as shorthand for “Mean Time Between Failures.” My dog-eared 3rd edition of the Quality Control Handbook, edited by J. Juran, defines MTBF as the “mean time between successive failures of a repairable product.”

A quick example of MTBF is a bearing failure in a run-to-failure electric motor. If either of the two motor bearings has successively failed in service after 4 years, 6 years and then 4.5 years, the mean or average time between those failures is 4.83 years. To avoid having to deal with all the problems associated with an unexpected failure (which typically occurs on a holiday weekend, in the dead of night, during a blizzard or when there’s a “your job is on the line” deadline to be met), a prudent maintenance professional might schedule a preventive maintenance (PM) task to replace the motor’s bearings every 42 months or so, just to be sure. Bearings, of course, are cheaper than unscheduled downtime and zero productivity.
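For readers who like to check the arithmetic, here is a minimal Python sketch of the bearing example. The variable names are illustrative assumptions, not part of any standard tool, and three data points are far too few for a real reliability analysis. The sketch simply averages the three observed bearing lives and compares the 42-month PM interval with the shortest life seen so far.

```python
from statistics import mean

# Observed bearing lives from the example above, in years.
intervals_years = [4.0, 6.0, 4.5]

# MTBF is the plain arithmetic mean of the intervals between failures.
mtbf_years = mean(intervals_years)

# Shortest observed life, converted to months for comparison with the PM task.
shortest_months = min(intervals_years) * 12

# The replacement interval suggested in the text (an editorial choice, not a computed value).
pm_interval_months = 42

print(f"MTBF: {mtbf_years:.2f} years")
print(f"Shortest observed life: {shortest_months:.0f} months")
print(f"PM interval falls {shortest_months - pm_interval_months:.0f} months short of that")
```

Under these numbers, the 42-month interval sits below both the MTBF and the shortest observed bearing life, which is the "just to be sure" margin the example describes.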

The other term in the title, “MTBM,” is not a readily recognizable abbreviation. That’s because I made it up. It stands for “Mean Time Between Managers.”

Analogous to the MTBF definition, MTBM can be defined as the “mean time between successive managers of an administrative unit.” Accordingly, this cryptic title raises the question: “What happens when the mean time between failures in critical equipment or a department is greater than the mean time between its managers?” The answer is contained in the following tale.

Once upon a time

The Acme Company used to have a first-class maintenance department. Over time, John, the longstanding maintenance manager, had assiduously put into place an integrated asset-management system. It included condition monitoring (whose sample periods changed as the component’s risk changed); preventive-maintenance replacement of critical components at a 99.9% reliability-confidence level; a receipt-inspection program; a vigorous failure-investigation program; and incorporation of manufacturers’ recommendations into regular maintenance activities. Under John’s watchful eye, the place ran like a Swiss watch. Productivity was second to none. Maintenance workers had regular hours. After many years, however, John deservedly retired.

During the search process for John’s replacement, Jim told interviewers that while John had run a fine program, it was an old-school effort with plenty of fat in its budget. Jim indicated that maintenance costs could be cut significantly—without problems. As evidence, he pointed to the fact that no unscheduled downtime had occurred at the Acme site in years. Jim got the job. And he kept his promise.

The Acme Company’s maintenance budget was significantly cut, and nothing immediately happened. Condition-monitoring sampling periods were lengthened; parts were used longer and serviced less; preventive maintenance was deferred; the threshold for investigating failures was raised; and the company’s bottom line improved. Senior managers were duly impressed. After three years, Jim was promoted, and Jack became the new maintenance manager.

Although Jack was not very experienced in the area of machinery maintenance, he did have an MBA and could understand a balance sheet and write project proposals with the best of them. Granted, during the first six months of his tenure, a few small failures did surface. Not to worry: Jack managed them well and got the plant back on line quickly. While productivity actually dropped a bit, it was forgiven. After all, Acme’s management reasoned, Jack was still new on the job. Besides, as everyone in the know liked to say, Jack’s immediate predecessor Jim had set an extremely high bar.

Alas, during the following nine months, the bottom fell out from under Jack. One critical failure after another struck the plant. Jack seemed to be in crisis mode 24/7. The meetings that he was able to attend were tough on him: He seemed so distracted, which was understandable given the fact that he was constantly on his cell phone organizing work crews and looking for parts on short notice.

Due to the increase in unscheduled work time, maintenance personnel began to grumble about lost weekends and 12-hour shifts. Productivity dropped. Maintenance costs increased—as did the accident rate. In meetings where Jack wasn’t present, Jim frequently was heard to mutter, “These problems didn’t happen when I was maintenance manager.”

Eventually, Jack’s boss invited him in for a closed-door, one-on-one session—during which he mentioned that things “just weren’t working out.” Consequently, Jack was advised to transfer to another department or, perhaps, seek employment elsewhere. In the meantime, senior management was busy interviewing Jason.

Jason was impressive. He told his interviewers that the maintenance program at the Acme Company needed to be re-organized from the ground up, and that the company needed to step up its investment in maintenance to improve productivity at a three-to-one benefit-to-cost ratio. But, oh by the way, he also let it be known that the process would take some time to fully implement—perhaps as long as three years. Jason was hired.

The moral to this story

The answer to the question posed in the title is this: When the maintenance-manager turnaround period (MTBM) is substantially less than the overall departmental MTBF, there’s a temptation to cannibalize the established maintenance margin (i.e., the future hedge against unscheduled failure) for the purpose of a short-term gain.

While the short-term manager doesn’t face the consequences of a cannibalized margin, his or her successor will pay dearly. Shakespeare summed up this concept in blunter, albeit more poetic, terms in the play Julius Caesar. As the character Mark Antony put it, “The evil that men do lives after them; the good is oft interred with their bones.”

Randy Noon is a Root Cause Team Leader at Nebraska’s Cooper Nuclear Station. A noted author and frequent contributor to MT, he has investigated failures for more than three decades. Contact him at rknoon@nppd.com.
