ITIL (Information Technology Infrastructure Library) is a detailed set of IT service management practices that gives organizations a framework of best practices. Although ITIL has been around since the 1980s, there is still a lot of confusion about the difference between incident management and problem management, and about where one stops and the other begins.
Confusion between two terms wouldn’t normally be a big deal, but not knowing the difference between these two processes can have a serious negative impact not only on your infrastructure but on your business as a whole.
What is an Incident?
According to ITIL, “an incident is an unplanned interruption to a service, or the failure of a component of a service that hasn’t yet impacted service”. To be considered an incident, an event must cause a disruption in service and must be unplanned. A server that is used only during the day crashing after hours, or scheduled maintenance, is therefore not categorized as an incident, because neither directly interrupts the business process. Incidents need to be resolved immediately, whether by a permanent fix, a workaround, or a temporary fix.
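As a rough illustration, the two criteria above (unplanned, and disruptive to service) can be written as a simple check. This is only a sketch of the rule as this article states it; the `Event` class, its field names, and the sample events are hypothetical and do not come from ITIL or any tool.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A hypothetical record of something that happened to a service."""
    description: str
    unplanned: bool          # False for scheduled maintenance
    disrupts_service: bool   # True if a business process is interrupted


def is_incident(event: Event) -> bool:
    """Apply the article's two criteria: unplanned AND disruptive."""
    return event.unplanned and event.disrupts_service


# Illustrative events only.
outage = Event("Mail server down during office hours", True, True)
maintenance = Event("Scheduled maintenance window", False, True)
after_hours = Event("Day-only server crashes after hours", True, False)

for event in (outage, maintenance, after_hours):
    print(f"{event.description}: incident={is_incident(event)}")
```

Only the first event counts as an incident under this rule; the other two each fail one of the two criteria.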
What is a Problem?
Also according to ITIL, “a problem is a cause of one or more incidents”. The cause is initially unknown and is inferred from a number of related incidents that share common issues. Problems are not themselves incidents, but incidents can raise problems, especially if they recur or are likely to recur. In the example above, the after-hours crash of a server that is used only during the day is a problem: it isn’t currently disrupting service, but it could happen again and become an incident.
What is Incident Management?
The main goal of incident management is to resolve the disruption as soon as possible and restore service operations. Because even minor disruptions in service can have a huge impact on the organization, incidents need to be fixed immediately. The process usually includes recording the details of the incident and resolving it.
Incident management often involves level 1 support, which includes:
- Incident identification
- Incident logging
- Incident categorization
- Incident prioritization
- Initial diagnosis
- Escalation, as necessary, to level 2 support
- Incident resolution
- Incident closure
- Communication with the user community throughout the life of the incident
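The activities listed above can be pictured as a small state machine. The sketch below is one possible way to model them, not a reference implementation: the `Stage` and `Incident` names are hypothetical, and the rule that escalation happens only after initial diagnosis is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order the list above gives them."""
    IDENTIFIED = 1
    LOGGED = 2
    CATEGORIZED = 3
    PRIORITIZED = 4
    DIAGNOSED = 5
    RESOLVED = 6
    CLOSED = 7


@dataclass
class Incident:
    summary: str
    stage: Stage = Stage.IDENTIFIED
    support_level: int = 1
    updates: list = field(default_factory=list)  # messages sent to users

    def advance(self) -> None:
        """Move to the next stage and record a user-facing update."""
        if self.stage is Stage.CLOSED:
            raise ValueError("incident is already closed")
        self.stage = Stage(self.stage.value + 1)
        self.updates.append(f"{self.summary}: now {self.stage.name}")

    def escalate(self) -> None:
        """Hand over to level 2 (assumed: only after initial diagnosis)."""
        if self.stage is not Stage.DIAGNOSED:
            raise ValueError("escalate only after initial diagnosis")
        self.support_level = 2
        self.updates.append(f"{self.summary}: escalated to level 2")


incident = Incident("VPN login failures")
while incident.stage is not Stage.DIAGNOSED:
    incident.advance()
incident.escalate()   # level 1 could not resolve it
incident.advance()    # RESOLVED
incident.advance()    # CLOSED
print(incident.stage.name, incident.support_level, len(incident.updates))
```

Recording an update on every transition is one way to reflect the last item in the list, communication with users throughout the life of the incident.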
What is Problem Management?
The goal of problem management is to identify the root cause of the incidents and try to prevent them from happening again. It might take multiple incidents before problem management has enough data to analyze what is going wrong, but done correctly, it turns the problem into a “known error” so that steps can be put in place to correct it. Incidents like a malfunctioning mouse may not result in a problem, but those like repeated network outages need to be investigated.
Problem management is sometimes described as a reactive process that begins only after incidents have occurred. It is better thought of as a proactive process, because its end goal is to identify the problem, fix it, and prevent it from happening again. In other words, problem management identifies the problem, troubleshoots it, documents the issue and its causes, and ultimately resolves it.
Problem management has a narrowly defined scope that includes the following activities:
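One simple way to spot incidents that deserve a problem investigation is to count how often the same component is involved. The sketch below assumes a hypothetical incident log and an arbitrary threshold of three occurrences; real teams would pick their own grouping key and cut-off.

```python
from collections import Counter

# Hypothetical incident log: (incident id, affected component).
incident_log = [
    ("INC-1", "office-network"),
    ("INC-2", "mouse-desk-14"),
    ("INC-3", "office-network"),
    ("INC-4", "office-network"),
]

RECURRENCE_THRESHOLD = 3  # assumed cut-off, not an ITIL rule


def problem_candidates(log, threshold=RECURRENCE_THRESHOLD):
    """Return components with at least `threshold` incidents, sorted by name."""
    counts = Counter(component for _, component in log)
    return sorted(name for name, n in counts.items() if n >= threshold)


print(problem_candidates(incident_log))
```

With this sample data, the repeated network outages cross the threshold while the single mouse failure does not, which mirrors the distinction drawn above.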
- Problem detection
- Problem logging
- Problem categorization
- Problem prioritization
- Problem investigation and diagnosis
- Creating a known error record
- Problem resolution and closure
- Major problem review
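To make the relationship between a problem, its incidents, and a known error record concrete, here is a minimal sketch. The `Problem` and `KnownError` classes, their fields, and the rule that a problem cannot be closed before a known error is recorded are assumptions made for illustration; the root cause and workaround strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnownError:
    """A diagnosed root cause together with a documented workaround."""
    root_cause: str
    workaround: str


@dataclass
class Problem:
    title: str
    incident_ids: list = field(default_factory=list)  # related incidents
    known_error: Optional[KnownError] = None
    closed: bool = False

    def record_known_error(self, root_cause: str, workaround: str) -> None:
        """Document the outcome of investigation and diagnosis."""
        self.known_error = KnownError(root_cause, workaround)

    def close(self) -> None:
        """Close the problem (assumed: only once a known error exists)."""
        if self.known_error is None:
            raise ValueError("diagnose the root cause before closing")
        self.closed = True


problem = Problem("Repeated network outages", ["INC-1", "INC-3", "INC-4"])
problem.record_known_error(
    root_cause="(placeholder) failing switch",
    workaround="(placeholder) route traffic through the backup switch",
)
problem.close()
print(problem.closed, len(problem.incident_ids))
```

Keeping the incident IDs on the problem record preserves the link back to the incidents that triggered the investigation.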
To bring it all together, let’s look at an analogy comparing incident management and problem management.
A Tale of Two Citizens
Incident management is like a firefighter at a house fire: it swoops in, immediately fixes the issue, and saves the day. Firefighters arrive at the scene, see the fire, and work to put it out as quickly as possible without stopping to ask how it started. Incident management works the same way. It has to deliver fast results and repair issues within the infrastructure, but it doesn’t tell us what ultimately went wrong or why there was an issue in the first place. That’s where problem management comes in.
Problem management is like the detective who comes into the picture after the fact. Detectives weren’t there to put out the flames, but they can still investigate what went wrong, figure out how the fire started, and help educate people to take preventive steps so something similar doesn’t happen again. Problem management is a vital piece of the puzzle because it addresses the root cause of incidents and proactively prevents them from repeating and potentially causing major issues in the future. If no one takes the time to review incidents and solve the underlying problems, the incidents will simply continue to happen and may become more serious.
Understanding the difference between incident management and problem management, and having dedicated managers for each, ensures that you are not just putting out fires all day. Immediately fixing issues in the infrastructure through incident management provides temporary relief, but on its own it will soon exhaust your resources and employees without ever finding the root of the problem. Bringing in problem management helps to investigate the cause of the incidents and puts steps in place so they don’t continue to occur. By having a specific manager or team for this process, you will be one step closer to decreasing the rate of incidents in your organization and preventing major outages and service disruptions.
Resources from BMC
BMC provides both an ITIL Problem Management Guide and an ITIL Incident Management Guide to keep you informed of the latest version of ITIL and its important elements so you can quickly understand key changes and concepts. Both booklets also include commentary and examples from top BMC experts to take your exploration and understanding one step further. These insights can help you fully deconstruct concepts and make them actionable for your organization.
BMC also offers services to help your organization reduce the number of incidents handled, improve resolution times, and prevent future incidents. Remedy Incident and Problem Management 9 is based on the latest ITIL best practices and provides vital connections between IT infrastructure and business services. It will help you:
- Integrate all IT service support functions, including change, asset, service-level, service-request, identity, and knowledge management
- Achieve lower call volumes with business user self-service capabilities of BMC Digital Workplace (formerly MyIT)
- Gain direct visibility into business priorities through integration with a single CMDB
- Align to ITIL best practices quickly and cost-effectively with expert services, comprehensive training, and out-of-the-box ITIL processes