Business organizations of all sizes consume IT solutions, either as-a-service or purchased from third-party vendors. This technology delivery model is designed so that businesses can access IT solutions that perform dependably, without having to develop, manage, and maintain the underlying infrastructure and IT systems in-house.
To that end, both parties (your company and the solutions vendor) sign a service level agreement (SLA), which obligates the vendor to deliver services that meet specific, agreed performance standards. Vendors that fail to meet SLA standards face penalties, because non-compliance affects the end-user experience of IT services and, potentially, revenue-generating opportunities.
Let’s look at the SLA compliance ratio as a metric for service desk and ITSM functions. We’ll define the SLA compliance ratio and talk about best practices and limitations. We’ve also included resources about additional worthwhile ITSM metrics.
What is an SLA?
A service level agreement is a documented agreement that specifies how and when IT issues are resolved with respect to their priority and nature.
The IT service desk is responsible for analyzing IT incident metrics and complaints registered by end users to evaluate SLA compliance. If the agreed service levels fail to deliver the desired end-user experience, the SLA should be reviewed and updated with improved service commitments.
Even when SLA compliance does not translate directly to the intended end-user experience, it still gives the service desk an opportunity to identify areas of improvement and the service’s impact on real-world, end-user performance. This information also helps ascertain which SLA commitments are realistic, attainable, and impactful from a business perspective.
What is the SLA compliance ratio?
An SLA compliance ratio, also known as the SLA success rate, refers to the percentage of IT incidents resolved within the agreed SLA parameters (time, cost, workflow prioritization, others as needed). The SLA compliance ratio is one metric of many that service desks can use to track IT incidents and resolution, but it is not without complications.
The calculation for the SLA compliance ratio:
SLA compliance ratio = (number of incidents resolved within the agreed SLA parameters ÷ total number of incidents) × 100
An accurate SLA compliance ratio would account for all parameters associated with resolving the relevant IT incidents. This requirement assumes that success is properly defined for each incident resolution and that the SLA includes the necessary metrics and parameters.
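As a minimal sketch of the calculation, the Python below counts incidents resolved within their agreed time limit and expresses the result as a percentage. The `Incident` record, its field names, and the sample data are hypothetical and chosen only for illustration. The sketch also checks resolution time alone, whereas a real SLA may include other parameters such as cost or workflow prioritization.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative, not from any ITSM tool.
    incident_id: str
    resolution_minutes: float  # time taken to resolve the incident
    sla_limit_minutes: float   # resolution time agreed in the SLA


def sla_compliance_ratio(incidents):
    """Return the percentage of incidents resolved within their SLA limit."""
    if not incidents:
        raise ValueError("no incidents to evaluate")
    met = sum(1 for i in incidents if i.resolution_minutes <= i.sla_limit_minutes)
    return 100.0 * met / len(incidents)


# Made-up sample data: three of four incidents meet their SLA limit.
incidents = [
    Incident("INC-1", 30, 60),
    Incident("INC-2", 200, 240),
    Incident("INC-3", 90, 60),  # breached: took longer than the limit
    Incident("INC-4", 45, 60),
]

ratio = sla_compliance_ratio(incidents)
print(f"SLA compliance ratio: {ratio:.1f}%")  # prints "SLA compliance ratio: 75.0%"
```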
Challenges with SLA compliance ratios
The challenge with generating an insightful SLA compliance ratio is twofold.
- The business organization must ensure that the pre-defined SLA metrics fully reflect end-user experience and IT service performance, and that they help achieve business objectives.
- The definition of successful incident resolution is not absolute. Consider that a high-priority incident resolved with only a few minutes of SLA time to spare may not be considered as successful as a low-priority incident resolved with hours of SLA time to spare. Yet, both incident resolutions may be regarded as equally successful once you calculate the SLA compliance ratio.
This means that using the SLA compliance ratio does not provide a holistic overview of how well the vendor or service provider resolves the reported or discovered IT incidents. The discrepancy between IT incident resolution and customer satisfaction may remain.
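To illustrate the second challenge, the sketch below computes how much of the SLA window was left unused for two incidents. The `sla_margin` helper and the sample numbers are assumptions made for illustration, not a standard ITSM metric. Both incidents count as compliant in the ratio, yet their margins differ widely.

```python
def sla_margin(resolution_minutes, sla_limit_minutes):
    """Fraction of the SLA window left unused; a negative value means a breach."""
    return (sla_limit_minutes - resolution_minutes) / sla_limit_minutes


# Hypothetical high-priority incident: resolved with 5 minutes to spare.
high_priority_margin = sla_margin(resolution_minutes=55, sla_limit_minutes=60)

# Hypothetical low-priority incident: resolved with 4 hours to spare.
low_priority_margin = sla_margin(resolution_minutes=240, sla_limit_minutes=480)

for name, margin in [("high", high_priority_margin), ("low", low_priority_margin)]:
    compliant = margin >= 0
    print(f"{name}-priority: compliant={compliant}, margin={margin:.0%}")
```

Tracking a margin like this next to the ratio is one possible way to see how close compliant resolutions came to breaching.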
Prioritizing incidents for SLA success rates
To account for these challenges, you may require your service desk to produce granular SLA success rates for resolving the diverse IT incidents that impact end-user experience. Similar SLA success rates can hold different meanings across low- and high-priority incidents:
- SLA non-compliant resolution of high-priority incidents can have damaging consequences for revenue-generating, IT-driven business lines.
- SLA non-compliant resolution of low-priority incidents may go unnoticed by end users, but the non-compliance still breaks your SLA.
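A granular success rate of this kind could be sketched as follows. The priority labels, the `(priority, resolved_within_sla)` pair format, and the sample data are hypothetical. The point is that a healthy overall ratio can hide a poor rate on high-priority incidents.

```python
from collections import defaultdict


def compliance_by_priority(incidents):
    """Return the SLA success rate (%) for each priority level.

    incidents: iterable of (priority, resolved_within_sla) pairs.
    """
    totals = defaultdict(int)
    met = defaultdict(int)
    for priority, within_sla in incidents:
        totals[priority] += 1
        if within_sla:
            met[priority] += 1
    return {p: 100.0 * met[p] / totals[p] for p in totals}


# Made-up sample: two high-priority and six low-priority incidents.
sample = [
    ("high", True), ("high", False),
    ("low", True), ("low", True), ("low", True),
    ("low", True), ("low", True), ("low", True),
]

rates = compliance_by_priority(sample)
overall = 100.0 * sum(1 for _, ok in sample if ok) / len(sample)

print(f"overall: {overall:.1f}%")  # 87.5%
for priority, rate in rates.items():
    print(f"{priority}: {rate:.1f}%")  # high: 50.0%, low: 100.0%
```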
In large-scale, complex IT environments, the distinction between low- and high-priority incidents may not be clear-cut. For instance, low-priority incidents resolved hours after they occur but still within the SLA time may cause a chain of events leading to a large-scale service outage. Or, small changes to interconnected and distributed datacenter networks can cause lasting service outages across the network, as seen in recent AWS outages:
- In one incident in 2017, a small typo that removed more servers than intended caused a domino effect across multiple server subsystems that power a significant proportion of the global Internet.
- In a 2018 incident, two brief power outages at the same AWS datacenter disrupted hundreds of critical services that rely on the AWS infrastructure in the affected region.
To remedy this, the IT service desk may implement a categorization scheme that identifies the KPIs and their measurement system for every IT service object or process. The underlying metrics should be collected automatically and correlated across relevant infrastructure monitoring data. Then, the service desk should evaluate which measurement units and parameters can provide the most insightful information to measure the SLA compliance of each IT service object.
Per SLA best practices, keep individual SLAs for each service. The service desk should focus on understanding how each service’s SLA compliance supports wider IT service management. Identify and document areas of ITSM improvement so you can adjust the SLA.
SLA compliance doesn’t optimize IT performance and delivery
IT personnel must remember that SLA compliance does not mean optimal IT performance and end-user experience. Typical SLAs are designed to guarantee the acceptable minimum performance levels. Improvements in the service levels may come at a high cost that isn’t justifiable from a business perspective. This means that your service desk is expected to achieve end-user experiences that go beyond SLA-compliant IT services.
So, how is the SLA compliance metric best used? In the context of your company’s wider ITSM strategy. Consider that the resources required to meet SLA compliance tend to depend on the performance of all your ITSM tools and processes. SLA compliance, measured separately for high- and low-priority incidents, can then indicate the state of your company’s ITSM capabilities, especially if you need to re-evaluate your ITSM resources.
Remember, though, that the best SLAs are specific to each service. An SLA that is boilerplate or an umbrella agreement covering many services and geographic locations is rarely an effective SLA.
To develop appropriate SLAs or consider other service desk metrics for your organization, see our Enterprise IT Glossary and these BMC Blogs: