When it comes to service management, the main determinant of quality and customer satisfaction is (usually) whether the service provider keeps their promises. Service level agreements (SLAs) are usually the go-to reference for what any company commits to its customers.
Today, with most companies handing control of their technology and data to third-party providers to gain agility and cost effectiveness, SLA breaches can threaten their survival. An SLA breach can mean delayed product delivery or poor service performance.
A recent survey from Infrascale revealed that IT downtime:
- Creates business disruption
- Decreases employee productivity
- Results in lost customers and lost revenue
So how can service providers work with consumers to better manage SLA breaches? Let’s look at this issue in more detail.
What is an SLA breach?
The ISO/IEC 20000-10:2018 standard defines an SLA as a documented agreement between the organization and the customer that identifies services and their agreed performance. Here the organization acts as a supplier and the SLA will usually be part of the contract for provision of services.
The outline of an SLA might look something like this:
These are some examples of SLA metrics, defined by ITIL® 4:
- Functionality: completeness of the functions available
- Availability: percentage of availability or permitted downtime
- Performance: service throughput
- Timeliness: fulfilment of requests within deadline
- User support: timeliness of support request processing
- Accuracy: number of errors expected
- User experience: percentage of interrupted service actions
SLA targets are ideals by nature: the service provider commits to its best effort to meet them based on factors within its control. However, when the supplier is unable to meet the promised targets and so violates the agreement, you have an SLA breach.
For instance, let’s say an internet service provider’s SLA promises 99% availability in a month. In an average month of about 730.5 hours, that leaves a downtime allowance of roughly 7.31 hours. Any outage beyond that is a breach, as long as the cause is within the provider’s control and not force majeure, such as natural disasters or faults precipitated by third parties (e.g. road construction causing damaged utilities or delayed deliveries).
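The arithmetic behind that figure can be sketched in a few lines. This is a minimal illustration, not a standard formula from any SLA framework: the function names are hypothetical, and the 730.5-hour average month (365.25 days × 24 hours ÷ 12) is an assumed convention. A real contract may define the measurement period differently, for example as a calendar month.

```python
# Assumed convention: an "average" month of 365.25 * 24 / 12 = 730.5 hours.
HOURS_PER_AVERAGE_MONTH = 365.25 * 24 / 12


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_AVERAGE_MONTH) -> float:
    """Return the downtime allowance implied by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


def is_availability_breach(downtime_hours: float,
                           availability_pct: float,
                           period_hours: float = HOURS_PER_AVERAGE_MONTH) -> bool:
    """True when in-scope downtime exceeds the allowance for the period.

    downtime_hours should already exclude force majeure and other
    events the contract places outside the provider's control.
    """
    return downtime_hours > allowed_downtime_hours(availability_pct, period_hours)


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.0)
    print(f"Allowance at 99%: {budget:.3f} hours")
    print("8 hours of downtime is a breach:", is_availability_breach(8.0, 99.0))
```

With a 30-day month (720 hours) the same target allows 7.2 hours, which is why the measurement period is worth checking in the contract.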
Has an SLA breach occurred?
At what point do we determine that an SLA breach has occurred?
First, you have to look at the fine print of contracts. Some providers will only accept an SLA breach if it is reported to them and formally logged on their service management system so that the metrics can be measured on a common platform.
Additionally, the burden of proof is usually left to the customer to show that there was an actual breach of the SLA. For example, a customer may need to use their own monitoring system to demonstrate that network latency exceeded the promised target.
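As a rough sketch of what such customer-side evidence might look like, the snippet below summarizes latency samples against a target. Everything here is an assumption for illustration: the names are hypothetical, and the rule that a breach occurs when more than a given share of samples exceeds the target is only one possible way an SLA could word a latency commitment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyEvidence:
    total_samples: int
    samples_over_target: int
    share_over_target: float
    breached: bool


def summarize_latency(samples_ms: Sequence[float],
                      target_ms: float,
                      max_share_over_target: float = 0.05) -> LatencyEvidence:
    """Summarize monitoring samples against a latency target.

    Assumed rule (hypothetical): the SLA is breached when more than
    max_share_over_target of the samples exceed target_ms.
    """
    if not samples_ms:
        raise ValueError("at least one sample is required")
    over = sum(1 for sample in samples_ms if sample > target_ms)
    share = over / len(samples_ms)
    return LatencyEvidence(
        total_samples=len(samples_ms),
        samples_over_target=over,
        share_over_target=share,
        breached=share > max_share_over_target,
    )


if __name__ == "__main__":
    # Hypothetical samples collected by the customer's own monitoring.
    print(summarize_latency([20, 30, 150, 40], target_ms=100))
```

Keeping the raw samples alongside the summary matters, since the provider may want to compare them with its own measurements.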
Reacting to SLA breaches
When you think an SLA breach has occurred, here’s what to do.
For the customer, the first action is to log the issue through the channels that the service provider has made available: email, social media, CRM, or a service management system. This may take place immediately, as soon as you notice the issue, or even a few days later, depending on the contract terms.
- If it is reported immediately, the clock starts ticking at that point for the service provider to respond and address the issue within the targets before the SLA is breached.
- If it is reported post-event, it is still a valid record of the issue and SLA breach, so long as the report has been made within the duration specified in the agreement.
Once the SLA has been breached, the onus is on the provider to communicate openly with the customer on the status and agree on the path forward.
Stopping the clock
One controversial tactic used by service providers in responding to SLA breaches is ‘stopping the clock’. This is a feature of service management tools that allows agents to suspend the SLA, especially where the metric is based on timeliness. For instance, you might suspend the SLA when:
- Waiting for feedback from the customer
- Escalating to a third-party supplier
- The cause of the breach is outside the service provider’s control
This approach might seem right in the eyes of the service provider. Others, especially the customer, might perceive it to be cheating as it doesn’t reflect the reality of the breach in the measurement of the SLA. This has resulted in a phenomenon called “watermelon SLAs” where the provider reports all SLAs met (green on the outside), while the customer’s experience is poor (red on the inside).
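The gap between the two views can be made concrete with a small sketch that computes both the wall-clock time the customer experienced and the time the SLA timer counted after pauses. This is an illustration under stated assumptions, not how any particular service management tool works: the function name is hypothetical, and pause intervals are assumed not to overlap one another.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Interval = Tuple[datetime, datetime]


def counted_sla_time(opened: datetime,
                     resolved: datetime,
                     pauses: Iterable[Interval]) -> timedelta:
    """Time counted against the SLA, excluding 'stopped clock' intervals.

    Assumes the pause intervals do not overlap each other.
    """
    paused = timedelta()
    for start, end in pauses:
        # Clip each pause to the ticket's lifetime before subtracting it.
        start = max(start, opened)
        end = min(end, resolved)
        if end > start:
            paused += end - start
    return (resolved - opened) - paused


if __name__ == "__main__":
    # Hypothetical ticket with a 4-hour resolution target.
    opened = datetime(2024, 1, 8, 9, 0)
    resolved = datetime(2024, 1, 8, 17, 0)
    pauses = [(datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 15, 0))]
    target = timedelta(hours=4)

    wall_clock = resolved - opened
    counted = counted_sla_time(opened, resolved, pauses)
    print("Customer waited:", wall_clock, "| SLA counted:", counted)
    print("Reported as met:", counted <= target)
    print("Met on wall-clock time:", wall_clock <= target)
```

In this example the timer counts 3 hours against a 4-hour target, so the SLA is reported as met, while the customer actually waited 8 hours. Reporting both numbers side by side is one way to make a watermelon SLA visible.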
Some providers take SLA breaches into consideration when crafting their contracts. A default remedy many providers turn to is service credits, providing a discount in the next billing cycle to cover for the breach.
For example, an ISP might provide a pro-rated service credit equivalent to the length of downtime or based on a percentage of the bill, to be applied in the next month’s bill.
This is usually the lowest-friction approach, and it favors the service provider, because service credits are usually capped at a percentage of the total invoice amount.
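A pro-rated credit with a cap can be sketched as follows. The function name, the 730.5-hour billing period, and the 10% default cap are all hypothetical values chosen for illustration; actual credit formulas and caps vary by contract.

```python
def service_credit(monthly_fee: float,
                   downtime_hours: float,
                   period_hours: float = 730.5,
                   cap_pct: float = 10.0) -> float:
    """Pro-rated service credit for downtime, capped at a share of the bill.

    period_hours and cap_pct are illustrative assumptions, not
    values taken from any real contract.
    """
    if monthly_fee < 0 or downtime_hours < 0:
        raise ValueError("monthly_fee and downtime_hours must be non-negative")
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    pro_rated = monthly_fee * downtime_hours / period_hours
    cap = monthly_fee * cap_pct / 100
    return min(pro_rated, cap)


if __name__ == "__main__":
    # Hypothetical bill of 100.00 per month.
    print(f"Short outage credit: {service_credit(100.0, 7.305):.2f}")
    print(f"Half-month outage credit: {service_credit(100.0, 365.25):.2f}")
```

Under these assumptions, even an outage lasting half the month yields a credit of only 10% of the bill, which shows how the cap limits the provider's exposure.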
However, no customer is happy to receive a monthly discount for bad service, particularly as the value of the service is much more than what the customer pays for it. For a business whose e-commerce site is down, the lost sales are typically far greater than what it pays to host the site.
SLA breaches require processes
Handling an SLA breach isn’t something that should be left to a service desk agent or automated system. A process needs to be established that escalates the breach to supervisors or management for review and redress.
It is important that the service provider engages the customer formally where prolonged or extensive breaches have occurred, with a plan for remedial and preventive actions. Without proper communication and follow-up, the chances of customer churn or lawsuits are bound to increase, depending on the impact of the breach on the customer.
Avoiding SLA breaches
While no service provider will ever commit to perfect achievement of targets, you certainly can take steps to avoid SLA breaches or limit their occurrence.
Designing for SLAs
One way is to factor in SLAs during the design of the service rather than bolting them on after go-live.
A word of caution here. It might look ambitious to match a competitor’s SLA. But, without the right underlying architecture and resource levels, you might end up looking even worse because your service was never designed to meet those targets.
Strong problem management & continual improvement
By investigating root causes and coming up with workarounds and improvements, any issue that causes repeat SLA breaches can be addressed, whether it involves:
- Technology upgrade or redesign
- Employee training
- Process reengineering
Carrying out regular reviews of SLAs with the customers and suppliers involved in the value stream ensures they are updated to remain realistic and relevant as needs and the environment change.
A framework for monitoring and alerting, particularly one with thresholds that warn the service provider before a target is missed, can help limit breaches, but only if the service provider can respond appropriately. In this scenario, automating workflows to notify or route potential breaches to more skilled teams can go a long way.
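A threshold check of this kind might look like the sketch below, which classifies a ticket by how much of its SLA time has been used. The function name, the status labels, and the 75% and 90% thresholds are assumptions for illustration; a real tool would wire the result into its own notification and routing rules.

```python
from datetime import timedelta


def sla_status(elapsed: timedelta,
               target: timedelta,
               warn_at: float = 0.75,
               escalate_at: float = 0.90) -> str:
    """Classify a ticket by the share of its SLA target already used.

    warn_at and escalate_at are hypothetical thresholds expressed as
    fractions of the target.
    """
    if target <= timedelta(0):
        raise ValueError("target must be positive")
    used = elapsed / target  # dividing two timedeltas yields a float
    if used >= 1.0:
        return "breached"
    if used >= escalate_at:
        return "escalate"  # e.g. route to a more skilled team
    if used >= warn_at:
        return "warn"  # e.g. notify the assigned agent
    return "ok"


if __name__ == "__main__":
    target = timedelta(hours=4)
    for hours in (1, 3, 3.8, 5):
        print(hours, "h elapsed:", sla_status(timedelta(hours=hours), target))
```

The thresholds only help if each status maps to an action someone can take in the time remaining, which is the condition the paragraph above describes.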
Planning for SLA breaches
It isn’t possible to avoid every single SLA breach—some are bound to happen. But by preparing for them in the first place, you can limit the damage they might cause.