If you’ve been in IT for any length of time, you’ve seen plenty of hyped trends come and go without any material impact. Is AIOps being over-hyped? While the market is in its early days, that doesn’t mean there’s nothing behind the buzz. In fact, organizations are already seeing significant benefits from AIOps.
This is the first in a series of posts I’ll be publishing on AIOps use cases. In this post, I’ll look at the challenges driving the need for AIOps, review some analysts’ take on this evolving segment, and explore key use cases that offer near-term opportunities to harness the benefits of AIOps.
Drowning in Data, IT Teams Can’t Keep Pace
Today, virtually every aspect of business success is contingent upon the optimized performance and continued innovation of IT-powered services. At the same time, the IT landscape continues to see fast-paced, fundamental change.
IT environments are increasingly hybrid, complex, and fast-moving. For example, the proliferation of DevOps approaches has massively accelerated application release cycles. With the move to containers, the number of metrics that must be monitored multiplies dramatically. The continued adoption of dynamic cloud services further accelerates the rate of change and fuels explosive growth in data. According to a Gartner report, the volume of data generated by IT infrastructure and applications grows two- to threefold every year.
For ITOps teams, the volume, variety, and velocity of operations data have surpassed human scale. The balancing act of optimizing service levels while enabling innovation keeps growing both more critical and more difficult to achieve.
Tool sprawl within most IT teams has only compounded matters. On average, ITOps teams are using 11 different monitoring tools, which adds to the data management headaches. Given these trends, IT teams are facing several pressing problems:
- Event noise drowns out real issues, reducing efficiency and driving up mean time to resolution (MTTR)
- Issues go undetected—until users and customers encounter problems
- It takes too long to resolve issues, putting SLA compliance at risk
- Struggling to keep pace, IT teams are ill equipped to support innovation
Promise of AIOps
To address the pressing and proliferating challenges outlined above, many organizations are looking to adopt artificial intelligence for IT operations, or AIOps. At a high level, AIOps equips ITOps teams with a combination of machine learning, analytics, and automation to improve efficiency, cut costs, and increase speed across their organizations. With AIOps, these teams can find and fix issues faster, and even gain the predictive insights they need to prevent issues from occurring in the first place.
Given the enormous potential of AIOps, the topic is drawing growing coverage from media and analysts. For example, in a recent report, IDC analysts predicted that, by 2021, 70% of CIOs will aggressively apply AIOps to cut costs, improve IT agility, and accelerate innovation. As mentioned in my prior blog post, Gartner has released an updated version of its Market Guide for AIOps. This report describes a market that is in its early stages, but the authors also detail important developments under way that are already delivering significant near-term returns.
AIOps is a segment poised to see significant innovation in the future, but that doesn’t mean ITOps teams should hold off on moving forward. In fact, given the urgency of the challenges being faced, many teams really can’t afford to wait.
Organizations will be well served by taking a phased approach. By starting with focused use cases, teams can start seeing benefits quickly and position themselves to maximize the potential of AIOps in the long term.
Leading ITOps teams have already started deploying AIOps capabilities, and they’re seeing results. The following are some of the key use cases these teams are pursuing:
- Event noise reduction. In today’s increasingly complex, dynamic, and interrelated environments, far too many teams are being overwhelmed by massive volumes of events. This leads to drudgery, inefficiency, and an excessive risk of missing critical alerts. With an AIOps approach, teams can apply machine learning to historical and real-time data to identify patterns and suppress events that fall within bands of normalcy. This dramatically reduces event noise and helps ensure the most critical alarms are addressed first. Ensono, a TrueSight customer that was trying to manage more than 10,000 events per month, has been able to apply machine learning to drive the number of events down to only a few hundred per month.
- Predictive alerting. Many IT operations teams are having a tough time getting out of firefighting mode. Too often, they find out about issues after users do, and are forced to scramble to address problems after the fact. As a result, service levels and staff productivity continue to suffer. AIOps makes it possible to apply advanced analytics to historical and real-time performance metrics and to establish dynamic baselines that help identify anomalies and generate predictive alerts. With these capabilities, teams can start remediating issues before services are affected. Park Place Technologies is moving from a reactive to a proactive service model, using predictive monitoring that alerts on issues before customers are affected, to drive service excellence and lower costs.
- Probable cause identification. Within IT operations teams, operators are wrestling with tools that provide isolated, limited visibility, which makes diagnosis and resolution slow and time-consuming. With AIOps platforms, IT operations teams can establish root cause analysis capabilities powered by advanced correlation and log and event analytics. With these capabilities, staff members can correlate millions of monitoring data points, including metrics, events, logs, anomalies, and baselines, to automatically and quickly identify the most likely sources of issues. Teams using these capabilities have dramatically sped up diagnosis, which has fueled significant improvements in service levels and operational efficiency. The Brazil Ministry of Education uses TrueSight event correlation and log analytics capabilities to speed problem analysis for major infrastructure events, and has cut the time needed to identify root cause from 8 to 12 hours down to a maximum of four hours.
- Automated remediation, incident, and change management. In today’s fast-changing environments, highly manual, time-consuming, and error-prone tasks represent an increasing liability. The real value of an AIOps strategy comes from being able to take automated action on the rich insights delivered by machine learning and analytics. With automated remediation workflows and service desk integration for incident and change management, IT operations teams can significantly reduce mean time to resolution and fully leverage the value of advanced analytics. Further, they can offload many repetitive administrative tasks from skilled IT staff, freeing those staff members to focus on higher-value work. When Transamerica implemented automated event remediation with links to the Service Desk for ticketing and change management, it saved more than 9,000 hours of staff time in the first seven months.
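To make the “bands of normalcy” and dynamic baseline ideas behind event noise reduction and predictive alerting a little more concrete, here is a minimal Python sketch. It assumes a deliberately simple baseline: the mean plus or minus k standard deviations of the previous few samples. This is an illustrative stand-in, not how TrueSight or any particular AIOps product computes its baselines, and the function name `classify_events` and its `window` and `k` parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def classify_events(values, window=5, k=3.0):
    """Label each metric sample 'normal' or 'anomaly' against a rolling baseline.

    Hypothetical helper for illustration only. Each sample is compared with the
    mean and population standard deviation of the preceding `window` samples;
    anything within mean +/- k * stdev is inside the band of normalcy and can
    be suppressed as noise.
    """
    history = deque(maxlen=window)  # only the most recent `window` samples
    labels = []
    for v in values:
        if len(history) < window:
            # Not enough history to form a baseline yet; treat as normal.
            labels.append("normal")
        else:
            baseline = list(history)
            mu = mean(baseline)
            sigma = pstdev(baseline)
            lower, upper = mu - k * sigma, mu + k * sigma
            labels.append("normal" if lower <= v <= upper else "anomaly")
        # Every sample, including anomalies, is folded into the baseline.
        history.append(v)
    return labels


# Usage: suppress in-band samples and alert only on the outliers.
samples = [10, 11, 10, 12, 11, 10, 50, 11]
labels = classify_events(samples)
alerts = [v for v, label in zip(samples, labels) if label == "anomaly"]
print(alerts)  # [50]
```

In this toy run, only the spike to 50 falls outside the band and would raise an alert; every other sample is suppressed. Because the sketch folds anomalies back into the baseline, a spike temporarily widens the band afterward; a real system might use longer windows, seasonality-aware models, or some way to keep anomalies from skewing the baseline.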
To balance the need to support business innovation with the challenge of increased data volumes and complexity, IT operations teams are increasingly turning to AIOps technologies and approaches. The potential of AIOps is enormous, and the time to move is now. In subsequent blog posts in this series, we’ll provide a more detailed look at each of the above use cases, including the challenges addressed, the requirements, and the benefits realized. Be sure to keep an eye out for our next post, which will be on event noise reduction.
In the meantime, to learn more about our AIOps offerings, be sure to visit the TrueSight AIOps page.