According to Gartner, IT Operations personnel (IT Ops) are in the midst of a revolution. The forces of digital business transformation are necessitating a change to traditional IT management techniques. Consequently, we are seeing a significant change in current IT Ops procedures and a restructuring of how we manage our IT ecosystems. Gartner’s term for these changes is Artificial Intelligence for IT Operations, or AIOps.
AIOps as a market category has exploded over the last couple of years. The number of inquiries Gartner fields has increased exponentially, as has the number of Google searches on the topic. This post explains the technology and market dynamics driving the emergence of AIOps and how AIOps responds to the challenges they create.
Digital Transformation and the Road to AIOps
It’s important to understand how digital transformation gives rise to AIOps. Digital transformation encompasses cloud adoption, rapid change, and the implementation of new technologies. It also requires a shift in focus to applications and developers, an increased pace of innovation and deployment, and the acquisition of new digital users (machine agents, Internet of Things (IoT) devices, Application Programming Interfaces (APIs), etc.) that organizations didn’t need to service in the past. All these new technologies and users are straining traditional performance and service management strategies and tools to the breaking point.
Artificial Intelligence for IT Operations describes the paradigm shift required to handle digital transformation in IT Operations.
AIOps refers to multi-layered technology platforms that automate and enhance IT operations by 1) using analytics and machine learning to analyze big data collected from various IT operations tools and devices, in order to 2) automatically spot and react to issues in real time.
Gartner explains how an AIOps platform works by using the diagram in figure 1. AIOps has two main components: Big Data and Machine Learning. It requires a move away from siloed IT data in order to aggregate observational data (such as that found in monitoring systems and job logs) alongside engagement data (usually found in ticket, incident, and event recording) inside a Big Data platform. AIOps then implements Analytics and Machine Learning (ML) against the combined IT data. The desired outcome is continuous insights that can yield continuous improvements with the implementation of automation. AIOps can be thought of as Continuous Integration and Deployment (CI/CD) for core IT functions.
Figure 1: Gartner’s visualization of the AIOps platform
AIOps bridges three different IT disciplines—service management, performance management, and automation—to accomplish its goals of continuous insights and improvements. AIOps is the recognition that in our new accelerated, hyper-scaled IT environments, there must be a new approach that leverages advances in big data and machine learning to overcome legacy tool and human limitations.
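To make the aggregation step more concrete, here is a minimal Python sketch of pulling observational data and engagement data out of their silos and into one time-ordered view. It is an illustration only, not Gartner's design or any vendor's API: the `Record` type, the `merge_streams` and `group_by_host` helpers, the source names, and the sample records are all hypothetical, and a real platform would read from a streaming big data store instead of in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime
import heapq


@dataclass(frozen=True)
class Record:
    """One normalized IT event (hypothetical schema for illustration)."""
    timestamp: datetime
    source: str   # tool the record came from, e.g. "monitoring"
    kind: str     # "observational" or "engagement"
    host: str
    message: str


def merge_streams(*streams):
    """Merge streams that are each already sorted by time into one timeline."""
    return list(heapq.merge(*streams, key=lambda r: r.timestamp))


def group_by_host(records):
    """Index the combined timeline by host so related records sit together."""
    by_host = {}
    for record in records:
        by_host.setdefault(record.host, []).append(record)
    return by_host


# Made-up sample data standing in for two siloed tools.
metrics = [
    Record(datetime(2024, 1, 1, 9, 0), "monitoring", "observational", "web-01", "cpu 97%"),
    Record(datetime(2024, 1, 1, 9, 5), "monitoring", "observational", "db-01", "disk 91%"),
]
tickets = [
    Record(datetime(2024, 1, 1, 9, 2), "ticketing", "engagement", "web-01", "users report slow checkout"),
]

timeline = merge_streams(metrics, tickets)
by_host = group_by_host(timeline)

for record in by_host["web-01"]:
    print(record.timestamp.time(), record.kind, record.message)
```

Once the CPU alert and the user ticket for the same host sit next to each other in one timeline, analytics and ML can be applied to the combined data rather than to each tool's data in isolation.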
What’s Driving AIOps?
The promise of artificial intelligence has been to do what humans do but do it better, faster, and at scale. AIOps will do this for IT Operations by addressing the speed, scale, and complexity challenges of digital transformation, including:
- The difficulty IT Operations has in manually managing its infrastructure. It’s becoming a misnomer to use the term “infrastructure” here, as modern IT environments include managed cloud, unmanaged cloud, third party services, SaaS integrations, mobile, and more. Traditional approaches to managing complexity don’t work in dynamic, elastic environments. Tracking and managing this complexity through manual, human oversight is no longer possible. Current IT Ops technology is already beyond the scope of manual management and it will only get worse in the coming years.
- The amount of data that IT Ops needs to retain is exponentially increasing. Performance monitoring is generating exponentially larger numbers of events and alerts. Service ticket volumes experience step function increases with the introduction of IoT devices, APIs, mobile applications, and digital or machine users. Again, the data is simply becoming too voluminous and complex for manual reporting and analysis.
- Infrastructure problems must be responded to at ever-increasing speeds. As organizations digitize their business, IT becomes the business. The ‘consumerization’ of technology has changed user expectations for all industries. Reactions to IT events–whether real or perceived–need to occur immediately, particularly when an issue impacts user experience.
- More computing power is moving to the edges of the network. The ease with which cloud infrastructure and third-party services can be adopted has empowered line of business (LOB) functions to build their own IT solutions and applications. Control and budget have shifted from the core of IT to the edge. More usable computing power is being added from outside core IT.
- Developers have more power and influence but accountability still sits with core IT. In DevOps organizations, programmers take more monitoring responsibility at the application level, but accountability for the overall health of the IT ecosystem and the interaction between applications, services, and infrastructure still remains the province of core IT. IT Ops is taking on more responsibility just as digital businesses are getting more complex.
The Elements of AIOps
AIOps consists of the following elements, shown in figure 2:
Figure 2: The technologies that make up an AIOps platform
- Extensive and diverse IT data sources, from currently siloed tools and IT disciplines such as events, metrics, logs, job data, tickets, monitoring, etc.
- A modern big data platform that permits real-time processing of streaming IT data. Examples include Hadoop 2.0, Elastic Stack, and some other Apache technologies.
- Rule application and pattern recognition that enforce, leverage, and/or discover context while uncovering regularities and normalities in the data. These can be, but don’t have to be, specific to the domain.
- Domain algorithms that leverage IT domain expertise (specific to one environment or at the industry level) to intelligently interpret and apply the rules and patterns, as dictated by an organization’s data and its desired outcomes. These algorithms make it possible to achieve IT specific goals like eliminating noise, correlating unstructured data, establishing baselines, alerting on abnormalities, and identifying probable cause.
- Machine learning that can automatically alter or create new algorithms based on the output of algorithmic analysis and new data introduced into the system.
- Artificial intelligence that can adapt to the new and unknown in an environment.
- Automation, which uses the outcomes generated by the machine learning and/or AI to automatically create and apply a response or improvement for identified issues and situations.
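As a rough illustration of how the last few elements fit together, the following Python sketch establishes a rolling baseline for a metric, flags readings that deviate from it, and hands flagged readings to an automated response. It is a deliberately simple statistical stand-in (a z-score over a sliding window), not the algorithm any AIOps product uses. The `BaselineDetector` class, the `respond` function, the `restart_service` action, and all thresholds are hypothetical names and values chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class BaselineDetector:
    """Flags values far from the rolling mean of recent normal readings."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)       # recent normal readings
        self.threshold = threshold                # allowed deviation, in std devs
        self.min_samples = max(2, min_samples)    # stdev needs at least 2 points

    def observe(self, value):
        """Return True if value is anomalous relative to the baseline."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # A perfectly flat baseline (sigma == 0) never alerts in this sketch.
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        if not anomalous:
            # Keep outliers out of the baseline so they do not skew it.
            self.history.append(value)
        return anomalous


def restart_service(host):
    """Hypothetical remediation; it only reports what it would do."""
    return f"restart requested on {host}"


def respond(host, value, detector, action=restart_service):
    """Automation hook: run the action only when the detector flags the value."""
    if detector.observe(value):
        return action(host)
    return None


detector = BaselineDetector(window=10, threshold=3.0, min_samples=5)
for reading in [50, 51, 49, 50, 52, 51, 95]:
    result = respond("web-01", reading, detector)
    if result:
        # Only the final reading (95) triggers the action.
        print(reading, "->", result)
```

In a fuller platform, the fixed threshold would be replaced by learned models and the single action by a catalog of responses. The flow stays the same: baseline, detect, respond.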
Although AIOps represents a radical departure for IT Ops, it’s not a radical application of machine learning and big data. A similar ML approach was implemented when stockbrokers moved from manual trading to machine trading. Analytics and ML are used in social media, in applications like Google Maps, Waze, and Yelp, as well as in online marketplaces like Amazon and eBay. These techniques are used reliably and extensively in environments where real-time responses to dynamically changing conditions and user customization are required.
Adoption of artificial intelligence in AIOps is nascent compared to machine learning. Right now, the pressing use cases are best addressed with simple automation or a combination of ML and automation. It remains to be seen how AI will evolve and what new use cases it will enable. In any case, a strong AIOps foundation needs to be laid on IT Operations as it exists now before we can begin applying models of human behavior to it.
IT Ops personnel have been slow to adapt to AIOps-like environments because, out of necessity, our jobs have always been more conservative. It’s IT Ops’ job to make sure the lights stay on and to provide stability for the infrastructure that organizational applications ride on. However, due to the trends listed above, more IT Ops shops (especially those in the Enterprise) will need to implement AIOps strategies and technologies in the near future.