The What and Why of AIOps
In the digital era, IT organizations that identify and understand patterns in vast and diverse data are best equipped to find, fix, and prevent performance-related problems. However, when digital transformation’s (DX) promise of hybrid and multi-cloud infrastructure creates complexity and outpaces IT performance management processes, it’s time to apply the power of artificial intelligence (AI) to IT operations (ITOps).
AIOps applies machine learning (ML), analytics, and artificial intelligence to vast amounts of diverse data to automatically discover and react to potential issues in real time. Ovum captures the challenge facing IT operations in its On the Radar report: “There is no benefit in having one aspect of the process automated if it just highlights another that is a bottleneck.” This blog describes four steps that help IT operations avoid becoming the bottleneck of business DX initiatives and instead take a leadership role, using AIOps to demonstrate the value of successful DX initiatives.
4 Steps to Successful AIOps
- Carefully Select Initial Use Cases
There are many potential candidates for digital transformation. For example, BMC Services & Consulting professionals have helped many customers combine three of our product offerings (monitoring and event management, service management, and automation) into business-defining solutions. To maximize transformation outcomes, however, it’s important to focus on what’s achievable and practical. Given the current state of AI development, we see many customers concentrating initially on use cases that are core to AIOps and that also serve as building blocks for more advanced use cases. These initial use cases include the following:
- Application Performance Monitoring (APM) – Taking a service-aware, user-centric approach to application performance. As application developers and owners embrace DevOps and Agile to speed innovation, it’s no longer enough to monitor for availability, errors, and job completion times. Now IT will be expected to make sure data is being processed correctly, help identify problem-causing code, and diagnose end user experience issues such as slow application response times.
- Dynamic Baselining (a.k.a. Behavioral Learning) – Understanding which issues are likely to become actual problems so you know what to fix first. One of the major headaches for IT is the stream of false events and alerts from the many monitoring tools installed across the environment. Any of these alerts could indicate a critical problem with a customer-facing app or service, but often they just clutter inboxes and cause false alarms. AIOps reduces the noise associated with the myriad events across an environment. To start, AIOps learns how the environment behaves in busy and slow times. It can then apply that knowledge to the alerts that systems generate to determine whether an alert indicates a bigger incident with potential service impact.
- Predictive Event Management – Early warning for potential problems before they impact users. The same intelligence gathered from data collected across the environment can be applied to predictive alerting. AIOps lets technicians know that an event or series of events directly relates to a known problem in the making. In this case, AIOps will call out somewhat innocuous-looking events for more attention because similar events have contributed to a larger issue in the past. This type of predictive alerting saves IT from first hearing about a problem from end users, and it enables the business to keep service outages from impacting customers. Predictive information can also help IT evolve from a reactive to a proactive state by eliminating potential problems before any stakeholders outside of the group become aware of them.
- Event-Driven Automation – Automating the execution of standardized triage and remediation tasks. AIOps connects and drives automation in the hyper-complex, multi-source cloud environment. Delivering machine-assisted analytics at scale on high volumes of digital IT data is useless if the outcomes still require human intervention. AIOps can generate workflows and measure the effects of those processes, feeding the results back into the system as data to be analyzed for lessons learned. AIOps responses should be automated based on available data, without the need for user intervention or decisions.
I suggest focusing on these achievable use cases based on many instances in which BMC Customer Success helped customers deliver considerable value, including Boston Scientific, an $8.4 billion enterprise, which has been a global medical technology leader for over 35 years.
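To make the Dynamic Baselining use case described above more concrete, here is a minimal Python sketch. It assumes a very simple approach: learn the mean and standard deviation of a metric for each hour of the day, then flag a reading only when it falls far outside what is normal for that hour. The function names, the threshold of three standard deviations, and the CPU readings are all hypothetical and chosen for illustration. This is not how any particular AIOps product implements behavioral learning.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_baseline(samples):
    """Group (hour_of_day, value) samples by hour and return {hour: (mean, stdev)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    # stdev needs at least two samples, so hours with less history are skipped
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(baseline, hour, value, threshold=3.0):
    """Return True only if value deviates strongly from the norm for that hour."""
    if hour not in baseline:
        return False  # no learned behavior for this hour, so stay quiet
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical CPU utilization history: quiet at 03:00, busy at 14:00
history = [(3, 10), (3, 12), (3, 11), (3, 9),
           (14, 78), (14, 82), (14, 80), (14, 85)]
baseline = learn_baseline(history)

print(is_anomalous(baseline, 14, 84))  # False: normal for the busy hour
print(is_anomalous(baseline, 3, 84))   # True: far above the 03:00 norm
```

The same reading of 84 is ignored during the busy period and flagged during the quiet one. A static threshold cannot make that distinction, which is how a learned baseline cuts down on false alarms.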
- Organize for Success
Successful adoption of AIOps requires more than just technology: it also requires new roles, processes, and data strategies. For most organizations it means a cultural change as well, because it often requires restructuring to focus on data sources rather than on the technologies involved in the implementation. As discussed in a previous blog regarding enablers of digital transformation, we see successful organizations standardize processes to simplify automation; improve governance to support new roles and effectively address organizational change management; and establish communities of practice (CoPs), integrated within our customers’ governance structure, that combine multiple, similar technologies, people skills, and other resources into effective teams.
- Develop Core Capabilities
By leading DX by example through its implementation of AIOps, IT can develop core capabilities common to other digital transformation solutions. These may include:
- Machine Learning – AIOps enables IT to move from rule-based, human management of analysis to machine-assisted analysis and machine learning systems. This is required not only because of limits to the amount and complexity of analysis human agents can achieve, but also to enable a level of change adaptation that hasn’t previously been possible. IT analytics is ultimately about pattern matching. IT systems, users, and ecosystems exhibit behaviors and relationships that can point to root causes, isolate issues, and indicate future problems. Machine learning applies the computational power and speed of machines to the discovery and correlation of patterns in IT data. It does this faster and more effectively than humans can and dynamically changes the algorithms used based on changes in the data.
- Open Data Access – This is the most critical of all the core capabilities. IT will always have multiple technologies and systems of record from different vendors. These will also vary across IT disciplines. Freeing data from its organizational silos for big data aggregation and analysis is perhaps the most difficult challenge facing IT teams trying to implement AIOps. An effective AIOps platform must have a data schema that can consume data from a variety of IT sources, as well as structure, tag, and organize it to be useful for consistent and repeatable analysis.
- Big Data Scale and Speed – Like drinking from a fire hose, supporting the quantity, volatility, and speed of data generated by digital transformation can be overwhelming. Traditional relational data warehouses are neither scalable nor responsive enough to support this volume of digital data. Data analysis needs to take place in real time, not only offline when resources are available. An AIOps big data platform must also support responsive ad hoc data exploration and deep queries. Big data technologies, originally created to handle large data lakes from data warehouses, have rapidly evolved into scalable, responsive data manipulation engines that can also meet the needs of AIOps. AIOps embodies the unification of deep data research and online, real-time analytics to elevate IT decision making.
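The Open Data Access capability above calls for a data schema that can consume data from a variety of IT sources and structure and tag it for consistent analysis. The Python sketch below shows one way that could look, assuming two hypothetical sources (a monitoring tool and a service desk) with different field names. Every field name, the severity mapping, and the sample records are assumptions made for illustration and do not reflect any specific product's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class NormalizedEvent:
    """Common shape that every source is mapped into before analysis."""
    source: str
    host: str
    severity: str
    message: str
    timestamp: datetime
    tags: dict = field(default_factory=dict)


# Hypothetical mapping from source-specific labels to one severity vocabulary
SEVERITY_MAP = {"crit": "critical", "p1": "critical",
                "warn": "warning", "p2": "warning",
                "info": "info", "p3": "info"}


def from_monitoring_tool(raw):
    """Map a (hypothetical) monitoring alert that carries an epoch timestamp."""
    return NormalizedEvent(
        source="monitoring",
        host=raw["hostname"].lower(),
        severity=SEVERITY_MAP.get(raw["level"].lower(), "unknown"),
        message=raw["text"],
        timestamp=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        tags={"service": raw.get("service", "unassigned")},
    )


def from_service_desk(raw):
    """Map a (hypothetical) service desk ticket that carries an ISO 8601 timestamp."""
    return NormalizedEvent(
        source="service_desk",
        host=raw["ci_name"].lower(),
        severity=SEVERITY_MAP.get(raw["priority"].lower(), "unknown"),
        message=raw["summary"],
        timestamp=datetime.fromisoformat(raw["opened_at"]),
        tags={"service": raw.get("business_service", "unassigned")},
    )


events = [
    from_monitoring_tool({"hostname": "WEB01", "level": "CRIT", "epoch": 1547548200,
                          "text": "Response time above baseline", "service": "checkout"}),
    from_service_desk({"ci_name": "web01", "priority": "P2",
                       "opened_at": "2019-01-15T10:45:00+00:00",
                       "summary": "Users report slow checkout",
                       "business_service": "checkout"}),
]

# Once normalized, records from different silos can be grouped on shared keys
for event in sorted(events, key=lambda e: e.timestamp):
    print(event.host, event.tags["service"], event.severity, event.source)
```

Because both records end up with the same host and service tag, an alert and a user-reported ticket that previously lived in separate systems can be correlated directly.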
- Track Delivered Value
For IT to lead DX initiatives successfully, it must demonstrate the business value delivered in adoption of AIOps. BMC Customer Success assists clients in tracking business value with the establishment of a centralized, formal business value registry. Within the business value registry, value should be tied to overall business objectives such as reduction in mean time to repair (MTTR). Key performance indicators (KPIs) supporting those business objectives, or value cases, should align with best practices and be measurable. Finally, value must be “harvested” and governed on an ongoing basis to provide a complete, referenceable history of value delivered to the business.
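As a small illustration of a measurable KPI such as MTTR, the Python sketch below computes mean time to repair from incident records and compares two periods. The record layout (`opened` and `resolved` timestamps) and the sample incidents are hypothetical. A real business value registry would draw these figures from the service management system of record.

```python
from datetime import datetime


def mean_time_to_repair(incidents):
    """Average hours from 'opened' to 'resolved', ignoring unresolved incidents."""
    durations = [
        (i["resolved"] - i["opened"]).total_seconds() / 3600
        for i in incidents
        if i.get("resolved") is not None
    ]
    if not durations:
        return None  # nothing resolved yet, so MTTR is undefined
    return sum(durations) / len(durations)


# Hypothetical incidents before and after an AIOps rollout
before = [
    {"opened": datetime(2019, 1, 7, 9, 0), "resolved": datetime(2019, 1, 7, 13, 0)},
    {"opened": datetime(2019, 1, 9, 14, 0), "resolved": datetime(2019, 1, 9, 20, 0)},
]
after = [
    {"opened": datetime(2019, 3, 4, 9, 0), "resolved": datetime(2019, 3, 4, 11, 0)},
    {"opened": datetime(2019, 3, 6, 15, 0), "resolved": datetime(2019, 3, 6, 18, 0)},
    {"opened": datetime(2019, 3, 8, 10, 0), "resolved": None},  # still open
]

mttr_before = mean_time_to_repair(before)  # 5.0 hours
mttr_after = mean_time_to_repair(after)    # 2.5 hours
reduction = (mttr_before - mttr_after) / mttr_before * 100

print(f"MTTR before: {mttr_before:.1f} h, after: {mttr_after:.1f} h, "
      f"reduction: {reduction:.0f}%")
```

Recording the inputs alongside the result gives the registry a referenceable history instead of a one-off claim.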
Using AIOps to Drive Digital Transformation
AIOps is a transformative technology and a journey—not a destination. An initial successful implementation can assist DX initiatives and increase IT’s recognition as a true business partner. Today’s digital businesses require IT to keep pace with ever-increasing customer demand—and to do that, IT organizations must embrace technologies like AIOps. Businesses cannot deliver digital experiences on the front end without also digitally transforming the back end. AIOps will simplify management of complex distributed environments, allowing IT operations to intelligently orchestrate infrastructure, applications, and services across hybrid cloud ecosystems in alignment with the business and address customer needs on demand.
Finally, with the high cost of downtime—98% of organizations say a single hour of downtime costs more than $100K; 81% report that number to be more than $300K; and 33% say it can cost anywhere between $1M and $5M—leading the organizational focus on digital transformation with AIOps implementation can deliver serious business value.
If you would like help working through your AIOps journey, please fill out our form to connect with an expert. We would be happy to speak with you to see how we can assist.
These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.