Erhan Giral – BMC Software | Blogs
BMC Helix Composite Approach to Artificial Intelligence in the GenAI Era
https://s7280.pcdn.co/bmc-helix-composite-approach-to-artificial-intelligence-in-the-genai-era/
Mon, 08 Jan 2024 13:19:30 +0000

Several months ago, I wrote a blog outlining BMC’s application of Generative AI (GenAI) technology through BMC HelixGPT. Since then, GenAI has demonstrated its potential for creating diverse content (text, images, audio, video), computer code, configuration, meaningful conversations, and even entire novels, likely authored with just a bit of prompt engineering.

Our mission at BMC is to provide actionable insights to operations teams. Any assistive AI technology targeting service and operations teams must gain trust, and the bar to clear while assisting operations is high. These teams are overworked and under stress most of the time. Their attention span is limited, so actions must be focused. They work in an environment where availability and performance reign supreme. ‘Actionability’ is the key KPI. Being merely correct and plausible doesn’t make the cut in our efficacy benchmarks, especially in the operations management environment.

Understanding composite AI

Composite AI integrates multiple AI models to create a more comprehensive and robust set of capabilities that complement each other. The advantage of Composite AI is that it leverages the strengths of various AI components, each specialized in different domains, to create a more versatile approach with more accurate, actionable outcomes.

Think of Composite AI as an analogy to the human brain, where researchers observe similar specialization and work breakdown (cite: https://www.nature.com/articles/nature18914). While the cortex is uniform under a microscope, various imaging techniques suggest different parts of the brain specialize to handle different tasks. These regions of the brain come together to gather and process information, maintain context, make decisions, recommend actions, recall knowledge, and then communicate these recommended ‘next step’ actions to various motor subsystems. Each lobe is assigned to perform specific tasks within the human brain. The Frontal Lobe is responsible for thought, memory, and behavior. The Parietal Lobe regulates language and touch. The Temporal Lobe manages hearing, learning, and emotions. The Occipital Lobe performs visual processing. A human brain can recommend the next best actions only when all of the lobes and functions within the brain come together.

Composite AI within the context of enterprise ServiceOps, similar to the functions of the human brain, integrates and automates different types of intelligence to determine the best possible actions. However, Composite AI completes these functions on a massive enterprise scale across billions of data points in real-time by utilizing purpose-built processing pipelines for telemetry data to distill raw observations into facts that build up the context of a problem as it transpires.

With the help of Composite AI, we get to cast the monitoring products of the past as our eyes and ears and ticketing systems rich with domain and environment specific knowledge as our recallable memory.

BMC Helix composite AI approach for improved actionability

The BMC Helix Composite AI approach consists of two main parts: sensory reasoning and knowledge-based action planning. The diagram below maps these two main parts in greater detail.

What you see on the far right-hand side of the diagram is data, and a lot of it! BMC Helix captures data about all observable activities constantly flowing within your organization. Observable reality manifests itself in streams of topology, events, metrics, logs, incidents, change activities, defects, and even knowledge articles someone scribbled in a forgotten SharePoint folder somewhere. These traditionally siloed data lakes are often populated with automatically created information, user-generated information, and information arriving through third-party integrations. Helix integrates all of that data into a comprehensive model of your organization that is indexed by service topologies, because the structure and architecture of a service help in reasoning about all sorts of diagnostic and remedial automation functions down the line.

Sensory reasoning synthesizes and processes all of the incoming data to figure out what’s going on in reality. Metric and event data from infrastructure, applications, networks (IP, Transport, Radio Access), and end users is interpreted to detect anomalies. Here, various BMC Helix AI models are applied to detect anomalous patterns such as unexpected traffic or load and resource utilization or saturation. BMC Helix then applies its proprietary AI algorithms to perform sensory reasoning, further processing these anomalies into qualified situational explanations that capture what went wrong, what the root cause is, and what the impact seems to be. These BMC Helix AI algorithms include:

  • Predictive AI applies AI techniques to predict future events or outcomes based on historical data and patterns. Components of predictive AI span machine learning (ML), training data, pre-trained models, regression, and time-series analysis. BMC Helix use case examples of predictive AI include proactive problem management, process change risk, and saturation forecasting.
  • Causal AI integrates Knowledge Graph and Transformer-based AI techniques to understand and model relationships across observability data variables. It also determines the cause-and-effect relationships between events that unfold during a problem. Components of causal AI include reasoning about causal relations or patterns using topological data and a Knowledge Graph-based causality analysis, counterfactual ‘what if’ scenario analysis, graph modeling, and variability analysis assessing how causal relationships change depending on how the variables influence one another. BMC Helix use case examples of causal AI include root cause isolation, incident correlation, and situation explainability.
  • BMC Helix for AIOps leverages AI and ML to enhance enterprise operations by automating and optimizing tasks. BMC Helix for AIOps use cases include intelligent automation (such as for event management), root cause analysis, automated orchestration of routine tasks or workflows, and automated integration with Enterprise Service Management and third-party applications.

Through our Composite AI approach, the BMC Helix platform performs sensory reasoning across the entire IT stack: applications, containers, infrastructure, network, and even (if you have it) mainframe.
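To make the saturation forecasting use case mentioned above a little more concrete, here is a minimal sketch that fits a least-squares line to utilization samples and extrapolates when the trend would reach a capacity limit. It is an illustration only, not how BMC Helix implements forecasting; the function name, the sample readings, and the 100 percent limit are all assumptions.

```python
from typing import Optional, Sequence


def forecast_saturation(
    samples: Sequence[float], limit: float = 100.0
) -> Optional[float]:
    """Return the (fractional) sample index at which a straight line
    fitted to `samples` reaches `limit`, or None if usage is not rising.

    Hypothetical helper: `samples` are utilization percentages taken at
    evenly spaced intervals (index 0, 1, 2, ...).
    """
    n = len(samples)
    if n < 2:
        return None  # not enough history to fit a trend
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None  # a flat or falling trend never reaches the limit
    intercept = mean_y - slope * mean_x
    return (limit - intercept) / slope


disk_usage = [50.0, 55.0, 60.0, 65.0, 70.0]  # made-up daily readings
eta = forecast_saturation(disk_usage)
print(eta)  # index of the interval at which the trend hits 100 percent
```

A real forecaster would also need to handle seasonality and noisy data; the straight-line fit is only the simplest form of the regression and time-series analysis the text refers to.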

Now let’s dive into the second area of the BMC Helix Composite AI approach: operations-informed, knowledge-based actions. Here, all of the distilled observability insights about Situations from the sensory and reasoning AI algorithms are used to build context for the generative AI, specifically BMC HelixGPT. BMC HelixGPT then produces, in human-style language, the situation explanations with recommended ‘next best’ actions.

The entire BMC Helix platform, across our Composite AI approach, is based on topology-aware custom low-rank adapters that allow us to fine-tune models for very specific tasks and for your enterprise’s specific domains. We also use retrieval-augmented generation to produce more contextual, detailed responses grounded in real-time data sources such as transaction traces and live metric data. These capabilities vastly improve the accuracy of AI insights, leading to improved actionability, which is the main KPI we track, as discussed at the beginning.
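For readers unfamiliar with retrieval augmented generation, the general idea can be sketched as follows: rank candidate context snippets by similarity to the question, then prepend the best matches to the prompt sent to the model. This is a toy illustration under stated assumptions, not BMC HelixGPT’s retrieval pipeline. It uses bag-of-words cosine similarity in place of learned embeddings, and the function names, snippets, and prompt layout are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _vector(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = _vector(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up snippets standing in for live metrics and traces.
snippets = [
    "checkout-service p95 latency rose to 2400 ms at 13:05",
    "payments-db connection pool is 98 percent utilized",
    "cafeteria menu updated for next week",
]
query = "why is checkout-service latency high"
prompt = build_prompt(query, snippets, k=2)
print(prompt)
```

The irrelevant snippet never reaches the model, which is the point: the generator only sees context that retrieval judged relevant to the question.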

Applying the BMC Helix Composite AI to Operations Management

BMC Helix was built from the ground up to be a platform that processes observability and ITSM data at telco scale. BMC Helix performs sensory reasoning based on observable reality: it provides the eyes and ears for the brain as it constantly processes vast amounts of monitoring data and formulates diagnostic reasoning as anomalies arise. BMC Helix harnesses all information flows specific to your enterprise data lakes, processing across time series and event streams. We employ a Transformer- and Knowledge Graph-based framework to achieve this data capture. In a future blog post, I will share a deep dive into the reasoning techniques BMC Helix uses.

We harvest and integrate monitoring data from existing tools into a unifying, comprehensive model that represents the structure and performance of targeted applications and IT services (modeled as a property graph). BMC Helix does this dynamically without requiring any maintenance. As the architecture of the service changes with time, our AI discovers new boundaries and components, thanks to our BMC Helix for AIOps Service Blueprints.

We employ a pipeline of AI and ML modules to convert near-real-time monitoring data into aggregations about emerging and impending anomalies likely to degrade service KPIs. We collect all the available ticket data to generalize resolutions people discuss in chat streams or work logs.
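One simple building block for turning a metric stream into anomaly signals is a rolling z-score: compare each new sample with the mean and spread of the samples just before it. The sketch below is a generic textbook technique shown for illustration, not the BMC Helix detection models; the function name, window size, threshold, and latency values are assumptions.

```python
import statistics
from typing import List, Sequence


def detect_anomalies(
    values: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices whose value deviates from the mean of the preceding
    `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # a perfectly flat history cannot be scored this way
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Made-up latency samples in milliseconds with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101]
print(detect_anomalies(latency_ms))
```

Note that the spike inflates the spread of the following windows, so a production detector would typically use more robust statistics or a learned baseline.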

To gain credibility with operational teams, we have built explainability into the foundation of Helix. Any insight we derive from monitoring and/or ticket data can be mapped back to raw data sources or, sometimes, to more advanced reasoning and feedback components that allow the experts to review how the AI reasons. Explainability also serves as a conduit to harvest domain expertise from humans. Expert feedback is our source for learning new heuristics and domain-specific knowledge, which we then generalize so that they can be applied to future problems using generative AI.

HelixGPT learns domain- and environment-specific knowledge about resolutions from existing ticket/issue databases. It acts like the part of our brain that learns and generalizes new concepts. We have a proprietary GPT-based neural network architecture that knows to pay ‘attention’ to the actionable bits of these resolutions, so we can offer operators a remedial next best action even before the problem manifests at scale.

This requires the underlying GPT model to pay attention to vast graphs that describe the environment and the architecture of the target service. Because such vast data can’t really be expressed in natural language in context, we introduced graph-aware adapters that work directly on graph embeddings. These graph-aware adapters (an industry first, patent pending) sway the network’s generation towards relational facts that matter in the environment (such as service dependencies and support-team memberships), making us less prone to hallucination while keeping our generated insights highly actionable and specific to our users’ environment.

Together, BMC’s Composite AI approach and BMC Helix for ServiceOps offer enterprises an integrated AI stack that sees, hears, learns, and reasons about complex IT system issues. That’s how operations teams can solve problems through clear actionability.

BMC HelixGPT: An Expert of (Your) Systems
https://www.bmc.com/blogs/bmc-helixgpt-expert-of-your-systems/
Fri, 17 Mar 2023 13:55:26 +0000

We have reached a historic point at which artificial intelligence (AI) is unleashing the next level of human efficiency. We haven’t achieved artificial general intelligence (AGI), and we are nowhere near it, but we will see increasing applications of AI in diverse verticals. Enterprise operations will be one of the first domains to see significant impact from this shift.

Rapidly declining hardware costs, increasing availability of operational data, and recent modeling breakthroughs are enabling a tectonic transformation in the way enterprises run their operations. We believe the future of enterprise IT and telecommunication networks will be more data driven, proactive, efficient, and less chaotic, which in turn will enable more innovation.

BMC’s application of GPT technology

Our mission here at BMC is to enable efficient digital enterprises. We believe the next level of efficiency in enterprise operations will require near real-time processing of operational data (observational and control plane data from infrastructure, applications, and end users) such that AI can establish an unadulterated view of reality, while ticket, incident, and change data (the activities of humans touching these systems) inform it on how human beings have been responding to issues native to the environment and business.

BMC HelixGPT, the working name for emerging GPT technology, integrates all these traditionally siloed data sources together to derive actionable insights for anomalies and autonomous resolution for mundane issues.

For this purpose, we have architected an AI platform that distills operational data into actionable insights (more on this later). It’s a pipeline of AI models that automates L1/L2 IT analyst use cases (anomaly detection, root cause analysis, and mitigation) by eyeballing operational data from IT or telecom services.

We ingest and integrate operational data from all observable layers. We distill anomalous patterns from metrics and logs into events and correlate them with all sorts of other events and change data in the system, with the help of topological data sources. The result of our operational data analysis is neatly packaged insights that we call Situations. Situations are causally ordered event chains that allow us to do root cause analysis and fingerprinting of repeat issues. They also give us a corpus of event patterns, which we then use to build predictive models around brewing problems. Our data pipeline looks roughly like the following diagram.

Observational Data

Figure 1. An AIOps data path where diverse data streams about operations are run through a series of analysis steps to identify root cause and resolution of preventable issues.
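The notion of a Situation described above, a chain of related events from which a root cause can be read off with the help of topology, can be sketched in a few lines. This is a deliberate simplification and not the BMC Helix algorithm: it orders events by timestamp only and treats any affected node with no affected dependency as a root-cause candidate. The `Event` type, the function name, and the sample topology are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    timestamp: int  # seconds since an arbitrary reference point
    node: str       # service or component that raised the event
    message: str


def build_situation(events: List[Event], depends_on: Dict[str, List[str]]) -> dict:
    """Order events in time and pick probable root-cause nodes.

    A node is a root-cause candidate when it raised an event and none of
    the nodes it depends on raised one.
    """
    chain = sorted(events, key=lambda e: e.timestamp)
    affected = {e.node for e in chain}
    roots = sorted(
        n for n in affected
        if not any(dep in affected for dep in depends_on.get(n, []))
    )
    return {"chain": chain, "root_candidates": roots}


# Made-up service topology: each key depends on the listed services.
topology = {
    "web-frontend": ["checkout-api"],
    "checkout-api": ["payments-db"],
    "payments-db": [],
}
events = [
    Event(120, "web-frontend", "HTTP 5xx rate above baseline"),
    Event(60, "payments-db", "connection pool saturated"),
    Event(90, "checkout-api", "p95 latency above baseline"),
]
situation = build_situation(events, topology)
print(situation["root_candidates"])
print([e.node for e in situation["chain"]])
```

Because the ordered node sequence forms a repeatable pattern, it could also serve as a crude fingerprint for recognizing repeat issues, in the spirit of the fingerprinting mentioned in the text.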

In our observation, telecom networks and sufficiently complex IT services often create tribal support groups around different layers of their architectures. Telecom businesses, for instance, span several related but distinct domains. These domains are vastly different from each other as they cover operational characteristics of facilities and buildings (power, cooling, etc.), transmission networks (aggregate data networks across vast geographies), core IP/WAN (high throughput networking for software definable circuits), network services (dynamic network definitions), and end user services (services rendered/metered for a customer). Operational knowledge about these domains is often recorded in ticket resolutions but gets trapped in different systems, authored by and catering towards different subject matter experts.

We leverage these tribal resolutions by tapping into ticket worklogs. We separate redundant entries from actual resolutions and use these resolutions to train a large neural network that complements a Generative Pretrained Transformer (GPT) model, as shown in the diagram below.

HelixGPT

Figure 2. BMC HelixGPT analyzes AIOps and ITSM data to provide better insight into potential problems and automatically remediate low-risk fixes.
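As a rough idea of what separating redundant worklog entries from actual resolutions might involve, the sketch below drops duplicate entries and keeps only those that mention a remedial action. It is a naive keyword heuristic shown purely for illustration; the source does not describe the actual method, and the hint list, function name, and sample worklog are assumptions.

```python
import re
from typing import Iterable, List

# Hypothetical phrases that suggest an entry describes a remedial action.
ACTION_HINTS = ("restart", "rolled back", "increased", "patched", "cleared", "failed over")


def _normalize(entry: str) -> str:
    """Lowercase and collapse whitespace so near-identical entries match."""
    return re.sub(r"\s+", " ", entry.strip().lower())


def extract_resolutions(worklog: Iterable[str]) -> List[str]:
    """Keep unique worklog entries that look like resolutions."""
    seen = set()
    kept = []
    for entry in worklog:
        key = _normalize(entry)
        if not key or key in seen:
            continue  # skip blanks and repeats
        seen.add(key)
        if any(hint in key for hint in ACTION_HINTS):
            kept.append(entry.strip())
    return kept


# Made-up worklog for a single ticket.
worklog = [
    "Ticket assigned to network team",
    "Restarted payments-db connection pooler; latency recovered",
    "restarted payments-db  connection pooler; latency recovered",
    "Customer notified",
]
print(extract_resolutions(worklog))
```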

We are very excited about recent developments in GPT models, as they have validated our approach to solving full-cycle assurance. We have been using a GPT model to explain our insights to operators as a part of BMC HelixGPT, and we have now started to decorate our root-cause insights with ticket resolution-based prescriptions. We constrain our GPT with a domain- and tenant-specific model that continuously learns resolutions from subject matter experts (SMEs) in the organization (hence the phrase “expert of (your) systems”), and we prompt our GPT with the root-cause analysis we conduct using observational data in prior steps in our AI architecture.

We developed this model to keep the insights provided by BMC Helix human-relatable yet still specific to the enterprise, so answers are based on your organization’s expertise.

 
