Ian Russ – BMC Software | Blogs
Zero Touch, Zero Trouble Starts with AIOps-Enabled Service Assurance
https://s7280.pcdn.co/zero-touch-zero-trouble-starts-with-aiops-enabled-service-assurance/
Wed, 01 Nov 2023 11:46:06 +0000

Innovations like virtualization, converged network services, and the telco cloud offer exciting possibilities for communication service providers (CSPs) and their customers—provided they can solve the accompanying operational challenges. The evolution to software-defined everything can let operators activate services in minutes, not days; scale resources more flexibly to improve service while optimizing cost; and push workloads to the network edge to support new use cases in a more mobile and connected world—the list goes on.

But to realize this vision for the future of their industry, CSPs will need to modernize and transform service assurance in tandem with their environment. Traditional silo-based practices and technologies simply won’t be able to meet expectations for greater agility, prediction, and automation. As IT and network technologies converge and hybrid and public clouds reshape the infrastructure, CSPs will need to take a new approach to service assurance: one built on a common platform powered by artificial intelligence for IT operations (AIOps). This will improve reliability, reduce mean time to repair (MTTR), improve agility, and enable the shift to zero touch and zero trouble operations across the converged environment.

Virtualization leaves traditional service assurance behind

One of the main reasons network operators are adopting virtualization is to enable greater speed and agility. Customers are annoyed when a new service takes three days to be activated, and developers and operations teams want to be able to spin up new services that span multiple technology domains as quickly as possible. CSPs also want the flexibility to move workloads from the data center to the edge to support low-latency use cases like autonomous vehicles and virtual reality. In a software-defined world, operators have the freedom to reinvent their business at digital speed.

But service assurance is already proving to be a critical brake on this transformation. Designed for the massive, static, hardware-based, and slow-moving networks of the past, traditional approaches can’t keep pace with the dynamic and converged nature of modern environments. Siloed, duplicative, and overlapping assurance technologies for IT and network infrastructure make it more difficult to monitor services, fix faults, and manage resources for functions with dependencies in both domains, such as virtualized network functions delivered over hybrid cloud.

In the old days, when an issue affected the network, the network operations team could usually infer the cause by looking at a relatively small set of logs and monitors. In a converged environment, those investigations can span both network and IT technologies as well as a public cloud such as Google Cloud, Microsoft Azure, or Amazon Web Services (AWS), making root cause analysis a much more challenging prospect. Meanwhile, the use of shared cloud resources introduces new types of issues that traditional network monitoring tools can’t easily pick up, like a “noisy neighbor” virtual machine (VM) or a container starving other functions of resources. Correlating issues across silos and determining root causes becomes an exercise in frustration, while manual, disconnected processes increase MTTR and cost.

The threat to service quality is exacerbated by the reactive nature of traditional service assurance solutions. Aside from routine preventative maintenance, most operational behavior has consisted of waiting for something to break before acting—an approach that makes it impossible to maintain the reliability and availability customers now expect. When you can’t stop problems from affecting service, and it takes you longer to resolve them, customers end up with poor voice quality, jittery video, or stalled downloads that are more frequent and last longer. That’s a critical business problem for CSPs in hotly competitive markets where switching incentives are common and customer loyalty is fleeting.

To keep their converged infrastructure healthy, their services running at their best, and their customers loyal, CSPs need to unify and automate service assurance and ultimately drive to zero touch, zero trouble.

Building AIOps into service assurance

Slow, siloed, and largely manual processes make it far too difficult for operations teams to manage their environments and solve problems, much less work proactively to prevent problems and plan for future needs. What they need now is a way to achieve unified observability across both hybrid cloud and network infrastructures, quickly correlate this data, interpret its meaning, and act quickly to assure service quality.

With a unified, cloud-native AIOps platform, IT operations (ITOps) and network operations (NetOps) teams can leverage built-in intelligence to automatically identify the underlying conditions contributing to a disruption. Noise suppression helps teams work more efficiently by removing distractions and false alarms. As generative AI technologies like ChatGPT reshape the way people interact with systems, AIOps can translate complex root causes into natural language summaries and next-step suggestions. By correlating data across multiple network and technology domains, these technologies can understand the actual customer impact and provide timely and accurate notification, a key element of a satisfying customer experience.

Shifting from reactive to proactive, AIOps can help teams predict future issues and see packet loss earlier to improve network reliability. To enable automated remediation and self-healing, the platform can prompt a network orchestrator to take steps such as restarting a given device or changing a configuration parameter to resolve an issue before it impacts service level agreements (SLAs). A self-learning AIOps platform helps ITOps and NetOps teams improve agility by automating the configuration of monitoring and management rules for cloud-native and dynamic infrastructure services and applications. By analyzing trends, forecasting scenarios, and simulating demand, CSPs can plan accurately for the capacity needed to support new products effectively at a high level of quality.

Completing the vision for the modern CSP

While AIOps can help CSPs evolve toward a zero touch, zero trouble model, there will always be situations where human intervention is needed. In the previous blogs in this series, we talked about the requirements for a unified network service management platform to streamline that resolution flow, as well as the unified discovery needed to provide complete data and visibility across converged infrastructure. Together, these three capabilities form the foundation for a new era of autonomous networks delivered through dynamic, multi-domain, hybrid cloud environments.

To learn more, read the first two blogs in this series, Modern CSPs Need Unified Visibility Across Hybrid Cloud and Demanding Markets Drive CSPs to Transform Network Service Management.

Then visit https://www.bmc.com/blogs/bmc-helix-receives-catalyst-showcase-award/ to find out how BMC recently won a TM Forum Catalyst award.

Telecom: Transforming service assurance with AIOps for the era of network automation
https://www.bmc.com/blogs/transforming-service-assurance-with-aiops/
Mon, 05 Apr 2021 00:00:50 +0000

In competitive and fast-changing telecommunications markets, service quality is a critical differentiator. Customers have demanding expectations for their experience, and they won’t accept disappointing performance, unreliable availability, or a sluggish response to problems. Meeting their high standards was challenging enough in earlier eras. Now, as autonomous technologies increase the speed of network operations across software-driven environments, operators need to make sure that the service management layer becomes faster and more automated as well. In my earlier blogs, I discussed evolving approaches to Service Assurance and the growing role of artificial intelligence (AI), analytics, and automation in technology operations. Now, I’ll drill down on the use of AIOps to transform Service Assurance for the era of network automation and support key use cases such as fault and problem prediction, zero-touch network operations, change, and dynamic inventory.

What network automation means for Service Assurance

The drive toward autonomous networking is advancing at full speed, as initiatives such as the Open Network Automation Platform (ONAP) seek to enable real-time, policy-driven orchestration and automation of physical and virtual network functions. While enabling operators to respond more quickly to customer requests, and to optimize operations across their increasingly software-driven network environment, this increase in automation will also place new importance on service assurance.

Network automation can be thought of in terms of in-band and out-of-band use cases. For in-band use cases (including many performance or capacity issues), things follow a relatively predictable set of patterns, enabling end-to-end automation. Here, Service Assurance doesn’t need to do much more than record the processes performed. For out-of-band use cases (physical infrastructure failures or rare alarm conditions), however, it can be less clear what should happen next, or the fix may require boots-on-the-ground interaction. As exceptions arise, an operator may need to get involved to make decisions. At that point, we need to ensure a level of governance across operational processes such as change, though without reverting to a fully manual approach.

AIOps offers a solution. AI, big data analytics, and machine learning make it possible to augment, guide, and increasingly replace human decision processes so that operators can ensure service quality more efficiently at scale to meet customer expectations.

Applying AIOps to key use cases

Service Assurance offers a variety of suitable use cases for AIOps, with reasonably large data sets to which AI and machine learning can be applied to cluster related faults, identify underlying network problems, prioritize resolution, and so on. High-value use cases include the following.

Improve prediction of faults and problems – A classic AIOps use case is to shift from proactive to preventive remediation by predicting problems before they’ve arisen. In a network automation context, as operators seek to increase the level of automation in an area that can’t be fully automated, AIOps makes it possible to support this more predictive approach by automatically detecting anomalies based on established, dynamic baselines. Once a potential problem has been identified, machine learning enables fault clustering of related problems with the same root cause to speed troubleshooting and prioritize resolution. In some cases, it may even be possible to fix the problem automatically without human intervention.
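To make the idea of a dynamic baseline concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than how any particular AIOps product works: the function name, the window size, the three-sigma threshold, and the sample packet-loss values are all hypothetical. The baseline is simply the mean and standard deviation of the most recent samples, so it moves as normal behavior drifts.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=20, threshold=3.0):
    """Return indices of samples that deviate from a rolling baseline.

    The baseline is the mean and standard deviation of the previous
    `window` samples, so it adapts as normal behavior drifts.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip a perfectly flat baseline (sigma == 0) to avoid
            # flagging every tiny change as an anomaly.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        # Note: anomalous values are still added to the baseline here;
        # a production system might exclude them.
        history.append(value)
    return anomalies


# Hypothetical packet-loss percentages: steady around 0.5, one spike.
packet_loss = [0.5, 0.6, 0.4, 0.5, 0.55, 0.45, 0.5, 0.6, 0.4, 0.5, 4.0, 0.5]
spikes = detect_anomalies(packet_loss, window=10, threshold=3.0)
```

A fixed alarm threshold would need retuning whenever traffic patterns change; a rolling baseline trades that for a short warm-up period before detection starts.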

Evolve toward zero-touch network operations – While the full zero-touch network operations center (NOC) remains for now an aspirational goal, with the network automatically assessing events and then making and acting on its own decisions, AIOps can already help operators achieve a higher level of automation. At the current level of solution maturity, machine learning can be used to guide operator actions based on what’s been done in the past, how well it worked, and its chance of success in the current case. In situations where an engineer needs to go on-site to make a repair, the system can identify the right person to send to the right location with the right equipment to fix the problem effectively. By spending less time investigating before initiating a repair, the operator can reduce MTTR and touchpoints, deal with more problems concurrently, and improve their fix rate. If a given problem is found to be non-critical, the NOC can choose to wait to send an engineer until a more efficient time, such as a trip when multiple repairs can be combined on a single visit.
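As a rough sketch of guiding operators by what has worked before, the Python below ranks past remediation actions for a fault type by their historical success rate. This is a simple counting approach with Laplace smoothing standing in for a real machine learning model; the fault names, action names, and ticket history are invented for illustration.

```python
from collections import defaultdict


def rank_actions(history, fault_type):
    """Rank past remediation actions for a fault type by smoothed success rate.

    `history` is an iterable of (fault_type, action, succeeded) tuples.
    Laplace smoothing, (successes + 1) / (attempts + 2), keeps an action
    tried once from outranking one with a long, mostly successful record.
    """
    counts = defaultdict(lambda: [0, 0])  # action -> [successes, attempts]
    for fault, action, succeeded in history:
        if fault == fault_type:
            counts[action][1] += 1
            if succeeded:
                counts[action][0] += 1
    scored = [
        (action, (wins + 1) / (tries + 2))
        for action, (wins, tries) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical ticket history; fault and action names are made up.
tickets = [
    ("link_flap", "restart_interface", True),
    ("link_flap", "restart_interface", True),
    ("link_flap", "restart_interface", False),
    ("link_flap", "restart_interface", True),
    ("link_flap", "restart_interface", True),
    ("link_flap", "replace_optic", True),
    ("high_cpu", "restart_process", True),
]
ranking = rank_actions(tickets, "link_flap")
```

The top-ranked entry is what the system would suggest first; the operator remains free to choose otherwise, and the outcome becomes one more row of history.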

Manage change within an autonomous network

As network automation advances, a key step will be removing the human element in the way changes are handled. Having machines take over that function calls for a new way of thinking about change management, including the assessment, planning, approval, scheduling, and execution of changes. With AIOps, the assessment of the impact of a change can be automated, including not just its effect on SLAs, but also how its execution can be scheduled to minimize disruption to the customer—and whether the customer needs to be informed at all. Today, operators generally err on the side of caution to tell customers about situations with even a relatively small risk of disruption. If AIOps allows us to reduce that risk below a certain threshold, this notification can become unnecessary, sparing customers the need to focus on contingencies with minimal likelihood of occurring.
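The scheduling and notification logic described here can be sketched in a few lines of Python. This assumes a predicted disruption risk is already available for each candidate window (producing that prediction is the hard part and is not shown); the class name, the 1% notification threshold, and the sample windows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChangeWindow:
    label: str               # e.g. "Tue 02:00"
    disruption_risk: float   # predicted probability of customer impact, 0..1


def plan_change(windows, notify_threshold=0.01):
    """Pick the lowest-risk window and decide whether to notify the customer."""
    if not windows:
        raise ValueError("at least one candidate window is required")
    best = min(windows, key=lambda w: w.disruption_risk)
    # Notify only when the remaining risk is at or above the threshold.
    return best, best.disruption_risk >= notify_threshold


# Hypothetical candidate windows with made-up risk predictions.
candidates = [
    ChangeWindow("Mon 14:00", 0.08),
    ChangeWindow("Tue 02:00", 0.004),
    ChangeWindow("Wed 23:00", 0.02),
]
window, notify = plan_change(candidates)
```

With these sample values the Tuesday window is selected and no notification is sent, because its predicted risk falls below the threshold.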

Support Service Assurance and dynamic inventory

What does an increasingly dynamic world mean for service assurance? In the past, change was relatively infrequent, with predictable impacts. As virtualization increases, things move around more quickly, and fixes are made automatically, operators may not have such a clear understanding of the way things are connected and affected by each other. AIOps is now necessary to keep up with this more ephemeral environment. That’s even more true as customer workloads move to the network edge, leading to a tighter relationship between the two and expanding the scope of service assurance accordingly. Changes to the network can now directly affect the performance of edge workloads, and in more significant ways—for example, an autonomous vehicle workload for which slower performance can be a matter not just of convenience, but of safety.

Simply put, in a world where everything is in motion, it may no longer be possible or even necessary to have a complete view of the entire environment at a given moment. Instead, we should focus on understanding specific impacts well enough to support faster decision-making—and AIOps will make that possible.

Telecom: Enabling Automation Everywhere through AI and Analytics
https://www.bmc.com/blogs/telecom-enabling-automaton-everywhere-through-ai-and-analytics/
Wed, 24 Feb 2021 07:49:04 +0000

As operators drive to increase the velocity and scale of Service Assurance processes across their complex, converging enterprise IT and network domains, automation has become a critical priority. In my last blog, I discussed the need for new approaches to key Service Assurance processes. In this blog, I’ll delve more deeply into how operators can leverage automation to meet key challenges, including:

  • Bringing context and intelligence to trouble ticketing
  • Reducing the cost and customer impact of change
  • Operator-guided automation

Bringing context and intelligence to trouble ticketing

Converging trends are bringing trouble ticketing to the top of the agenda for telecommunications transformation. Rapidly changing technologies and rising complexity are increasing the potential for problems to emerge throughout the network, often with causes and impacts that are less well understood. At the same time, intense competition makes customer satisfaction critical for success and growth. Meanwhile, network staff must find faster, more efficient ways to resolve problems to avoid becoming overwhelmed.

To solve these challenges, operators need to be able to use data more effectively to provide the context that drives intelligence. Automated, AI-powered analysis of complex datasets, presented through graphic interfaces, can help operations teams identify patterns in performance, alarm, and ticket data across network, infrastructure, and cloud environments. Armed with this insight, network staff can intelligently correlate multiple problems into a common root cause or set of customer impacts in order to create, assess, assign, and remediate trouble tickets more quickly and efficiently.
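One simple way to picture correlating several alarms into a common root cause is to walk the network topology. The Python sketch below is a rule-based stand-in, not an AI method: it groups each alarming node under its highest upstream ancestor that is also alarming. The node names and the topology map are hypothetical.

```python
def correlate_alarms(alarming, upstream):
    """Group alarming nodes under their highest alarming upstream ancestor.

    `alarming` is a set of node names with active alarms.
    `upstream` maps each node to the node it depends on (None at the top).
    """
    groups = {}
    for node in sorted(alarming):
        root = node
        current = upstream.get(node)
        seen = {node}  # guards against cycles in a malformed topology
        while current is not None and current not in seen:
            if current in alarming:
                root = current
            seen.add(current)
            current = upstream.get(current)
        groups.setdefault(root, []).append(node)
    return groups


# Hypothetical topology: cell sites hang off aggregation switches.
topology = {
    "core-router": None,
    "agg-switch-1": "core-router",
    "agg-switch-2": "core-router",
    "cell-site-a": "agg-switch-1",
    "cell-site-b": "agg-switch-1",
    "cell-site-c": "agg-switch-2",
}
alarms = {"agg-switch-1", "cell-site-a", "cell-site-b", "cell-site-c"}
tickets = correlate_alarms(alarms, topology)
```

Four alarms collapse into two candidate tickets here: one for the aggregation switch with its two dependent cell sites, and one for the unrelated cell site.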

Reducing the cost and customer impact of change

Whether human-initiated or machine-generated, the rate and complexity of change continue to increase. Traditional approaches to change can’t accommodate this at the speed and scale necessary to meet ongoing network transformation, changing customer requirements, and performance management. Operators need to be able to improve the success rate of change, better understand its impacts on cost and quality, understand complex change interactions, and optimize scheduling to minimize customer impact.

By automating change assessment and scheduling for agent-initiated change, operators can reduce human involvement while reducing cost and increasing success. As changes are planned, potential customer disruption can be predicted and mitigated more effectively, and the change itself can be automated. AI and automation can also support fully automated change assessment and scheduling for machine-generated changes to eliminate human interaction entirely.

Operator-guided automation

Operator-guided automation is an intermediate but often critical stage of Service Assurance on the journey to fully autonomous networks and self-governing systems. Before operators turn over control entirely to AI and closed-loop automation, they can first choose to take a semi-automated approach. Following an assessment in which the system determines what level of human interaction is needed, AI-driven intelligent automation can guide operators to the right trouble remediation actions or enable fully automated remediation. In either case, the success of actions is measured to improve accuracy. As manual intervention is reduced and eliminated, the operator can progress toward fully closed-loop automation.
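The assessment step, deciding how much human interaction a given remediation needs, might look something like the Python sketch below. The thresholds, mode names, and the use of a plain measured success rate as the confidence score are all assumptions made for illustration.

```python
def choose_mode(confidence, auto_threshold=0.9, guide_threshold=0.6):
    """Map the system's confidence in a remediation to a level of human involvement."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_threshold:
        return "automate"   # closed loop, no operator action
    if confidence >= guide_threshold:
        return "guide"      # suggest the action, operator approves
    return "manual"         # operator investigates and decides


def update_confidence(successes, attempts):
    """Measured success rate of an action, fed back after each attempt."""
    return successes / attempts if attempts else 0.0


# Hypothetical record: an action that worked 19 times out of 20.
mode = choose_mode(update_confidence(19, 20))
```

As the measured success rate of an action rises, it moves from manual handling to guided and eventually to automated, which mirrors the gradual path to closed-loop automation described above.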

Evolving to AIOps

To deliver a competitive customer experience, operators must shift Service Assurance from fixing problems that have already emerged to a predictive approach, in which system data is used to avert problems before they compromise service quality and AI makes it possible to assess common underlying problems across systems. More broadly, operations teams can use big data, machine learning, and analytics to identify patterns in monitoring, capacity, and automation data across complex technology infrastructure. Based on this insight, they can work more quickly and effectively to improve the speed, quality, and cost-efficiency of service delivery. Ultimately, the goal is to enable fully autonomous delivery in which real-time automation ensures optimal service quality with the speed, scale, and efficiency required for modern operator environments.

In my next blog, I’ll discuss using AIOps to deliver an improved customer experience, including improving operator visibility across all customer touchpoints, automating problem resolution, and improving service quality management.

Evolving Service Assurance for Modern Telecommunications
https://www.bmc.com/blogs/service-assurance-for-telecommunications/
Thu, 28 Jan 2021 08:07:48 +0000

To meet modern customer expectations and keep pace with rapid innovation, telecommunications operators need new ways to ensure service quality at the velocity and scale of modern 5G services. As technology infrastructures and operations evolve, traditional approaches to Service Assurance are quickly becoming outdated. A new generation of platforms will have to accommodate increasing interdependency across business lines and domains; more complex technology ecosystems; growing data volumes; and the drive toward autonomous operations and converged networking.

In this blog, I’ll talk about why a new approach to Service Assurance is needed, and what it will have to look like.

The growing gap between Service Assurance requirements and capabilities

Modern service assurance is supported through four key functions:

  • Trouble ticketing – The reception, assessment, correlation, and resolution of detected problems within the network, supporting both reactive and proactive processes to ensure reliable service delivery for customers.
  • Change management – Mitigating and eliminating the risk of disruption to customer services during planned modifications to the network.
  • Resource and service inventory – Maintaining accurate visibility of network topologies and other contextual information to support decision-making within a Service Assurance context.
  • Service level management – Informing and guiding proactive and automated activity through an improved understanding of expected service levels and current performance.

Performing these functions effectively depends on comprehensive visibility and understanding across the network environment; timely and accurate decision-making; and optimal process efficiency. These requirements have become increasingly challenging to meet as evolving needs push traditional service assurance approaches to the breaking point. To date, operators have often run multiple, separate Service Assurance platforms for different business lines and domains—fixed, mobile, carrier, and so on. However, modern environments are characterized by increasing interdependency across domains; for example, mobile issues might be caused by issues in a fixed line, such as a disruption of mobile backhaul. In this light, separate ticketing systems pose barriers to insight, limit opportunities for automation, and ultimately compromise service quality.

Increasing dependencies spanning enterprise IT and network domains will call for closer alignment of Service Assurance processes as well. 5G will increasingly blur the line between the two, as enterprise workloads run on Multi-Access Edge Compute (MEC) at the network edge. Traditional “network-only” Service Assurance will struggle to accommodate these use cases effectively.

Unlike the relatively monolithic environments for which legacy Service Assurance platforms and operations were designed, modern telecommunications businesses run on a complex and dynamic ecosystem encompassing multiple in-house, supplier, and partner solutions and technologies. Customization-led approaches can no longer deliver the required levels of efficiency, while traditional integration methods are costly, slow, and fragile, and do not scale to meet the needs of today’s operators. The increasing volumes associated with 5G will add to the stress, further invalidating traditional operations. With more data and more devices to accommodate, manual or strictly deterministic approaches to Service Assurance will not scale effectively.

Maturing Service Assurance models and technologies

The evolution of Service Assurance should be seen in the context of broader trends in telecommunications technology. In a rapidly transforming industry, operators need to be able to make an agile response to changing customer needs and emerging business opportunities. This has driven a trend toward more proactive, predictive, and ultimately autonomous operations, where decision-making can be entirely automated and human interaction for mundane tasks can be eliminated. Service Assurance is an important element of this strategy, as operators seek to support greater automation across the lifecycle, including the creation, assessment, assignment, notification, and remediation of trouble tickets with minimal human involvement.

As operators seek to accelerate operations to the velocity and scale needed to deliver modern 5G services, artificial intelligence and machine learning will play a key role in enabling this higher level of automation for use cases including:

  • Service management to ensure quality of service through optimized network and IT services
  • Customer experience management focused on the needs and experience of individual customers and users
  • Autonomous operations where AI/ML-powered insights enable closed-loop automation

Beyond automation, meeting customer objectives will require operators to increase their focus on interoperability. By promoting easier integration and operation across diverse and complex ecosystems, operators will be able to mature beyond KPI-driven platform management to a more fluid and unified approach to management across services. Delivering the right data to the right people and processes at the right time will be important as well, providing enriched resource and service context to enable greater levels of insight, actionability, and automation.

To ease management complexity, the new generation of Service Assurance platforms will increasingly be “headless,” integrating into a common user experience portal for network teams to reduce the number of systems that operator personnel need to interact with. A single platform across both IT and network operations will support greater convergence to improve visibility, quality, and efficiency. Support for open standards and digital architectures will improve interoperability and ease integration across technology ecosystems. Support for cloud-based OSS delivery will enable operators to take advantage of cloud elasticity and scale.

In my next blog, I’ll talk about using AI-powered operations, or AIOps, to support Service Assurance and deliver closed-loop automation.

Why 5G makes automation essential for telecom operators
https://www.bmc.com/blogs/5g-makes-automation-essential-for-telecom-operators/
Thu, 07 Jan 2021 07:41:46 +0000

The evolution to 5G will open a world of new opportunities for telecommunications businesses across a broad spectrum of enterprise and vertical markets. Greater bandwidth, faster speed, and lower latency allow the creation of new types of applications from remote surgery, to autonomous vehicles, to smart factories. Network slicing makes it possible to serve multiple customers or markets through a single 5G network. Deployed in tandem with workloads operating on Multi-Access Edge Compute (MEC), 5G will play a foundational role in a new era of computing architecture. It’s an exciting time to be in telecommunications—but to capitalize fully, operators will need to maximize the speed and agility of 5G deployment and operations at scale. That makes automation a mission-critical capability for the modern telecommunications business.

Meeting the operational needs of 5G

Supporting 5G poses new operational challenges in two ways. To begin with, network complexity grows significantly due to the increased cell density needed to cover a given area—far beyond 3G and 4G—as well as the increased presence of workloads delivered at the edge as needed to meet the demand for ultra-low-latency services. Operating 5G networks at scale, and managing them effectively, simply isn’t feasible with traditional methods. This is especially true given the time pressures at hand. Facing fierce competition for new market opportunities, as well as unpredictable shifts in customer requirements and demand, operators need the agility to pivot quickly while maintaining service quality and cost control.

The rise of everything-as-software networking offers a more flexible way forward. By using cloud-native network functions (CNFs) to deliver network functionality via containers, operators can escape the constraints of legacy hardware, while software-defined networks (SDN) allow them to orchestrate network services through centralized, programmable control. This makes it possible to adjust network parameters more easily to meet the requirements of new services and of specific customers. But software-driven networking is only part of the solution; intelligent automation and AIOps are necessary to complete it.

Advanced technologies such as artificial intelligence and machine learning, complemented by initiatives such as ONAP seeking to drive fully autonomous networking, are now ushering in a new era of network automation. By using analytics to both augment and increasingly replace human decision processes, operators can manage complex 5G networks and ensure service quality more efficiently at scale. The ability to fully automate network orchestration makes it possible to respond more quickly to customer requests while maintaining control and governance. Using AIOps to improve the detection and correction of network issues automatically, before they impact service quality, enables operators to deliver at scale, meet customer expectations, and build stronger relationships for their 5G business model.

Leveraging the agility and elasticity of the cloud

Cloud resources are the key to agile and elastic 5G deployment. By tapping into compute and storage resources as a service, operators can adapt quickly and cost-efficiently to shifts in customer demand. The fast, frictionless scalability offered by the cloud also helps operators accommodate the vast amounts of data needed to make effective use of AI-powered network automation. Moving customer workloads closer to the consumer will reduce latency for 5G-enabled enterprise and vertical industry applications. Migrating OSS/BSS systems to the cloud can lower costs while increasing agility.

At the same time, growing cloud operations can also increase management complexity as well as the risk of security breaches and a growing regulatory burden. To take full advantage of the agility and elasticity of the cloud, operators need to be able to migrate data to and across environments easily and securely, optimize the placement of cloud-based workloads for performance and cost, ensure business continuity, maintain data privacy, and ensure the integrity of business-critical data. Here again, intelligent automation and AIOps are critical for addressing key aspects of 5G networking. Automated infrastructure management can simplify cloud operations and accelerate cloud migration. Automated cloud remediation and compliance enable operators to detect and close security gaps and prevent regulatory lapses across constantly changing cloud environments and applications.

Enabling new enterprise and vertical markets

While 5G networks offer benefits for every type of customer and use case, the core business case for many operators centers on the opportunity to enter new business markets. 5G network slicing makes it possible to deliver private corporate networks to multiple enterprise customers through a single 5G infrastructure. MEC and the ultra-low-latency applications enabled by 5G allow operators to address use cases across industries including healthcare, agriculture, manufacturing, mining, transportation, and more.

As operators work to deliver more advanced services and solutions for customers, agility continues to be a priority. Adopting DevOps concepts, embracing continuous delivery, and moving away from slower-paced waterfall processes and less frequent release schedules can help operators meet the demands of fast-moving markets, but it also calls for an equally agile approach to application testing. By automating the execution of regression, performance, and other tests as part of the CI/CD pipeline, operators can deliver higher-quality software more quickly and efficiently to capitalize on emerging opportunities.

In my next blog, I’ll discuss the evolution of Service Assurance for the 5G era and how operators can address the challenges posed by rising data volumes, siloed ticketing systems, complex ecosystems, and slow, fragile integration methods through use of AIOps and Intelligent Automation.
