
Operational resilience: What it is and solutions for how to build it

Develop a strategy to ensure your business withstands and recovers from disruptions. Gain the insights and tools needed to identify threats, minimize impact, and maintain continuous operations for enhanced resilience.

Operational resilience ensures your business can withstand and quickly recover from disruptions by integrating multiple disciplines, teams, and tools. Learn how operational resilience can help you identify potential disruptions, minimize their impact, and maintain business continuity.

What is operational resilience?

Operational resilience is a company’s ability to withstand and recover from moments of business disruption. It ties together multiple disciplines, teams, processes, and tools. It encompasses a wide range of practices, from identifying and preventing potential business disruptions, to limiting the impact of disruptions when they do occur, to remediating and recovering from them as quickly as possible. In short: it offers a holistic approach to keeping a business up and running no matter what happens.

Why is operational resilience important?

It isn’t enough to define operational resilience. It’s important to know why it matters.

In short: operational resilience risk has become everyone’s problem.

Every business now runs on digital functions and data, and must keep them available to maintain a minimum viable business. For example, a car manufacturer needs to know what parts it has in stock, and a bank needs to clear transactions within a certain window of time. Without these core functions and data, neither can conduct its business, and developing operational resilience keeps these core functions and data available.

Operational resilience is becoming more important every year. Instability and risk continue to increase in size and volume. Businesses face disruptions from a growing number of sources—from environmental and geopolitical events to cybersecurity attacks (especially ransomware incidents). While these events are impossible to predict, it is possible to develop operational risk resilience strategies to withstand and recover from them competently.

Damage to the bottom line, eroded customer trust, and regulatory compliance failures are some of the problems caused by a lack of resilient operations. Investing in operational resilience today can help prevent these from happening in the future, making operational risk and resilience high-priority topics that every organization must care about.

How does operational resilience contribute to business value?

Operational resilience does more than just prevent problems—it can also help a company accelerate its innovation and transformation.

If a company knows that making changes won’t break its business and that its fundamental risks are covered, then it will produce more code, adopt more digital services and tools, and rapidly increase its customer base without fear of downtime.

Operational resilience also saves money over time by avoiding fines, disruptions to business, and lost revenue, which can be reinvested into growth programs.

While operational resilience can seem like an entirely reactive practice, it lays a secure groundwork for more proactive and aggressive change.

Who is responsible for operational resilience?

Operational resilience is a collaborative practice that touches many teams, leaders, and functional areas within an organization, including:

  • Service awareness and visibility, led by the head of infrastructure services
  • Risk management, led by IT and cyber risk management teams, and outsourcing and third-party risk management teams
  • Business continuity management, led by the head of business continuity
  • Incident management, led by the security operations center (SOC) and head of operations
  • Governance, led by the head of IT compliance, legal teams, and procurement teams

Ultimately, responsibility falls on IT and cyber security teams and their leaders: the chief information officer (CIO), chief operating officer (COO), and chief security officer (CSO), who report on the topic to the rest of the C-suite and the board.

Operational resilience vs business continuity

Operational resilience and business continuity are not the same thing. Business continuity is one element of operational resilience, but operational resilience encompasses a much broader set of practices and concerns.

Operational resilience vs disaster recovery

Operational resilience and disaster recovery are also two different things.

Disaster recovery mitigates the impact of physical disasters (like an earthquake taking out a business’s physical mainframe) or certain cyberattacks. In most disaster recovery plans, an organization has two mainframes (or other data centers) that are completely interconnected and live. If one is taken offline by a disaster, the organization can switch to the redundant system and continue with the same data and processes, uninterrupted.

This does not solve modern cyberattacks like ransomware that can infect multiple mainframes or data centers at the same time. Operational resilience solves this problem by introducing additional measures like a third, air-gapped mainframe or data center that’s separated from the first two and won't receive malicious code. While disaster recovery is still necessary, it is no longer sufficient on its own to maintain operations.


What are the 5 pillars of operational resilience?

Operational resilience is a flexible, holistic practice that can incorporate many activities. Yet most practices are built around the following operational resilience pillars.

  1. Identify. Understanding risk, the interconnection between systems, likely sources of disruption, and what happens if they occur.
  2. Detect. Establishing safeguards and defenses to notice disruptions in real time, and to understand their potential impact.
  3. Protect. Mitigating disruptions to prevent their spread and reduce their impact, including limiting their potential impact before they occur.
  4. Respond. Performing well-defined response mechanisms and protocols to eliminate the disruption and evict any threats as quickly as possible.
  5. Recover. Restoring data, functions, and normal business operations, and learning from the incident to prevent it from happening again.

These are not the only potential pillars of operational resilience. It is also possible to break the concept down further, to define common causes of failure for each pillar, and to make the concept as actionable as possible. Yet these five provide an efficient understanding of how to bring the concept to life.

Further, they can provide a high-level way to perform operational resilience mapping and operational resilience gap analysis, and to see where a business might need to invest in its operational resilience program.

Which operational resilience framework should I follow?

There is no single framework for building operational resilience. Any framework that improves an organization’s ability to withstand and recover from disruption will help its operational resilience management.

A few common and effective frameworks to consider include the following:

  • Components of operational resilience

    For a simple operational resilience framework example, consider that systems must be:

    1. Stable, with continuous uptime, minimal downtime, and consistent performance for critical operations.
    2. Recoverable, with robust, validated recovery processes, failover mechanisms, tested backups, and rapid restoration.
  • The Basel principles for operational resilience

    The Basel Committee on Banking Supervision (BCBS) published seven principles for operational resilience, which can form a basis for any operational resilience strategy:

    1. Operational resilience governance. To establish, oversee, and implement an approach to operational resilience and each operational resilience policy
    2. Operational risk management. To identify threats and potential points of failure, including through vulnerability management
    3. Business continuity planning and testing. To conduct exercises to test their ability to continue operations during plausible disruptions
    4. Mapping interconnections and interdependencies. To understand what systems are needed to deliver critical operations
    5. Third-party dependency management. To understand how outside providers and members of their supply chain are needed to deliver critical operations
    6. Incident management. To develop response and recovery plans to mitigate security and operational incidents that might occur
    7. Information and communication technology (ICT) including cyber security. To ensure resilient technology systems needed to deliver critical operations
  • Common practices and frameworks

    Most established approaches to cyber security are helpful for building operational resilience. These include:

    1. NIST. The cybersecurity framework recommended by The National Institute of Standards and Technology (NIST) provides an effective operational resilience template with up-to-date practices and capabilities for modern threats.
    2. Zero trust. The practice of limiting access and reducing interconnection within technology systems to the bare minimum reduces the chance of suffering a disruption, and minimizes the potential damage any incident can cause
    3. Operational resilience training. Training team members directly involved with operational resilience, and everyone else who might be touched by a disruption, helps maintain proper practice and cool heads during an incident
    4. Operational resilience scenario testing. Identifying potential disruption scenarios, attaching operational resilience metrics to performance, and regularly performing operational resilience testing can help an organization find and eliminate weak spots before it is compromised

    NIST in particular provides an operational resilience plan with detailed and practical operational resilience components. It can be seen as a comprehensive operational resilience program, and it provides a quick framework for an operational resilience self-assessment.
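
To make the "mapping interconnections and interdependencies" principle above more concrete, here is a minimal Python sketch of one way to list every system a critical service relies on, directly or indirectly. The system names and the DEPENDS_ON map are hypothetical and exist only for illustration; a real map would come from your own service inventory.

```python
from collections import deque

# Hypothetical dependency map: each system lists the systems it depends on.
DEPENDS_ON = {
    "payments-api": ["ledger-db", "fraud-service"],
    "fraud-service": ["customer-db"],
    "ledger-db": [],
    "customer-db": [],
    "marketing-site": ["cms"],
    "cms": [],
}


def required_systems(service, depends_on):
    """Return every system the given service needs, directly or indirectly."""
    seen = set()
    queue = deque(depends_on.get(service, []))
    while queue:
        system = queue.popleft()
        if system in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(system)
        queue.extend(depends_on.get(system, []))
    return seen


print(sorted(required_systems("payments-api", DEPENDS_ON)))
# → ['customer-db', 'fraud-service', 'ledger-db']
```

In this sketch, the payments service turns out to depend on the customer database even though it never calls it directly, which is the kind of hidden link that mapping is meant to surface.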


Operational resilience in banking and financial services

Operational resilience is important for organizations in every industry. Everyone needs to be able to maintain their core digital business functions and data at all times.

However, operational resilience is particularly important in banking and financial services, which includes traditional banks, bitcoin exchanges, betting exchanges, and any institution that handles money in a digital state. These organizations must be able to keep up-to-date account balances, move money, and clear accounts within a specific time period (generally by close of business each day).

Operational resilience in financial services is important. If a banking or financial company fails to maintain its services, it will lose business, customers, and customer trust, and face steep fines associated with new and increasing regulatory requirements.

What is the Digital Operational Resilience Act (DORA)?

The Digital Operational Resilience Act (DORA) is a comprehensive EU regulation. It’s designed to improve operational resilience, standardize incident response protocols, provide guidance for risk management and mitigation, and help organizations avoid operational disruptions that lead to financial penalties and reputational risk.

DORA focuses on digital operational resilience testing, incident reporting, ICT risk management, ICT third-party risk management, and information sharing. It requires certain organizations to prove their ability to maintain operational resilience throughout a cyberattack or similar incident.

A few requirements in DORA include:

  • Resolving cyberattack incidents promptly
  • Deploying “cyber vault” technology to create physically and logically separated immutable backups
  • Performing vulnerability scans and assessments to uncover and address potential ICT vulnerabilities
  • Recovering critical operations within two hours of disruption, and completing end-of-day payments and procedures on time
  • Testing recovery procedures annually
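
Two of the requirements listed above, the two-hour recovery window and annual recovery testing, lend themselves to a simple automated check. The following Python sketch is one possible way to flag gaps from recovery test records. It is not an official DORA compliance tool: the RecoveryTest record, the service names, and the choice to treat "annually" as 365 days are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RECOVERY_TARGET = timedelta(hours=2)   # the two-hour figure from the list above
TEST_INTERVAL = timedelta(days=365)    # assumption: "annually" treated as 365 days


@dataclass
class RecoveryTest:
    """Hypothetical record of the most recent recovery test for one service."""
    service: str
    tested_on: date
    recovery_time: timedelta


def find_gaps(tests, today):
    """Return a finding for each service that misses the target or is overdue."""
    findings = []
    for test in tests:
        if test.recovery_time > RECOVERY_TARGET:
            findings.append(f"{test.service}: recovery exceeded the two-hour target")
        if today - test.tested_on > TEST_INTERVAL:
            findings.append(f"{test.service}: recovery test is overdue")
    return findings


# Hypothetical test records.
records = [
    RecoveryTest("payments", date(2024, 6, 1), timedelta(minutes=90)),
    RecoveryTest("ledger", date(2023, 6, 1), timedelta(hours=3)),
]

for finding in find_gaps(records, today=date(2025, 1, 1)):
    print(finding)
```

With these sample records, only the ledger service is flagged, once for its three-hour recovery and once for a test that is more than a year old.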

While DORA applies to mainframe owners at EU-related financial institutions, every organization needs to pay attention to it, for two reasons:

  • It applies to financial institutions outside of the EU that do business within the EU (which is effectively every institution)
  • It is a sign of things to come, and new regulations will likely be drafted that set the same requirements for a broader range of organizations

Operational resilience tools

Your level of operational resilience depends largely on your tools.

To start with, you must have three levels of redundancy for how and where you store your data, as discussed above: your primary working data centers, a logically separated data center, and a physically separated data center that builds resilience against ransomware attacks.
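
As a rough illustration of the three levels of redundancy, the Python sketch below checks a storage inventory for any level that is missing. The role labels, site names, and inventory format are hypothetical; they are simply one way to represent the primary, logically separated, and physically separated copies.

```python
# Role names are illustrative labels for the three levels of redundancy.
REQUIRED_ROLES = {"primary", "logically_separated", "physically_separated"}


def missing_redundancy(copies):
    """Return the redundancy levels that no listed data copy provides."""
    present = {copy["role"] for copy in copies}
    return REQUIRED_ROLES - present


# Hypothetical inventory of where one dataset is stored.
inventory = [
    {"site": "dc-east", "role": "primary"},
    {"site": "dc-west", "role": "primary"},
    {"site": "vault-1", "role": "logically_separated"},
]

print(sorted(missing_redundancy(inventory)))
# → ['physically_separated']
```

Here two live data centers and a logical vault still leave the physically separated copy unaccounted for, which is the copy meant to stay out of reach of ransomware.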

Going deeper, there are three core areas where the right operational resilience solutions will make or break your ability to maintain business functions. They are:

  1. Code and development. Tools that help you create resilient code.
  2. Data. Tools that make sure your code sits on top of resilient systems.
  3. Security. Tools that defend your systems against external and internal threats.

At BMC, every tool within our portfolio builds operational resilience within these areas. Highlights of our operational resilience software include:

  1. Our developer experience (DevX) portfolio tools help you quickly produce resilient code that works, cope with high volumes of base changes entering your environment, and anticipate and avoid code-level risks before you push them into production.

    Learn more about BMC AMI DevX

  2. Our data portfolio tools ensure your databases are clean, recover data if there is a problem, and keep multiple data centers synced yet separate and secure to prevent corruption in one from spreading to others.

    Check out BMC AMI Data

  3. Our security portfolio tools keep your systems and data safe from cyber threats, accelerate and automate both threat detection and remediation, and establish and maintain continuous compliance across the entire estate.

    Explore BMC AMI Mainframe Security
