In this five-part blog series, members of the BMC OnDemand organization will share their perspectives on the five key tenets that guide the way we run BMC software in the cloud for our customers. Nandu Mahadevan, who leads BMC’s OnDemand organization, provides his views on the fifth tenet: Infrastructure Resiliency.
Tenet Five: Infrastructure Resiliency
A good cloud foundation eliminates single points of failure and enables scaling across the infrastructure stack. In this blog, I include Architecture, Database, Server, Storage, and Network components under the umbrella of “Infrastructure”. To me, Resiliency directly translates to ensuring a good night’s sleep! Look at the evolution of the motor car industry: with the invention of power steering, automatic gearboxes, rear cameras, anti-lock brakes, traction control, and so on, we have continued to allow less skilled drivers to get on the road without getting into crashes. Intelligent and autonomous cars are already here! When it comes to designing Software-as-a-Service clouds, Resiliency is needed to absorb the unpredictable usage patterns of end users that could lead to a ‘crash’. Does the architecture support multi-threading, clustering, and scaling? Are the user-facing web components stateless? Can the components handle multiple tenants?
How does one identify and close resiliency gaps in a steady-state software cloud? Tip: Go after the raw events. In other words, formalize ‘Event Management’ as a practice. Most organizations tend to focus on Incident Management and Problem Management and don’t think much of Event Management. Most events are discarded as “noise”. On the contrary, I’ve found that it’s the analysis of that “noise” that gives insight and opens up opportunities to build resiliency into the environment. You will discover single points of failure, as well as the weakest links in your stack that cause availability and performance bottlenecks. I’m fortunate that Remedy OnDemand architects work closely with Product Management architects and are empowered to sign off on product release gate exits. This interlock helps ensure that product architecture continues to evolve to support high resiliency.
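To make the "go after the raw events" tip concrete, here is a minimal sketch of what such an analysis might look like. It is an illustration only, not how BMC's tooling works: the event records, their field names (`component`, `type`), and the `rank_noisy_components` helper are all hypothetical. The idea is simply to count raw events per component and event type so that recurring "noise" stands out as a candidate weak link.

```python
from collections import Counter

# Hypothetical raw events; the field names and values are assumptions
# made up for this illustration, not a real event schema.
events = [
    {"component": "db-01", "type": "connection_timeout"},
    {"component": "web-02", "type": "high_cpu"},
    {"component": "db-01", "type": "connection_timeout"},
    {"component": "storage-01", "type": "latency_warning"},
    {"component": "db-01", "type": "connection_timeout"},
    {"component": "web-02", "type": "high_cpu"},
]


def rank_noisy_components(raw_events, top_n=3):
    """Count events per (component, type) pair and return the most frequent."""
    counts = Counter((e["component"], e["type"]) for e in raw_events)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Print the pairs that generate the most "noise", most frequent first.
    for (component, event_type), count in rank_noisy_components(events):
        print(f"{component}: {event_type} x{count}")
```

In practice the events would come from a monitoring or logging system rather than a hard-coded list, and a pair that keeps surfacing at the top of this ranking is a reasonable place to start looking for a single point of failure or a bottleneck.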
In case you missed them, here are links to the first four Tenets for high-performance IT operations.