In parts 1 and 2 of this series, we discussed the cultural challenges of operating IT in organizations with technology silos and considered the true role data centers play. In this final post of the series, we’ll look at how to realign monitoring operations to better support business goals.
The emerging perspective on monitoring emphasizes the application. By that, I don’t mean simply application performance monitoring (APM). APM is certainly useful: transaction-level tests can prove that a given application is able to authenticate users, accept input, and return some kind of output in a timely manner. Even better, though, is a view that treats the technology silos as a single monitored whole.
This new point of view on monitoring encompasses the application and all its dependencies, displayed in relation to one another. While APM’s transactional monitoring is useful in detecting that a problem exists, it fails to identify what the specific problem is. It’s important to also tie in the physical infrastructure as an indicator of performance and availability.
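To make the limits of a transaction-level test concrete, here is a minimal sketch in Python. The function name `run_transaction_check`, the step names, and the two-second threshold are all hypothetical, and the steps are stand-in callables; a real check would issue actual requests against the application.

```python
import time


def run_transaction_check(steps, max_seconds):
    """Run (name, callable) steps in order and time the whole transaction.

    Each callable is a hypothetical stand-in for one stage of a user
    transaction and should return True on success.
    """
    start = time.monotonic()
    for name, step in steps:
        try:
            ok = step()
        except Exception:
            ok = False  # treat any error as a failed step
        if not ok:
            return {"ok": False, "failed_step": name,
                    "seconds": time.monotonic() - start}
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        return {"ok": False, "failed_step": "too slow", "seconds": elapsed}
    return {"ok": True, "failed_step": None, "seconds": elapsed}


# Stubbed steps: authentication works, but submitting input fails.
result = run_transaction_check(
    [
        ("authenticate", lambda: True),
        ("submit input", lambda: False),
        ("read output", lambda: True),
    ],
    max_seconds=2.0,
)
print(result["ok"], result["failed_step"])
```

The result says which step of the transaction failed, but nothing about whether the cause was the database, the storage array, or a network switch. That gap is what the rest of this post is about.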
Too Many Silos Block the View
For example, let’s take a look at a classic, multi-tiered web architecture. Several components must all work well in order for the web application to function well:
- A DNS service
- An HTTP engine, possibly with an SSL service
- A load balancer and/or a clustering service
- An authentication engine
- Possibly a Java engine
- A database, possibly clustered
- A hypervisor
- One or more network switches
- One or more physical x86 hosts
- A storage array
This is only an abbreviated list of the key hardware and software that must work together to deliver a multi-tiered web application. If we were to add “internet-facing” as a role for this web app, we could also add proxy servers, firewalls, and traffic inspection engines to the mix.
As we review the list, it’s notable that these components are spread over many different IT silos. The hypervisor, DNS, HTTP, and database services were probably installed by a sysadmin. The HTTP application was likely created by a software developer, while the database itself is managed by a DBA. The storage arrays are likely administered by a storage team. The switches are managed by networking specialists.
A Shift in Perspective
Despite the specialization, individual IT services are not really interesting to the business; the business cares only whether the web application can or cannot be used. Given this, the most logical way to monitor the data center that houses all this infrastructure is from the perspective of the application. Monitoring tools that do this well are not infrastructure-focused; rather, they gather the data from the infrastructure, and help IT staff to understand how individual, under-performing infrastructure components impact the user experience.
For example, it’s not enough to know that DNS is offline or returning unexpected results; what matters is that a failing DNS service renders the web application inaccessible to users, at least once name-resolution caches time out. It’s not enough to know that Microsoft AD is sporadically failing to authenticate users due to high CPU on a certain domain controller; it’s more critical to know that some users won’t be able to log in to the web application as a result. It’s not enough to know that the database’s storage array is seeing an unusually high number of requests and that disk-write latency is climbing; the more important information is that web transactions will be very slow for users as a result. A good monitoring solution will help an IT team build the needed dependency trees and correlate data in this way.
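The idea of a dependency tree can be sketched in a few lines of Python. This is an illustration only, not how any particular monitoring product works: the `Component` class, the `failing_components` and `impact_report` functions, and the health flags are all hypothetical. A real tool would populate health from collected metrics.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One node in a hypothetical application dependency tree."""
    name: str
    owner: str                    # the IT silo responsible for the component
    healthy: bool = True          # assumed to be set from monitoring data
    depends_on: list = field(default_factory=list)  # list of Component


def failing_components(component, seen=None):
    """Return every unhealthy component reachable from `component`."""
    if seen is None:
        seen = set()
    if component.name in seen:    # skip shared dependencies already visited
        return []
    seen.add(component.name)
    failing = []
    for dep in component.depends_on:
        failing.extend(failing_components(dep, seen))
    if not component.healthy:
        failing.append(component)
    return failing


def impact_report(app):
    """Describe infrastructure problems in terms of the application."""
    failing = failing_components(app)
    if not failing:
        return f"{app.name}: OK"
    details = ", ".join(f"{c.name} ({c.owner})" for c in failing)
    return f"{app.name}: degraded by {details}"


# A small slice of the multi-tiered web architecture described earlier.
storage = Component("storage array", "storage team", healthy=False)
database = Component("database", "DBA", depends_on=[storage])
dns = Component("DNS", "sysadmin")
web_app = Component("web application", "developers", depends_on=[dns, database])

print(impact_report(web_app))
```

The report is phrased from the application's point of view and names the owning team, so a storage problem shows up as a web application problem that the storage team can act on.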
IT Takes Teamwork
When IT silos start looking at infrastructure from an application-delivery point of view, they begin to interact in a different way. Instead of just defending their little corner of the data center and keeping their isolated bit of infrastructure running, all teams start to work together toward the common goal of delivering business applications to users. As I hope I’ve shown, no bit of infrastructure is truly isolated.
When all teams share the same view of the monitoring system and related application views, problems don’t stay stuck inside the silos. Everything works better when all teams drill into problems, impacts, and resolutions together. For complex problems with several different impacts, this inter-silo communication and application-centric point of view are key to quick problem identification and resolution. The right monitoring system is the catalyst that makes that happen.