Decisions and Actionable Data for IT Operations
Everyone in IT knows that data makes all the difference in making key decisions. Simply having the data is one thing. But what about having the actionable data you need right in front of you any time you need it?
Managing IT services has become so complex that streams of performance and availability metrics now span an ever-expanding, widely distributed architecture: private and public clouds, data centers, remote sites, mobile users, and even “things.” There’s no secret to handling this volume of data. The right data must reach the people who make critical decisions, and the faster, the better.
What types of data are the most useful?
In order to make critical decisions quickly, consider the following types of data:
One of the most powerful sources of actionable insight is time series data, which shows how measurements change over time. You’ll want to know whether a measurement is trending upward or downward so that you can act before a condition impacts your end users. With time series data, you can also “learn” what is considered normal over certain ranges of time and then alert when a measurement goes outside that range.
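To make the idea of “learning” a normal range concrete, here is a minimal Python sketch. It assumes a simple statistical baseline (the mean plus or minus three standard deviations of recent samples), which is only one of many ways to define normal and is not a description of how any particular product does it. The function names and the sample CPU values are hypothetical.

```python
from statistics import mean, stdev

def learn_baseline(history, k=3.0):
    """Return a (low, high) band: mean +/- k standard deviations of past samples."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two samples
    return mu - k * sigma, mu + k * sigma

def is_anomalous(value, band):
    """True when a new measurement falls outside the learned band."""
    low, high = band
    return value < low or value > high

# Hypothetical CPU utilization samples (percent) from a quiet period.
cpu_history = [41, 43, 40, 44, 42, 45, 43, 41, 42, 44]
band = learn_baseline(cpu_history)

print(is_anomalous(43, band))  # False: within the normal range
print(is_anomalous(90, band))  # True: well outside the normal range
```

In practice you would probably learn a separate band for each time range, such as business hours versus overnight, since “normal” usually differs between them.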
Event data can be very powerful, especially when you do a bit of up-front planning to build some intelligence into what triggers an event. Carefully defining what counts as an event in your environment helps you fix problems as quickly as possible, without wading through noise from alerts and notifications about temporary conditions that impact no one.
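One simple way to build that kind of intelligence into a trigger is to require a threshold to be breached several samples in a row before an event fires. The Python sketch below illustrates the idea under that assumption; the function name, the threshold, and the latency values are hypothetical.

```python
def sustained_breaches(samples, threshold, min_consecutive=3):
    """Return the sample indexes at which an event would fire.

    An event fires once, at the moment the threshold has been exceeded
    for `min_consecutive` samples in a row. Short spikes are ignored.
    """
    events = []
    run = 0  # length of the current streak above the threshold
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == min_consecutive:
                events.append(i)
        else:
            run = 0
    return events

# Hypothetical response times in milliseconds: one brief spike, then a real problem.
latency_ms = [120, 480, 130, 510, 520, 530, 540, 125]
print(sustained_breaches(latency_ms, threshold=400))  # [5]
```

The isolated spike at the second sample produces no event, while the sustained slowdown produces exactly one.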
A summary list of the resources that are consuming the most bandwidth, CPU, or memory helps you prioritize what to address without waiting for a problem to occur. Summaries can also be hierarchical, with data grouped by region or function so that what’s actionable is that much easier to reach. For example, you could have your West and East regions in two separate groups and then, within each group, functional groups such as web servers, databases, and app servers.
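As a rough illustration of both ideas, the Python sketch below builds a “top consumers” list and a region-then-function summary from a flat list of devices. The device records, field names, and helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical device inventory with current CPU utilization (percent).
devices = [
    {"name": "web-w1", "region": "West", "role": "web", "cpu": 91},
    {"name": "web-w2", "region": "West", "role": "web", "cpu": 35},
    {"name": "db-w1", "region": "West", "role": "database", "cpu": 78},
    {"name": "app-e1", "region": "East", "role": "app", "cpu": 64},
    {"name": "web-e1", "region": "East", "role": "web", "cpu": 22},
    {"name": "db-e1", "region": "East", "role": "database", "cpu": 88},
]

def top_consumers(records, metric, n=3):
    """Return the n records with the highest value for the given metric."""
    return sorted(records, key=lambda r: r[metric], reverse=True)[:n]

def peak_by_group(records, metric):
    """Build a region -> role -> peak metric value summary."""
    summary = defaultdict(dict)
    for r in records:
        current = summary[r["region"]].get(r["role"], 0)
        summary[r["region"]][r["role"]] = max(current, r[metric])
    return dict(summary)

print([d["name"] for d in top_consumers(devices, "cpu")])
print(peak_by_group(devices, "cpu"))
```

The first call lists the three busiest devices across the whole environment; the second shows the peak CPU value for each functional group within each region.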
Having the right actionable data helps you to be proactive when you’re troubleshooting any IT operations issue. For example, if you see that memory is rising at a steady pace on a specific host in your environment, you can determine whether the application might have a memory leak or whether there’s simply an increase in the number of users of the application. You can then decide whether to fix the application or add more resources.
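Here is a minimal Python sketch of how you might tell those two cases apart from the data. It assumes you collect memory usage and active user counts at the same intervals, and it uses a deliberately crude heuristic: if memory is trending up while the user count is flat, a leak is more likely. The function names, the 2% tolerance, and the sample values are hypothetical, and a real diagnosis would need more evidence than this.

```python
def relative_slope(values):
    """Least-squares slope per sample, expressed as a fraction of the mean.

    Needs at least two samples and a nonzero mean.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return (num / den) / y_mean

def diagnose(memory_mb, active_users, tolerance=0.02):
    """Classify a memory trend as 'stable', 'more users', or 'possible leak'."""
    if relative_slope(memory_mb) <= tolerance:
        return "stable"
    if relative_slope(active_users) > tolerance:
        return "more users"  # memory grows along with demand: consider adding resources
    return "possible leak"   # memory grows while demand is flat: inspect the application

memory = [500, 550, 600, 650, 700]
print(diagnose(memory, [100, 101, 99, 100, 100]))   # possible leak
print(diagnose(memory, [100, 110, 120, 130, 140]))  # more users
```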
In another case, suppose you have five web servers that serve critical web application pages to your customers. If one of them goes down and you’re notified of the event, the traffic load will most likely spike on one or more of the remaining web servers. Once you have that data, you can begin making decisions about redistributing the load as well as bringing the failed web server back up.
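To put rough numbers on that scenario, the Python sketch below projects the load on the surviving servers. It assumes the failed server’s traffic is spread evenly across the others, which a real load balancer may not do, and the server names, request rates, and capacity figure are hypothetical.

```python
def load_after_failure(loads, failed, capacity):
    """Project per-server load if the failed server's traffic is shared evenly.

    Returns the projected loads and a sorted list of servers that would
    exceed the given capacity.
    """
    survivors = {name: load for name, load in loads.items() if name != failed}
    extra = loads[failed] / len(survivors)
    projected = {name: load + extra for name, load in survivors.items()}
    overloaded = sorted(name for name, load in projected.items() if load > capacity)
    return projected, overloaded

# Hypothetical requests per second on each of the five web servers.
requests_per_sec = {"web1": 400, "web2": 420, "web3": 380, "web4": 450, "web5": 350}
projected, overloaded = load_after_failure(requests_per_sec, "web3", capacity=500)

print(projected)
print(overloaded)  # ['web2', 'web4']
```

Here two of the four survivors would exceed the assumed capacity, which is the kind of signal that tells you to rebalance traffic before customers notice.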
Introducing TrueSight Operations Management Dashboards
Using TrueSight Operations Management, you can configure the kind of dashboards that provide actionable insight from the data you collect throughout the network for all the examples just described. Whether it’s time series, event, or summary data, you can build what’s needed with our modern IT operations dashboards.
Here’s an example of a dashboard available in our latest release, TrueSight Operations Management 10.1.
This event and device dashboard gives you a list view of events, showing each event’s severity, when it occurred, its source (device or application), and the event message.
From this actionable data, you can take several actions on any event, including assigning the event, running probable-cause analysis, or even launching remote actions on the particular device.
These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.