I’ve talked with many customers over the years about data collection strategies and best practices for implementing data collection and monitoring with BMC Proactive Performance Management (BPPM). There is no doubt that customers and organizations are at different levels of maturity across the technology cycles in their IT infrastructure. Let’s face it: monitoring is not sexy, but it’s a necessity. Customers must ensure that their key IT assets are up and functioning, and they need to know when they are not. Most monitoring solutions today use basic static thresholds: when a metric such as CPU utilization exceeds its threshold, an event is triggered to notify you that a breach has occurred. The problem with this approach is that you’re notified ‘after’ the problem has already happened. Another common issue in the NOC is that the number of events becomes so large that it is hard to manage. Determining which event to work on first, and whether and how it is impacting a business service, becomes impossible. Unfortunately, this is a pretty common scenario in today’s infrastructure. We will talk more about this in upcoming blogs.
Now there is a better alternative to setting manual thresholds: the dynamic baselining capabilities in BPPM. Dynamic baselining takes the key performance indicators (KPIs) for your most important metrics and creates three baselines that determine normal behavior based on the time of day. These baselines use hourly, daily, and weekly data to compute the bands of normalcy.
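BPPM computes its baselines internally, and its exact statistics are not described here. As a rough illustration only, the following Python sketch assumes a simple approach: group historical samples by time slot (hour of day for a daily pattern, weekday plus hour for a weekly pattern) and treat the mean plus or minus two standard deviations as the band of normalcy. The names `Band`, `build_baseline`, `hour_of_day`, and `hour_of_week`, the `k=2.0` multiplier, and the synthetic CPU data are all hypothetical.

```python
# Hypothetical sketch of time-of-day baselining; not BPPM's actual algorithm.
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


@dataclass(frozen=True)
class Band:
    """A 'band of normalcy' for one time slot."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def build_baseline(
    samples: Iterable[Tuple[datetime, float]],
    bucket: Callable[[datetime], Hashable],
    k: float = 2.0,
) -> Dict[Hashable, Band]:
    """Group historical samples by time slot and compute mean +/- k * stdev per slot."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for ts, value in samples:
        groups[bucket(ts)].append(value)
    bands: Dict[Hashable, Band] = {}
    for key, values in groups.items():
        mu, sigma = mean(values), pstdev(values)
        bands[key] = Band(mu - k * sigma, mu + k * sigma)
    return bands


def hour_of_day(ts: datetime) -> int:
    # The same hour on every day shares one band (daily pattern).
    return ts.hour


def hour_of_week(ts: datetime) -> Tuple[int, int]:
    # The same weekday and hour share one band (weekly pattern, e.g. Monday 09:00).
    return (ts.weekday(), ts.hour)


# Synthetic CPU history: two weeks of hourly samples, busy (about 70%) from 09:00
# to 17:00 and quiet (about 20%) otherwise, with a small -1/0/+1 wobble per day.
start = datetime(2024, 1, 1)
history = [
    (start + timedelta(days=d, hours=h), (70.0 if 9 <= h < 17 else 20.0) + (d % 3) - 1)
    for d in range(14)
    for h in range(24)
]
daily = build_baseline(history, hour_of_day)
weekly = build_baseline(history, hour_of_week)

# 70% CPU is normal at 10:00 but abnormal at 03:00.
print(daily[10].contains(70.0), daily[3].contains(70.0))  # prints: True False
```

Bucketing by time slot is what lets the same value be normal at one hour and abnormal at another, which a single static threshold cannot express.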
One key benefit of dynamic baselining is that it reduces the administrative burden, since the system derives your thresholds automatically. Another benefit is that it reduces the number of events sent to the console in the first place. There is no more guessing about where to set thresholds; letting the system set them dynamically can drastically reduce the number of events. With baselines in place, you will receive abnormality notifications when a metric goes above or below its band of normalcy. These notifications serve as an early warning that something is trending in a positive or negative direction.
Example: You are monitoring a business service and all of its IT infrastructure components, including the network, database, application, and web layers. You’re also collecting a response time metric from a synthetic or real user experience monitor. BPPM will baseline all KPIs of the infrastructure components as well as the response time metric. BPPM can then proactively notify you when the response time goes above or below its baseline, rises a given percentage above the baseline, or exceeds a static threshold while also exceeding the baseline. This is typically used to get an early warning of increasing response time before any SLAs/SLOs are breached or an application experiences a slowdown.
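The notification conditions in this example can be modeled as simple checks against a baseline band. This is a hypothetical Python sketch, not BPPM’s policy configuration: `Baseline`, `AlertRule`, `evaluate`, and the reason strings are illustrative names, and it assumes that "a percent above the baseline" is measured from the band’s upper edge. The 150–300 ms band, 20% margin, and 500 ms limit are made-up values.

```python
# Hypothetical model of baseline-driven notification conditions; not a BPPM API.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Baseline:
    """Band of normalcy for the current time slot."""
    low: float
    high: float


@dataclass(frozen=True)
class AlertRule:
    # Fire when the value is more than this percentage above the band's upper edge.
    pct_above: Optional[float] = None
    # Fire when the value exceeds this static limit AND the baseline at the same time.
    static_threshold: Optional[float] = None


def evaluate(value: float, baseline: Baseline, rule: AlertRule) -> List[str]:
    """Return the reasons (if any) this sample should raise an abnormality notification."""
    reasons: List[str] = []
    if value > baseline.high:
        reasons.append("above_baseline")
    if value < baseline.low:
        reasons.append("below_baseline")
    if rule.pct_above is not None and value > baseline.high * (1 + rule.pct_above / 100):
        reasons.append("pct_above_baseline")
    if (
        rule.static_threshold is not None
        and value > rule.static_threshold
        and value > baseline.high
    ):
        reasons.append("static_and_baseline")
    return reasons


# Hypothetical response-time baseline (ms) for this hour, a 20% margin, and a 500 ms SLO.
slot = Baseline(low=150.0, high=300.0)
rule = AlertRule(pct_above=20.0, static_threshold=500.0)
for sample_ms in (250.0, 320.0, 400.0, 600.0, 100.0):
    print(sample_ms, evaluate(sample_ms, slot, rule))
```

Requiring the static threshold and the baseline to be breached together suppresses that notification for values that exceed the static limit during time slots when such values are normal.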
Using this technology can help you increase the mean time between failures (MTBF) and reduce your administrative burden. Don’t be afraid to use technology to help your organization mature, get control of your monitoring, and become aware of problems before your customers call.
More to come…