Dealing with the massive volume of events that arrive every day can be tedious, error-prone, and costly. Just ask “Jerry.” He’s an operator who comes to work each day at 8 a.m., carrying a coffee cup that only gets washed every few days. After all, who has time to clean a coffee cup while being bombarded by nonstop events that need to be handled ASAP? Like many other support staff who must keep up with the unrelenting flood of events, “Jerry” plops down in front of the operator console and looks at system and network events, day in and day out.
For about 90 percent of those events, his main task is to “push next.” Because he can only work at human speed at a job that requires machine speed to keep up with digital business, he sometimes forgets to create a trouble ticket or follows the wrong resolution procedure. Yet who could blame him with that much repetition and volume in his workday?
Wouldn’t it be great to take those routine tasks out of the operators’ hands and automate the data collection, ticketing, and resolution actions so that they could focus on more strategic activities? It’s possible. That’s what BMC Atrium Orchestrator does with its event triage and remediation capabilities.
Trouble ticket nirvana
Here’s how the process works. When an event comes in, it must be classified to help narrow down the potential resolution tasks. Next, additional information might need to be collected to fill in gaps that the event itself didn’t cover. With that information in hand, someone can prioritize the event and decide which actions (if any) are needed to resolve the issue. A service desk ticket should be created to track the event. The ticket is then routed to someone, perhaps even the operator who noticed the event, so that action can be taken. Finally, once the issue is resolved, the ticket should be closed to complete the tracking record.
Keep in mind that not all events are created equal. It’s important to identify which ones need immediate attention to avoid disrupting the flow of digital business, and which ones are simply “noise.” BMC Atrium Orchestrator helps automate problem resolution by enriching events with the information needed for triage and proper classification. It can prioritize the events based on policy, open the trouble ticket, initiate the automated remediation steps, verify that the event is resolved, and close the ticket. All of this happens without operator interaction. It speeds up troubleshooting and ensures rapid remediation, so staff members don’t need to spend hours repeatedly performing routine tasks. Plus, it allows IT to reduce support costs, errors, and delays while also shortening the mean time to resolution (MTTR) of incidents and reducing event volumes. (For more details, check out this white paper.)
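To make the flow above concrete, here is a minimal Python sketch of the classify, enrich, prioritize, ticket, remediate, verify, and close sequence. It is only an illustration under stated assumptions: it does not use BMC Atrium Orchestrator or its API, and every name in it (Event, Ticket, PRIORITY_POLICY, triage, the keyword rules, and the sample inventory) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    source: str                      # host or device that raised the event
    message: str                     # raw event text
    category: Optional[str] = None   # filled in by classify()
    details: dict = field(default_factory=dict)  # filled in by enrich()
    priority: Optional[str] = None   # filled in by prioritize()


@dataclass
class Ticket:
    event: Event
    status: str = "open"


# Hypothetical policy table: category -> priority (None means "noise").
PRIORITY_POLICY = {"disk_full": "high", "service_down": "critical", "noise": None}


def classify(event: Event) -> Event:
    """Assign a category using simple keyword rules (illustrative only)."""
    text = event.message.lower()
    if "disk" in text and "full" in text:
        event.category = "disk_full"
    elif "down" in text:
        event.category = "service_down"
    else:
        event.category = "noise"
    return event


def enrich(event: Event, inventory: dict) -> Event:
    """Fill gaps in the event with what is known about its source."""
    event.details.update(inventory.get(event.source, {}))
    return event


def prioritize(event: Event) -> Event:
    """Look up the priority for the event's category in the policy table."""
    event.priority = PRIORITY_POLICY.get(event.category)
    return event


def remediate(event: Event) -> None:
    """Placeholder for a real runbook; here it only records that it ran."""
    event.details["remediated"] = event.category in ("disk_full", "service_down")


def verify(event: Event) -> bool:
    """Placeholder check that the remediation took effect."""
    return bool(event.details.get("remediated"))


def triage(event: Event, inventory: dict) -> Optional[Ticket]:
    """Run the whole flow; return None for noise, otherwise the ticket."""
    prioritize(enrich(classify(event), inventory))
    if event.priority is None:
        return None                  # noise: no ticket, no action
    ticket = Ticket(event)           # open a ticket for tracking
    remediate(event)
    # Close only after verification; otherwise hand off to a person.
    ticket.status = "closed" if verify(event) else "escalated"
    return ticket


inventory = {"db01": {"owner": "dba-team", "service": "billing"}}
ticket = triage(Event("db01", "Disk /var is 98% full"), inventory)
print(ticket.status, ticket.event.priority)
print(triage(Event("web02", "Heartbeat received"), inventory))
```

In this sketch the disk event is ticketed, remediated, verified, and closed, while the heartbeat event is treated as noise and never reaches an operator. A real deployment would replace the keyword rules and placeholder functions with actual monitoring, service desk, and runbook integrations.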
Event triage and remediation by the numbers
Here’s an example of what you can save by automating this process. Manual processes for detection, validation, resolution, and verification of a fault can typically take 90 minutes or more. Our automated approach can get the process down to 4 minutes or less and reduce the risk of human errors. And that’s not all. By using prepackaged workflows, you can reduce — or even eliminate — the effort and time involved in dealing with the issues caused by high-volume events.
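For a rough sense of scale, the arithmetic behind those figures can be worked through in a few lines of Python. This is a back-of-the-envelope sketch: it takes 90 and 4 minutes as point values even though the text gives them as "or more" and "or less," and the count of 100 faults is a hypothetical number chosen only for illustration.

```python
def time_saved(manual_minutes: float, automated_minutes: float, faults: int = 1):
    """Return (total minutes saved, fractional reduction per fault)."""
    saved_per_fault = manual_minutes - automated_minutes
    reduction = saved_per_fault / manual_minutes
    return saved_per_fault * faults, reduction


# 90 and 4 come from the figures above; 100 faults is an assumed volume.
total_minutes, reduction = time_saved(90, 4, faults=100)
print(f"{total_minutes / 60:.1f} hours saved, {reduction:.1%} less time per fault")
```

Under these assumptions, each fault takes 86 fewer minutes to handle, which is a reduction of roughly 96 percent per fault.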
This technology can decrease costs dramatically. For instance, a large managed cloud provider that uses event triage and remediation from BMC handles three to four times more events each month with the same headcount and has reduced the number of tickets by 6,000 a month. In fact, the company estimates this saves about $2 million a year.
More time to innovate
With automated event triage and remediation, work can become a lot more interesting for “Jerry” and other staff members. They can let BMC Atrium Orchestrator handle the mundane, repetitive functions, which frees them up to work on more strategic activities and projects, like securing the fast-growing environment. Plus, the director of IT can meet business objectives because errors have decreased, it’s easier to scale to support business growth, and employee morale improves when the work is more interesting and challenging. And, according to “Jerry,” there’s now even time to wake up, smell the coffee, and drink it from a clean cup each day.
To learn more about what event triage and remediation from BMC can do for your organization, be sure to read the white paper: Event Triage and Remediation Management. It also has a checklist to help you determine how to get started on reducing the time it takes to resolve network and system events.
These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.