WHITE PAPER
BMC® Performance Manager: A Foundation for Business Service Management
Introduction
Challenge of Managing Distributed Environments
Scenario: Before Business Service Management
Scenario: After Business Service Management
Incident detection and isolation is improved
Ability to prevent future incidents
Benefits of BSM
Business impact of an alert is known
Root cause of an alert is isolated faster
BMC Performance Manager used to customize monitoring
Knowledge from prior incidents used to reduce length and duration of future outages
Ability to monitor service levels
Ability to use BMC Performance Manager recovery actions to avoid outages
Business Service Management
BSM Routes to Value: Paths to Business Service Management
BMC Infrastructure and Application Management Route to Value
BMC Infrastructure and Application Management Route to Value products
Conclusion
Introduction
In the data center of a large corporation, it’s the end of the quarter. The systems operator receives an alert: a Windows server is experiencing excessive paging. Within minutes, the operator responsible for the sales automation application receives an alert: the application is terminating. Simultaneously, the Help Desk receives frantic calls from end-users who can’t process orders.
In this scenario, it is difficult for the IT staff to determine the relationship between the excessive paging alerts and the sales application alert. More importantly, the business impact of the alerts becomes apparent only when the end-users contact the help desk.
This paper shows the need for a structured methodology, known as Business Service Management (BSM), which enables IT organizations to manage technology from a business perspective. Many organizations require a BSM roadmap. This roadmap not only helps them maximize their ROI in the short term but also helps them plan a long-term strategy to improve business performance, simplify the complexity of the IT infrastructure, and reduce costs.
BSM Routes to Value™ provide the roadmap, with a comprehensive set of BSM-enabling solutions and a suggested implementation methodology. BSM Routes to Value extend across all BMC Software® products, permitting them to work together and leverage each other’s capabilities. As a result, organizations can implement solutions in an incremental manner as they make the transition to full Business Service Management.
This paper addresses the BMC Infrastructure and Application Management Route to Value. Specifically, it shows how BMC® Performance Manager, BMC® Service Impact Manager (BMC SIM), and BMC® Remedy Help Desk can be used together to help IT understand the business impact of a system alert, quickly troubleshoot the alert, and then correct the problem. A before-and-after scenario shows the benefits of using this set of BSM-enabling products.
Challenge of Managing Distributed Environments
In the past three decades, companies have spent millions of dollars on systems management and availability tools. Yet, in many cases end-users are still the first to report IT-related problems. Why is this the case? Mainly, IT lacks the tools needed to quickly determine the business impact of an alert and isolate the root cause.
Scenario: Before Business Service Management
Consider a scenario in which BSM is lacking and IT is hampered by this shortfall.
Figure 1 Before Scenario without BSM
Incident occurs
In the data center of a large corporation, it’s the end of the quarter. As shown in Figure 1, the server administrator receives an alert: a Windows server is experiencing excessive paging. Within minutes, the help desk operator is notified that the sales automation service has terminated. Simultaneously, the help desk receives calls from end-users who can’t process orders.
Incident diagnosed
In this scenario, the server administrator doesn’t know the business impact of the excessive paging alert. It might or might not be addressed, depending on what else the administrator has to do. However, the excessive paging problem is a symptom of a larger problem: the server has insufficient memory. The memory shortage is causing excessive paging and impacting disk performance. If the sales automation process were being monitored, it would show high CPU usage due to the excessive paging.
After being notified that the sales automation application is unavailable, the help desk operator contacts everyone in the IT infrastructure who might be responsible. These contacts include the various operational silos: the network administrator, the database administrator, the server administrator, and the application administrator. Each administrator troubleshoots the problem separately. Because help desk staff often work in crisis mode, they don’t have time for the detailed analysis required to correlate events. Hence, they don’t know that the termination of the sales automation application is related to the excessive paging alert. If the help desk staff were able to make that connection, they could isolate the problem much faster. Instead of contacting multiple administrators, they could contact the administrator of the server that exhibited excessive paging.
The incident is eventually resolved, but not until the various administrators waste valuable time investigating a problem that is not theirs. In addition, because the relationship between the alerts is not captured, nothing is learned. When this scenario occurs again, the IT staff will not be able to resolve the incident any faster.
Scenario: After Business Service Management
Now consider the same scenario, but with BSM in place.
Assumptions
To achieve BSM, the IT staff is using BMC Performance Manager for Servers to monitor the Windows infrastructure and BMC Service Impact Manager (BMC SIM) to correlate events and project the business impact of alerts. This scenario deals with Windows alerts, but BMC SIM can also handle UNIX, Linux, or mainframe alerts. To automate the help desk, the IT staff is using BMC Remedy Help Desk.
Incident occurs
In the data center of a large corporation, it’s the end of the quarter. The BMC Performance Manager parameter that monitors the percentage of the page file that is in use goes into alarm and generates an alert, which is displayed in the BMC Performance Manager console. BMC Performance Manager also forwards this alert to BMC SIM. BMC SIM uses customer-defined rules to model how events in the IT infrastructure impact the business services provided. Next, BMC Performance Manager, which is also configured to monitor the sales automation process, detects that this process is consuming high CPU. BMC Performance Manager generates another alert and sends it to BMC SIM. BMC SIM correlates this event to the excessive paging event and determines that an incident exists that affects the order fulfillment service. Because this service is defined in BMC SIM as a critical service, BMC SIM opens a help ticket in BMC Remedy Help Desk. The help ticket includes information about the BMC Performance Manager alerts and the business impact. The BMC Remedy Help Desk operator forwards the help ticket directly to the BMC Performance Manager administrator. For a graphical representation of this process, see Figure 2.
Figure 2 After Scenario with BSM
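The impact projection in this scenario can be illustrated with a small sketch. It is not BMC SIM code and does not use any BMC API. The dependency map, the component names (`winsrv01` and so on), and the functions `impacted` and `business_impact` are all hypothetical, chosen only to show how a service model lets an infrastructure alert be translated into a business impact.

```python
# Hypothetical service model: each component maps to the things that depend on it.
DEPENDENTS = {
    "winsrv01": ["sales automation application"],
    "sales automation application": ["order fulfillment service"],
}

# Hypothetical list of services the business has flagged as critical.
CRITICAL_SERVICES = {"order fulfillment service"}


def impacted(component):
    """Walk the dependency map upward and collect everything affected."""
    seen = []
    stack = [component]
    while stack:
        node = stack.pop()
        for parent in DEPENDENTS.get(node, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


def business_impact(alert_component):
    """Project the impact of an alert and decide whether to open a ticket."""
    affected = impacted(alert_component)
    critical = [name for name in affected if name in CRITICAL_SERVICES]
    return {
        "affected": affected,
        "critical_services": critical,
        "open_ticket": bool(critical),  # ticket only when a critical service is hit
    }


# The excessive paging alert arrives from the Windows server.
result = business_impact("winsrv01")
```

In this sketch the paging alert on the server resolves to the order fulfillment service, which is flagged as critical, so a ticket would be opened. An alert from a component that is absent from the model resolves to no services and opens no ticket.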
Incident diagnosed
The server administrator uses the BMC Performance Manager console to diagnose the incident. First, the operator notices that the BMC Performance Manager Agent parameter that monitors disk activity is high. The operator determines that a lack of physical memory is causing the high paging and disk activity. During periods of peak usage, the lack of memory is causing the sales automation application to consume high CPU.
Incident resolved
The operator has physical memory added to the server and closes the Remedy Help Desk ticket. The Remedy Help Desk database contains information about the incident and the resolution, as well as outage statistics that the IT staff can use to calculate service levels.
Incident detection and isolation is improved
The advantages of BMC Performance Manager continue. The server administrator is aware that BMC Performance Manager can be customized and decides to create a new parameter to help detect this problem in the future. The operator uses the PATROL Wizard for Microsoft Performance Monitor and WMI to add a Microsoft Performance Monitor counter to BMC Performance Manager that monitors the time required for disk transfer.
The operator also creates a new composite parameter that is a function of two existing parameters: page faults and disk activity. This new composite parameter indicates the percentage of disk time that is used for paging and is used to determine whether a memory shortage is impacting disk performance. Now, whenever this condition occurs, an alert is generated.
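The paper does not give the formula behind the composite parameter. One plausible reading, which is an assumption here, follows common Windows tuning guidance: multiply the paging rate by the average time per disk transfer to estimate the share of disk time spent on paging. The function names, the sample values, and the 10 percent threshold below are illustrative and are not BMC Performance Manager settings.

```python
def paging_disk_time_pct(pages_per_sec, avg_disk_sec_per_transfer):
    """Estimate the percentage of disk time spent servicing paging.

    Assumed formula: pages/sec multiplied by seconds per disk transfer gives
    the fraction of each second the disk spends on paging I/O.
    """
    fraction = pages_per_sec * avg_disk_sec_per_transfer
    return min(100.0, fraction * 100.0)  # cap at 100 percent


def in_alarm(value_pct, threshold_pct=10.0):
    """Return True when the composite value exceeds the alarm threshold."""
    return value_pct > threshold_pct


# Hypothetical sample: 40 pages/sec at 5 ms per transfer, roughly 20 percent.
sample = paging_disk_time_pct(pages_per_sec=40.0, avg_disk_sec_per_transfer=0.005)
alarm = in_alarm(sample)
```

Combining the two counters into one value means a single alert fires only when paging is actually consuming disk time, rather than whenever either counter is high on its own.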
The next step is to incorporate this alert into the service model, so that BMC SIM is able to specify the business impact immediately when this alert occurs in the future.
Ability to prevent future incidents
Finally, the administrator could also use BMC Performance Manager to fix the incident automatically instead of sending an alert to BMC SIM. Although that solution might not be possible in this particular scenario, for other types of incidents the server administrator could create a recovery action and associate the recovery action to an event, such as a parameter value reaching a specified value. For example, the administrator could configure a recovery action that clears the temporary directory when free disk space reaches a specified value.
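The temporary-directory example can be sketched in a few lines. This is a stand-alone illustration rather than the BMC Performance Manager recovery action mechanism. The function names, the threshold, and the optional `free_pct` argument (which lets a caller supply a measured value instead of querying the disk) are assumptions made for the sketch.

```python
import shutil
from pathlib import Path


def free_space_pct(volume):
    """Return free space on the volume as a percentage of its total size."""
    usage = shutil.disk_usage(volume)
    return usage.free / usage.total * 100.0


def clear_directory(directory):
    """Delete the contents of a directory but keep the directory itself.

    Returns the number of top-level entries that were processed.
    """
    removed = 0
    for entry in Path(directory).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)
        removed += 1
    return removed


def recovery_action(volume, temp_dir, threshold_pct, free_pct=None):
    """Clear temp_dir only when free space is at or below the threshold."""
    current = free_space_pct(volume) if free_pct is None else free_pct
    if current <= threshold_pct:
        return clear_directory(temp_dir)
    return 0  # threshold not reached, nothing to do
```

A monitoring loop could call `recovery_action` with the volume, the temporary directory, and a threshold such as 10 percent each time the free-space parameter is sampled. Because the cleanup is destructive, a real deployment would restrict it to a directory that is known to hold only disposable files.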
Benefits of BSM
Comparing these two scenarios illustrates the following benefits of using BSM and, specifically, BMC Performance Manager as the foundation for BSM:
- The business impact of the initial alert is known before end-users experience an incident.
- The root cause of the incident is isolated faster.
- The flexibility of BMC Performance Manager enables the operator to customize the monitoring based on actual experience.
- Information about past incidents can be used to calibrate the service model to help isolate future incidents faster.
- Remedy Help Desk incident statistics are used to measure compliance with service level agreements (SLAs).
- BMC Performance Manager recovery actions can potentially be implemented to automatically correct the incident and avoid future outages.
Business impact of an alert is known
In the “after” scenario, BMC SIM determined that the sales automation service was affected and knew immediately that the incident was critical. Without BMC SIM, the business impact was not known until the end-users experienced an incident and called the help desk. When the problem reaches the end-users, the company loses money because the end-users are unable to enter orders and book revenue until the incident is corrected.
Root cause of an alert is isolated faster
Without using BMC SIM to correlate events, the help desk operator, upon receiving an alert or a call from an end-user, is forced to contact everyone who might be responsible. This guess-work method of diagnosis is extremely inefficient. Meanwhile, every minute that the outage continues equals more lost revenue. With BMC Performance Manager and BMC SIM, the root cause of the incident can be diagnosed much faster, which saves money during every outage.
If multiple BMC Performance Manager alerts arrive that are related, BMC SIM creates one consolidated help ticket, not two separate help tickets. Similarly, when an event arrives from BMC Performance Manager and the help desk receives an outage report, the service model in BMC SIM may show a connection between these events and generate one consolidated help ticket. This consolidation helps administrators isolate the incident. Instead of responding to multiple, apparently unrelated, help tickets, the administrator can focus on the root problem.
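The consolidation described above can be sketched as grouping events by the business service they map to. The mapping table, the ten-minute window, and the `Ticket` and `consolidate` names are hypothetical. The paper does not describe the rules BMC SIM applies, so this shows only the general idea.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    service: str
    events: list = field(default_factory=list)


# Hypothetical mapping from the origin of an event to a business service.
SERVICE_OF = {
    "winsrv01": "order fulfillment",
    "sales automation": "order fulfillment",
}


def consolidate(events, window_seconds=600):
    """Group (timestamp, origin, description) tuples into tickets.

    Events that map to the same service and arrive within window_seconds of
    that ticket's first event are attached to the same ticket.
    """
    tickets = []
    open_tickets = {}  # service name -> (ticket, timestamp of first event)
    for timestamp, origin, description in sorted(events):
        service = SERVICE_OF.get(origin)
        if service is None:
            continue  # origin is not in the model, so this sketch skips it
        current = open_tickets.get(service)
        if current is None or timestamp - current[1] > window_seconds:
            ticket = Ticket(service)
            tickets.append(ticket)
            open_tickets[service] = (ticket, timestamp)
        else:
            ticket = current[0]
        ticket.events.append(description)
    return tickets


events = [
    (0, "winsrv01", "page file usage in alarm"),
    (120, "winsrv01", "sales automation process CPU high"),
    (180, "sales automation", "end-user outage report"),
]
tickets = consolidate(events)
```

With these sample events, the two monitoring alerts and the end-user report land on a single ticket for the order fulfillment service, so the administrator sees one incident instead of three.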
BMC Performance Manager also provides the detailed system information required to troubleshoot the incident quickly. Critical system parameters are monitored and administrators can view parameter history easily.
BMC Performance Manager used to customize monitoring
BMC Performance Manager can also be customized to fine-tune system monitoring, which also aids problem identification. Administrators learn from experience which data is important and can customize BMC Performance Manager accordingly. This ability is important because all IT alerts should be meaningful. Meaningless alerts make it more difficult to troubleshoot critical incidents. In the “before” scenario, system alerts are not filtered; all alerts reach the help desk or the system administrator with equal priority. With BMC Performance Manager, however, administrators can fine-tune what is monitored and the thresholds for monitoring, so that the help desk is not overwhelmed with low-priority alerts.
Using the PATROL Wizard for Microsoft Performance Monitor and WMI, Windows administrators can add any Microsoft Performance Monitor counter or WMI variable to BMC Performance Manager and define alarm thresholds for those parameters. In addition, they can create custom parameters that are a function of two or more existing parameters.
Knowledge from prior incidents used to reduce length and duration of future outages
Without BSM in place, the company is reactive and is unable to use the knowledge gained from an incident to prevent the incident from happening again or even to improve its response to future incidents. The ability to learn from prior incidents is not built into the system. Instead, companies must rely on the talent of individual IT staff members to correlate past incidents and to recognize patterns. In the “before” scenario, the help desk operator might not be able to recognize the same series of events in the future and forward the help ticket directly to the administrator. In fact, the help desk operator might not even be the same person who handled the incident previously.
However, with BSM in place, the ability to learn from past incidents is built into the system. The service models in BMC SIM can be fine-tuned as more information is learned about how problems in the IT infrastructure affect business services. In the “after” scenario, the service model was updated to handle alerts generated by the new BMC Performance Manager parameter. In addition, the Remedy Help Desk has a built-in Knowledge Base that provides ready access to common solutions, known errors, and workarounds to assist end-users in expediting incident resolution.
Ability to monitor service levels
The BMC Remedy Help Desk solution, fully integrated with BMC SIM, documents any changes in the state of all incidents. Because of this integration, information captured in the help ticket can be used to manage service level agreements (SLAs). The integration ensures that incident response and resolution times are tracked properly and helps verify that the data center is meeting its SLAs.
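As an illustration of how ticket timestamps can feed SLA tracking, the following sketch computes the percentage of tickets that met both a response target and a resolution target. The ticket fields (`opened`, `responded`, `resolved`), the targets, and the sample data are assumptions. They do not reflect the BMC Remedy Help Desk schema.

```python
from datetime import datetime, timedelta


def sla_compliance_pct(tickets, response_target, resolution_target):
    """Return the percentage of tickets that met both targets.

    Returns None when there are no tickets to evaluate.
    """
    if not tickets:
        return None
    met = 0
    for ticket in tickets:
        response_time = ticket["responded"] - ticket["opened"]
        resolution_time = ticket["resolved"] - ticket["opened"]
        if response_time <= response_target and resolution_time <= resolution_target:
            met += 1
    return met / len(tickets) * 100.0


# Hypothetical ticket history: the first meets both targets, the second does not.
start = datetime(2005, 6, 30, 9, 0)
history = [
    {"opened": start,
     "responded": start + timedelta(minutes=10),
     "resolved": start + timedelta(hours=2)},
    {"opened": start,
     "responded": start + timedelta(minutes=45),
     "resolved": start + timedelta(hours=6)},
]

compliance = sla_compliance_pct(
    history,
    response_target=timedelta(minutes=15),
    resolution_target=timedelta(hours=4),
)
```

Because each state change is recorded on the ticket, the same history could also yield averages, worst cases, or per-service breakdowns without any extra data collection.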
Ability to use BMC Performance Manager recovery actions to avoid outages
Without BSM, incident identification and resolution is a manual process. However, by using BMC Performance Manager as the foundation for BSM, companies can automate certain tasks, such as simple recovery actions, and avoid incidents that affect business services.
Business Service Management
Business Service Management (BSM) does more than align IT practices to the goals of the business, merge business with IT information, and ensure that IT is able to support business goals. BSM helps IT managers develop an understanding of the business requirements for IT services, while helping business managers to develop an understanding of how business impacts IT services. Certainly BSM empowers IT organizations to manage technology from a business perspective, but ultimately BSM enables IT organizations to activate business by improving business performance, simplifying the complexity of the IT infrastructure across the enterprise, and reducing costs.
BSM requires the integration of IT management processes across mainframe and distributed systems and across a variety of separate disciplines. Organizations differ greatly in their needs for BSM, depending on current business problems and resulting pain points. Consequently, organizations differ in which IT disciplines they need to focus on to gain competitive advantages. Focusing on disciplines that maximize immediate returns on investments sustains short-term gains, but over time, any IT discipline can deliver value and eventually become part of BSM.
BSM Routes to Value: Paths to Business Service Management
To meet the unique needs of each business, BMC Software has developed BSM Routes to Value—a coordinated, incremental approach to BSM. BSM Routes to Value are field-proven solutions that can be implemented independently but ultimately leverage one another to interconnect the related disciplines within BSM. This interconnection is facilitated by BMC Atrium, an open-architected set of enabling technologies that provide information sharing and centralized management across BMC Software and third-party solutions.
BMC Infrastructure and Application Management Route to Value
The BMC Infrastructure and Application Management Route to Value serves as a foundation for Business Service Management, enabling users to proactively manage and control all enterprise applications and underlying IT infrastructure components from a business service perspective. This foundation enables significant improvement in IT efficiency and cost reduction by controlling all aspects of infrastructure management with a common toolset across the enterprise: mainframe and distributed, online and batch. It enables IT to address potential problems before they jeopardize service levels by proactively monitoring vital components and addressing imminent problems. It also gives IT the business-relevant information it needs to prioritize incidents and allocate resources effectively.
Figure 3 shows how the BMC Infrastructure and Application Management Route to Value is the foundation for BSM. This paper discussed a scenario in which Windows alerts were fed to the service model. But, as the figure shows, the service model can incorporate events from all segments of the IT infrastructure, including mainframes, databases, storage devices, middleware, and the network.
Figure 3 BMC Infrastructure and Application Management Route to Value
BMC Infrastructure and Application Management Route to Value products
The following products are discussed in this paper:
BMC Performance Manager for Servers—BMC Performance Manager for Servers automates monitoring and management, enabling administrators to handle the complex Windows server environment more effectively. BMC Performance Manager for Servers monitors many aspects of Windows servers such as services, processes, CPU, memory utilization, and disk space, providing alerts or recovery actions when problems are detected. BMC Performance Manager for Servers also ensures the connectivity, replication and overall health of Active Directory. BMC Performance Manager for Servers can be easily configured through wizards to monitor any process, event, Microsoft Performance Monitor counter, or log file.
BMC Service Impact Manager (BMC SIM)—BMC SIM enables BSM through real-time business-aware information about IT services and infrastructure. BMC SIM leverages existing management tools and processes events against service models that relate IT and the business. Business-aware service models enable IT to pinpoint root causes and prioritize business-critical problems. BMC SIM uses the BMC Configuration Management Database (CMDB) as an instrumented asset and configuration data source to create and maintain models. Common reporting and Web portal technologies deliver role-based dashboards and IT service impact reports.
BMC Remedy Help Desk—BMC Remedy Help Desk provides the foundation for an integrated, end-to-end approach to IT Service Management. Based on best practices, BMC Remedy Help Desk automates the ability to submit, monitor, and manage help desk cases, change tasks, and asset inventory records. It also indicates which business services are impacted by a given incident or problem, enabling users to determine priorities based on business need.
Conclusion
The scenarios in this paper demonstrate how BMC Performance Manager acts as a foundation for BSM and enables you to automate problem diagnosis and resolution. The historical data that BMC Performance Manager provides also enables the IT staff to perform the analysis necessary to link IT components to the business services they support. This knowledge empowers IT to create effective service models in BMC SIM. BMC Performance Manager is essential for both the implementation and the day-to-day operation of a BSM environment.
Helping you maintain advantage
BMC Software Education Services offers a strategic investment for your business, maximizing the value for your employees and Business Service Management initiatives. Education ensures successful product implementation, promoting mastery of all product capabilities and highest productivity with your BMC Software solutions. To explore our education offerings, visit our web page at http://www.bmc.com/bmceducation, or contact BMC Software Education Services by telephone or e-mail:
- North America
Telephone: 800 574 4262
E-mail: education@bmc.com
- Asia Pacific
Telephone: +61 3 9657 4404
E-mail: ISD_AP@bmc.com
- Europe, Middle East, and Africa (EMEA)
Telephone: 00800 26233822
E-mail: emea_education@bmc.com
About BMC Software
BMC Software, Inc. [NYSE:BMC], is a leading provider of enterprise management solutions that empower companies to manage their IT infrastructure from a business perspective. Delivering Business Service Management, BMC Software solutions span enterprise systems, applications, databases, and service management. Founded in 1980, BMC Software has offices worldwide and fiscal 2004 revenues of more than $1.4 billion. For more information about BMC Software, visit www.bmc.com.
Copyright 2005 BMC Software, Inc., as an unpublished work. All rights reserved.
BMC Software, the BMC Software logos, and all other BMC Software product or service names are registered trademarks or trademarks of BMC Software, Inc.
All other trademarks belong to their respective companies.
July 18, 2005
55996