WHITE PAPER

Monitoring Microsoft Exchange with BMC® Performance Manager - A Best Practices Guide

Introduction

Monitoring the Windows Operating System

CPU Usage

Memory

Disk

Network

Monitoring the Exchange Services

Monitoring the Exchange Processes

Monitoring the Message Flow

Information Store

Routing Engine

SMTP

Queues

Monitoring the E-Mail System Usage

Monitoring Top Consumers

Monitoring End-User Perspective Availability

Monitoring Data Access

Parameter Summary

Sources

Books

Online Books

Articles

Websites

Software

Online Help Systems

Helping You Maintain Advantage

Introduction

Mail and messaging applications are mission-critical tools in today’s business environments. Business productivity and effective communication require that these applications offer 24x7 availability and perform in real time. Microsoft Exchange is the leading collaboration tool offering mail and messaging capabilities. Now, more than ever, Exchange is being deployed in the most demanding environments, including large organizations with thousands of users.

For an administrator to guarantee availability and performance to these thousands of users, monitoring Exchange environments is key. Monitoring aids in the rapid detection of poor performance or failures, and the detailed information it provides speeds problem resolution. Additionally, through proactive monitoring, potential problems can be predicted, service outages prevented, and service-level agreements met.

While every Microsoft Exchange environment is unique, there are certain practices that you can follow to ensure the overall availability of your Microsoft Exchange environment. While reviewing this document, keep in mind that you may need to refine these BMC Software recommendations or take your own baselines to determine what is best for your environment.

While there are many components that may touch your Exchange environment, such as anti-virus software, backup software, Internet Information Services, and so on, this paper discusses the critical components of a typical Exchange 2000 or Exchange 2003 environment that you should consider monitoring with BMC Performance Manager for Microsoft Exchange Servers.

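In addition, the information presented includes recommendations as to the thresholds you should set. This discussion focuses on the most basic areas to consider in monitoring Exchange, including:

  • the Windows operating system
  • the Exchange services
  • the Exchange processes
  • the message flow
  • the e-mail system usage
  • top consumers
  • end-user perspective availability
  • data access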

Monitoring the Windows Operating System

The performance and availability of applications and databases are intimately tied to the operating systems that they run on. Without a healthy operating system, no application (such as Exchange) is going to perform as expected.

There are four primary areas related to the operating system that should be monitored:

  • CPU usage
  • memory
  • disk
  • network

For detailed explanations of application classes, their parameters, and their values, see the Best Practices for Monitoring Windows Server Systems with BMC Performance Manager paper.

CPU Usage

Monitoring this component is important because if CPU usage is too high, Exchange may perform slowly or not respond at all.

BMC Performance Manager application class:

NT_CPU

Thresholds:

Because this application class collects data related to the time that the processor is running, it is a good indicator of processor activity. Typically, if a processor shows 0% usage, you should be concerned that the services are down. On the other hand, if a processor shows usage of 90% or more, you should be concerned that the processor is being overworked and may shut down. Be sure to set an alarm for when processor usage exceeds 90% for 5 minutes or longer to prevent service outages.
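The sustained-usage rule described above can be sketched in a few lines of Python. This is an illustration only and is not part of BMC Performance Manager: the function name cpu_alarm, the (timestamp, percent) sample format, and the one-minute polling interval are all assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def cpu_alarm(
    samples: Iterable[Tuple[float, float]],
    threshold_pct: float = 90.0,
    sustain_seconds: float = 300.0,
) -> bool:
    """Return True if CPU usage stayed at or above threshold_pct for at
    least sustain_seconds without interruption.

    samples: (timestamp_in_seconds, cpu_percent) pairs in time order.
    This sample format is an assumption of the sketch.
    """
    run_start: Optional[float] = None  # when the current high-usage run began
    for timestamp, cpu_pct in samples:
        if cpu_pct >= threshold_pct:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= sustain_seconds:
                return True
        else:
            run_start = None  # usage dropped, so the run is broken
    return False


# One sample per minute (hypothetical data): six minutes at 95% trips the
# alarm, while a dip below the threshold resets the five-minute clock.
sustained = [(60.0 * i, 95.0) for i in range(7)]
spiky = [(0.0, 95.0), (60.0, 95.0), (120.0, 50.0), (180.0, 95.0), (240.0, 95.0)]
print(cpu_alarm(sustained))
print(cpu_alarm(spiky))
```

Requiring an unbroken run, rather than alarming on any single sample, keeps short spikes from generating alarms while still catching a processor that stays saturated.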

Memory

Memory plays a key role in server availability. Applications and users consume memory and sometimes make a server perform sluggishly. Monitoring memory use can help you isolate issues to improve performance before they become critical.

BMC Performance Manager application classes:

NT_MEMORY, NT_CACHE, NT_PAGEFILE

Thresholds:

In the NT_MEMORY class, pay special attention to the ratio of committed bytes to the commit limit. You should configure BMC Performance Manager to trigger a notification if virtual memory usage (committed bytes as a percentage of the commit limit) exceeds 80% to avoid potential service impacts.

In the NT_PAGEFILE class, be sure to pay attention to the PAGEpgUsagePercent parameter. Experts consider the paging file to be healthy with usage between 15% and 35%, but usage above 60% may indicate an issue, such as too little RAM or a memory leak. Windows automatically increases the size of the paging file once usage reaches 90%, as this can be considered a critical situation. However, this negatively affects performance. Thus, you should set an alarm between 60% and 80% to address any possible issues before they escalate.
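As a rough illustration of the two memory checks above, the following Python sketch computes the committed-bytes ratio and classifies paging file usage. It does not use any BMC Performance Manager API. The function names are hypothetical, and splitting the 60% to 80% range into a warning at 60% and an alarm at 80% is an assumption of the example, not a product default.

```python
def commit_ratio_pct(committed_bytes: int, commit_limit: int) -> float:
    """Committed bytes as a percentage of the commit limit."""
    if commit_limit <= 0:
        raise ValueError("commit_limit must be positive")
    return 100.0 * committed_bytes / commit_limit


def needs_memory_notification(
    committed_bytes: int, commit_limit: int, notify_pct: float = 80.0
) -> bool:
    """True when virtual memory usage exceeds the 80% guideline."""
    return commit_ratio_pct(committed_bytes, commit_limit) > notify_pct


def pagefile_state(
    usage_pct: float, warn_pct: float = 60.0, alarm_pct: float = 80.0
) -> str:
    """Classify paging file usage. The warning/alarm split is an
    assumption of this sketch; tune both values to your own baseline."""
    if usage_pct >= alarm_pct:
        return "alarm"
    if usage_pct >= warn_pct:
        return "warning"
    return "ok"


GIB = 1024 ** 3
# Hypothetical server: 6 GiB committed against an 8 GiB commit limit.
print(commit_ratio_pct(6 * GIB, 8 * GIB))
print(needs_memory_notification(6 * GIB, 8 * GIB))
print(pagefile_state(25.0), pagefile_state(65.0), pagefile_state(85.0))
```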

Disk

One of the key resources that can affect a server’s availability is the disk in use. When the disks are spending the majority of their time reading or writing, or there is little free space, performance is degraded or a database may be dismounted. Monitoring the disks can help you ensure that you address possible issues before performance becomes degraded, or a store is dismounted and service is interrupted.

BMC Performance Manager application classes:

NT_LOGICAL_DISKS, NT_PHYSICAL_DISKS

Thresholds:

Recommended thresholds for these application classes are discussed in detail in the Best Practices for Microsoft Windows Server System with BMC Performance Manager paper.

BMC Performance Manager Benefit:

The BMC Performance Manager for Servers product is preprogrammed to trigger an alarm when disk space usage is between 69-100%.

BMC Performance Manager for Microsoft Exchange Servers provides a parameter, FreeDBSpaceAvailForDefrag, that indicates how much disk space you have available for defragmentation and can help you manage your disk space more efficiently. With this feature, you can choose your threshold and the product initiates an automated recovery action that defragments the disk and helps improve performance.

Tip:

Thresholds can be refined further for individual disks depending on the type of information each contains, such as log files, database files, or message queue files. Typically, database files grow slowly over time, so you should set a threshold of 10% for free disk space. Log files and message queue files, in contrast, can grow rather quickly when mail cannot be sent or if backups are not working to remove old log files. Thus, you should set a more conservative threshold of about 40% free disk space for these disks.

Network

Regardless of how well your Exchange servers are working, if the network has issues, your Exchange servers may not be able to send or receive any information. Therefore, it is important to monitor network components, to understand what kind of impact they may have or are having on the performance and availability of your Exchange environment.

BMC Performance Manager application class:

NT_NETWORK

Thresholds:

In a properly functioning network, the number of errors, either outbound or inbound, should be zero. Be sure to set an alarm for a value of 1 or more so that you are notified as soon as your network is compromised. In addition, be sure to monitor the NETniTotalBytesPerSecond parameter. This counter indicates whether you are experiencing a performance bottleneck. Depending on your environment, you should set a threshold of approximately 8.75 to 11.25 MB/sec. If you observe this metric increasing, collisions are probably occurring and network efficiency is decreasing, and you need to take action.
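The paper does not state how the 8.75 to 11.25 MB/sec range was chosen. The figures coincide with 70% to 90% of the 12.5 MB/sec that a 100 Mbit/sec link can carry, so the Python sketch below assumes that derivation in order to scale the range to other link speeds. The function names and the 70%/90% defaults are assumptions of this example.

```python
from typing import Tuple


def link_capacity_mb_per_sec(link_mbit_per_sec: float) -> float:
    """Convert a nominal link speed in Mbit/sec to MB/sec (8 bits per byte)."""
    return link_mbit_per_sec / 8.0


def utilization_band(
    link_mbit_per_sec: float, low_pct: float = 70.0, high_pct: float = 90.0
) -> Tuple[float, float]:
    """Return the (low, high) throughput thresholds in MB/sec.

    The 70% and 90% defaults are an assumption: they reproduce the
    8.75 to 11.25 MB/sec range for a 100 Mbit/sec link.
    """
    capacity = link_capacity_mb_per_sec(link_mbit_per_sec)
    return capacity * low_pct / 100.0, capacity * high_pct / 100.0


low, high = utilization_band(100.0)
print(f"{low:.2f} to {high:.2f} MB/sec")  # 8.75 to 11.25 MB/sec

# The same rule applied to a hypothetical gigabit adapter.
print(utilization_band(1000.0))
```

If your adapters run at a different speed, deriving the threshold from the link capacity in this way is likely to be more useful than reusing the fixed range.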

Monitoring the Exchange Services

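Services are application types that run in the system background. Services provide core operating system features, such as Web serving, event logging, file serving, help and support, printing, cryptography, and error reporting. To provide core system features to users, Exchange provides a number of services. Of these services, the following components should be monitored:

  • Microsoft Exchange Information Store
  • Microsoft Exchange MTA Stacks
  • Microsoft Exchange Routing Engine
  • Microsoft Exchange System Attendant
  • Simple Mail Transfer Protocol (SMTP)
  • World Wide Web Publishing Service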

BMC Performance Manager application class:

NT_SERVICES

Thresholds:

The two most important parameters to monitor for each component are ServiceStatus and SvcDown. The ServiceStatus parameter indicates whether the Exchange service has been started and whether clients can make connections. The SvcDown parameter indicates the opposite condition: whether the service is down.

BMC Performance Manager Benefit:

The BMC Performance Manager for Servers product monitors all services out of the box, so little configuration is necessary to achieve service monitoring. If any of the Exchange services enters a precarious state of between 1-5, you receive a warning. If the state escalates to between 5-12, BMC Performance Manager is preconfigured to send an alarm, as this can be detrimental to the mail flow in your organization.

Monitoring the Exchange Processes

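In addition to service monitoring, you should also monitor the processes that run on behalf of an application. Exchange has critical processes that should be watched to determine their availability and resource usage. Exchange processes include the following:

  • Store.exe (Information Store Service)
  • Inetinfo.exe (IIS, Routing Engine)
  • Mad.exe (System Attendant)
  • Emsmta.exe (MTA Stacks Service)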

BMC Performance Manager application class:

NT_PROCESS

Thresholds:

The two most important parameters to consider are PROCDown and PROCProcessorTimePercent. If the PROCDown parameter indicates that a process is down or the PROCProcessorTimePercent is at zero, your processes may not be running, which is of course detrimental to mail flow. Inetinfo.exe, store.exe, emsmta.exe, and system.exe typically consume 90% of the processor time combined. However, if you observe these processes to be running at 100% for a sustained period of time, one of these processes has an issue that needs to be diagnosed and resolved. Set an alarm between 90-100% depending on your environment.

BMC Performance Manager Benefit:

The PROCDown parameters for these processes are preconfigured to alarm when the value equals 1, which indicates that the process is not running.

When used with BMC Performance Manager for Servers, BMC Performance Manager for Microsoft Exchange Servers automatically monitors Exchange Server processes that are initiated at startup. You can also configure BMC Performance Manager to monitor additional Exchange Server processes where necessary.

Monitoring the Message Flow

To understand the performance of your Exchange systems, you should monitor parameters related to message flow. These allow you to understand the speed at which the Information Store, Routing Engine, SMTP, and Queues are processing requests and whether there is a bottleneck.

Information Store

Monitoring the Information Store (IS) is critical, as this is the repository that Exchange uses to manage and process all of its information, including mail, attachments, and calendar information. The IS also handles tasks such as querying Active Directory to authenticate users.

BMC Performance Manager application classes:

MSEXCH_DB_Private, MSEXCH_DB_Public

Thresholds:

The MSEXCH_DB_Public\RecvQueueSize and MSEXCH_DB_Private\RecvQueueSize parameters monitor Exchange server performance for the public folders store and user mailboxes, respectively, and tell you the number of messages in the receive queue waiting to be processed by the IS. If the parameter value is high or increasing, you may need to load balance your server or consider other issues. The same limits that apply to the SendQueueSize parameter apply to the RecvQueueSize parameter.

Tips:

If you suspect that the private or public send queue is experiencing a bottleneck, use one of the following actions to correct the problem:

BMC Performance Manager Benefit:

For the State parameter, the Exchange product has a preconfigured threshold of 2-3 when a warning is to be issued, and a threshold of 1 for an alarm.

The Exchange product is preconfigured to alarm at a threshold of 30 minutes or more for the AvgDeliverTime parameter.

Routing Engine

The function of the routing engine, or message transfer agent (MTA), is to deliver messages once the IS has searched the directory to determine the destination of a message. Thus, this component should be closely monitored, as any problems affect the rate at which clients can send and receive messages, or their ability to do so at all.

BMC Performance Manager application class:

MSEXCH_MTA

Thresholds:

The WorkQueueLength parameter displays the total number of messages currently in the MTA queue. This number includes inbound and outbound messages for the Information Store, the Directory, and any MTA connectors. If the WorkQueueLength parameter is large or increasing, you have a problem that may impact performance. The cause could be:

The parameter value should remain below 0.5 to 1.0 percent of the number of connected users, and it typically varies between 0 and 50. If the queue length is above these values for a sustained period of time or if the queue length is rising, you may have a problem with one of your Exchange components, a connector, or a remote Exchange MTA. Set an alarm for any value of 100 or greater.
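The queue-length guidance above can be expressed as a small classification routine. The Python sketch below is illustrative only and does not call any BMC Performance Manager interface. The function names are hypothetical, and reporting an intermediate "warning" state when the queue exceeds the connected-user guideline is an assumption of the example; the paper itself only specifies an alarm at 100 or greater.

```python
from typing import Optional, Sequence


def mta_queue_state(
    queue_length: int,
    connected_users: Optional[int] = None,
    user_pct: float = 1.0,
    typical_max: int = 50,
    alarm_at: int = 100,
) -> str:
    """Classify an MTA work queue length.

    alarm_at follows the paper (100 or greater). The soft limit is
    user_pct percent of connected users when that count is known,
    otherwise the typical upper value of 50.
    """
    if queue_length >= alarm_at:
        return "alarm"
    if connected_users is not None:
        soft_limit = connected_users * user_pct / 100.0
    else:
        soft_limit = float(typical_max)
    if queue_length > soft_limit:
        return "warning"  # assumed intermediate state, not a product default
    return "ok"


def queue_is_rising(history: Sequence[int]) -> bool:
    """True when every sample is larger than the one before it."""
    return len(history) >= 2 and all(
        later > earlier for earlier, later in zip(history, history[1:])
    )


# Hypothetical server with 2,000 connected users (1% guideline = 20 messages).
print(mta_queue_state(10, connected_users=2000))
print(mta_queue_state(30, connected_users=2000))
print(mta_queue_state(120, connected_users=2000))
print(queue_is_rising([5, 8, 13]))
```

Checking the trend separately from the absolute value matters because a queue that is still below the alarm level but growing with every sample is already a sign of trouble.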

Tips:

To improve performance, you can remove messages from the MTA queue that were generated by the directory service, system attendant, or the public Information Store. These messages often accumulate when a WAN link fails or when a server is offline, interfering with the delivery of user-generated messages.

If the queue is larger than normal, determine the destination server of the first message in the queue and then verify that the destination server or connector is configured properly. You may also want to search the application event log generated by the MTA connection or the destination server.

BMC Performance Manager Benefit:

If the MTA cannot contact a domain controller, it may frequently shut down. If it does, the Exchange product has a recovery action that automatically restarts the service for you.

SMTP

Although Exchange 5.5 used X.400 as the Internet protocol of choice, this protocol has since been supplanted by SMTP. Exchange 2003 now uses SMTP for all communication between Exchange servers within a site. Thus, monitoring its availability is critical to ensure message flow. In addition, queue traffic is a direct result of the clients in use.

BMC Performance Manager application class:

MSEXCH_SMTP_Server

Thresholds:

If you observe the values of the parameters in the SMTP application class increasing, specifically LocalQueueLength, you should be concerned that messages are not being passed between Exchange servers. This suggests that a connector is not working properly or that there is a problem with the network.

Queues

As messages are passed from one process or component to another in the Exchange environment, these messages may be queued while waiting for the next process to perform its function (such as an Active Directory lookup or routing). Monitoring the number of messages in these queues is one of the most effective means of determining whether there is a message flow problem. While you may sometimes receive a “false positive” from a hang-up in the queue due to a spike in the number of messages being sent or a large message in the queue, monitoring these queues for such events can help avert potential service problems.

BMC Performance Manager application class:

MSEXCH_Queues

Thresholds:

In the MSEXCH_Queues application class, focus on the State and IncreasingTime parameters. If you notice that the State is constantly in Frozen mode, your network is having delivery problems that need to be addressed. Be sure to set an alarm for when the state of this parameter changes to Frozen. Also, pay close attention to the IncreasingTime parameter. This parameter gives you a good indication of whether your performance is suboptimal due to problems with the Web Storage System or with contacting the domain controller in the case of an Active Directory lookup. The Exchange product has a preprogrammed threshold of 10 unprocessed messages per minute or more for an alarm to trigger.

Tips:

Additional parameters that may help you determine message transfer rate and productivity are the MSEXCH_Sent_Mail\RecvBytes and MSEXCH_Sent_Mail\SentBytes parameters (the message tracking logs must be enabled for these parameters to be active). These parameters tell you the number of messages sent and received within a site, among sites in your organization, local to the server, and to external servers. You can use this data to balance server load and to ensure that you are using network bandwidth effectively. Furthermore, the MSEXCH_Sent_Mail_Containers\TotalMsgSent and MSEXCH_Sent_Mail_Containers\TotalMsgReceived parameters show you the sum of all the message traffic on a particular server on your network.

BMC Performance Manager Benefit:

The Exchange product discovers the queues being used on the server and generates an application class for each type of queue.

Monitoring the E-Mail System Usage

As with any server or application, the number of users or clients connecting to it creates increased load and resource consumption. Though these are not parameters that you need to monitor closely from day to day, from a planning perspective, you should monitor these parameters over time to determine whether your current infrastructure can continue to support your messaging needs.

BMC Performance Manager application class:

MSEXCH_IS

Thresholds:

Thresholds for UserConnects and ClientConnects should be set at zero, as this may be helpful for indicating possible problems with clients being able to connect.

Tip:

There are a couple of IS-related performance counters that are worthwhile to monitor. These are:

  • MSExchangeIS\RPC Requests
  • MSExchangeIS\RPC Operations/sec

These two metrics help you determine problems with processing the client request before or after Exchange processing begins. A problem before Exchange processing and message flow exists if the RPC Requests are low and the RPC Operations/sec is zero. Anything other than this scenario indicates a problem during or after Exchange processing. You should set an alarm when the RPC Operations/sec drops below normal for a sustained period of time.
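The decision rule for these two counters can be sketched as follows. This Python example is an illustration under stated assumptions: it presumes you already suspect a client-facing problem, the function names are hypothetical, and the cutoff for what counts as a "low" number of RPC requests (low_requests_max) is a placeholder, because the paper gives no figure. The baseline for "normal" operations per second should come from your own measurements.

```python
from typing import Sequence


def locate_rpc_problem(
    rpc_requests: int, rpc_ops_per_sec: float, low_requests_max: int = 5
) -> str:
    """Say where a suspected problem sits relative to Exchange processing.

    low_requests_max is a placeholder for "low"; it is not taken
    from the paper or from any product default.
    """
    if rpc_requests <= low_requests_max and rpc_ops_per_sec == 0:
        return "before Exchange processing"
    return "during or after Exchange processing"


def ops_below_normal(
    recent_ops: Sequence[float], normal_ops: float, sustain_samples: int = 5
) -> bool:
    """True when the last sustain_samples readings are all below normal_ops."""
    tail = list(recent_ops)[-sustain_samples:]
    return len(tail) == sustain_samples and all(v < normal_ops for v in tail)


# Hypothetical readings.
print(locate_rpc_problem(rpc_requests=2, rpc_ops_per_sec=0.0))
print(locate_rpc_problem(rpc_requests=40, rpc_ops_per_sec=0.0))
print(ops_below_normal([50, 10, 9, 8, 7, 6], normal_ops=20.0))
```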

BMC Performance Manager Benefit:

You can use the Perfmon Wizard provided by the BMC Performance Manager for Servers product to bring in any performance counters of interest, such as those listed above, for monitoring.

Monitoring Top Consumers

While monitoring the top consumers is not critical to ensure the overall health of the Exchange system, it does allow an administrator to make refinements based on observed resource requirements that can help improve overall Exchange performance. For example, an administrator may be able to better allocate storage resources based on demonstrated need, or shift certain high-volume mailboxes or public folders to servers with less traffic or a higher capacity.

BMC Performance Manager application classes:

MSEXCH_Top_Senders, MSEXCH_Top_Receivers, MSEXCH_Top_Folders, MSEXCH_Top_Mailboxes

Thresholds:

Thresholding does not typically apply here, as the intention is simply to monitor a specified number of top consumers (message tracking must be enabled to use this functionality). The Exchange product limits monitoring to 50 consumers in any one of these application classes. While this value may seem low, you must consider the impact that such monitoring could have on your resources and network. BMC Software has deliberately set this limit to prevent any adverse impact that monitoring an extensive number of users could have.

However, if you are concerned about limiting public folder sizes, you can set a threshold based on the needs of your organization using the MsgSize parameter.

Tips:

The Exchange product has a feature that enables you to monitor specified users in the MSEXCH_Watched_Users application class. This could be advantageous if you need to monitor a particular user to ensure that he or she experiences optimal performance and availability of Exchange (such as the CEO) or for some other reason, such as ensuring that users stay within their quota limits.

You can also configure BMC Performance Manager to watch for suspect mail. BMC Performance Manager for Microsoft Exchange Servers uses a dummy mailbox to monitor for suspect mail that could contain viruses. This dummy mailbox uses a bogus name and is not a member of any distribution list, so it should never send or receive e-mail. If it does, an e-mail virus may be present.

BMC Performance Manager Benefits:

You can configure the Exchange product to perform recovery actions, such as automatically notifying the top senders, receivers, or mailbox users of their usage. You can also specify whether to use the number of messages or the total message size to determine the top senders, receivers, mailboxes, and folders.

If you choose to use the Watched User function to monitor suspect mail, the Exchange product knows that the dummy mailbox should never send or receive e-mail. In the event that it does, a built-in recovery action automatically shuts down the Message Transfer Agent to prevent further spread of the virus.

Monitoring End-User Perspective Availability

When developing a best practices monitoring plan for Exchange, you should not only monitor client load, but also client perspective. Users depend on Exchange to be available and responsive. Therefore, understanding how the Exchange server responds to typical client usage helps determine its availability to your clients. For details on how to use this feature, see the Best Practices for Monitoring Roundtrip Response Times with BMC Performance Manager paper.

Monitoring Data Access

DSAccess is one of the core components of Exchange 2000 and 2003 and controls how Exchange accesses Active Directory. For example, a user or Exchange server can initiate an Active Directory query to look up an e-mail address. The results are stored in the DSAccess cache. Because Exchange always searches this cache before submitting a query, redundant queries are eliminated, which reduces network traffic, improves processing performance, and allows Exchange to be more scalable.

In addition, because the MTA, IS, Exchange routing, and other components require DSAccess for processing, monitoring the availability of data access is key to the availability of the Exchange environment.

BMC Performance Manager application classes:

MSEXCH_DSAccess_Cache, MSEXCH_DSAccess_Processes

Thresholds:

If any of the parameters in these application classes is high or rising, and this corresponds with a decrease in the message delivery rate, Active Directory is likely having problems.

Tip:

If your server houses a large number of mailboxes, you may be able to improve performance by increasing the cache size. Microsoft sets the default user and configuration cache size at 25 MB. However, depending on your hardware, you may be able to increase the user cache size to 90 MB and the configuration cache size to 5 MB.

Parameter Summary

Exchange Area

Application Class/Parameters

Exchange Services

  • Microsoft Exchange Information Store
  • Microsoft Exchange MTA Stacks
  • Microsoft Exchange Routing Engine
  • Microsoft Exchange System Attendant
  • Simple Mail Transfer Protocol (SMTP)
  • World Wide Web Publishing Service

NT_SERVICES\SvcDown

NT_SERVICES\ServiceStatus

Exchange Processes

  • Store.exe (Information Store Service)
  • Inetinfo.exe (IIS, Routing Engine)
  • Mad.exe (System Attendant)
  • Emsmta.exe (MTA Stacks Service)

NT_PROCESS\PROCDown

NT_PROCESS\PROCProcessorTimePercent

Events from Sources

  • IMAP4Svc
  • MSExchangeAL
  • MSExchangeIS\System
  • MSExchangeIS\Mailbox
  • MSExchangeIS\Public Folder
  • MSExchangeSRS
  • MSExchangeTransport
  • MSExchangeMTA
  • MSExchangeSA
  • POP3SVC

NT_EVENTLOG\Application\eventFilter\ELMError

Client Load

  • client load
  • top mail senders and mail receivers
  • top mailbox users and public folders

MSEXCH_Top_Senders\MsgSize

MSEXCH_Top_Senders\MsgCount

MSEXCH_Top_Senders\AvgMsgsPerHour

MSEXCH_Top_Receivers\MsgSize

MSEXCH_Top_Receivers\MsgCount

MSEXCH_Top_Receivers\AvgMsgsPerHour

MSEXCH_Top_Mailboxes\MsgSize

MSEXCH_Top_Mailboxes\MsgCount

MSEXCH_Top_Folders\MsgSize

MSEXCH_Top_Folders\MsgCount

MSEXCH_Watched_Users\AttachmentSize

MSEXCH_Watched_Users\MsgCount

MSEXCH_Watched_Users\MsgSize

MSEXCH_Watched_Users\SuspectMsgCount

End-User Perspective Availability

  • sending round-trip messages to Exchange Server
  • logging on and off Exchange Server
  • opening messages
  • creating messages
  • sending messages
  • deleting messages

MSEXCH_Roundtrip_Client\CreateMsgTime

MSEXCH_Roundtrip_Client\DeleteMsgTime

MSEXCH_Roundtrip_Client\LastMsgTime

MSEXCH_Roundtrip_Client\LastNMsgTime

MSEXCH_Roundtrip_Client\LogoffTime

MSEXCH_Roundtrip_Client\LogonTime

MSEXCH_Roundtrip_Client\MaxMsgTime

MSEXCH_Roundtrip_Client\OpenMsgTime

MSEXCH_Roundtrip_Client\SendMsgTime

MSEXCH_Roundtrip_Client\Status

Data Access

  • DS Access
  • Address List

MSEXCH_DSAccess_Cache\AsyncReadsPending

MSEXCH_DSAccess_Cache\AsyncSearchesPending

MSEXCH_DSAccess_Processes\LdapReadTime

MSEXCH_DSAccess_Processes\LdapSearchTime

MSEXCH_Address_List\ListQueueLength

Message Traffic - Protocol Queues

  • Protocol Queues
  • Sent/Received
  • Client Connections

MSEXCH_Queues\IncreasingTime

MSEXCH_Queues\State

MSEXCH_Sent_Mail\RecvMsgs

MSEXCH_Sent_Mail\SentMsgs

MSEXCH_Sent_Mail_Containers\TotMsgSent

MSEXCH_Sent_Mail_Containers\TotMsgReceived

Message Traffic - Routing Engine

MSEXCH_MTA\WorkQueueLength

MSEXCH_MTA_Connections\QueueLength

Message Traffic - Information Store

MSEXCH_DB_Private\RecvQueueSize

MSEXCH_DB_Public\RecvQueueSize

MSEXCH_DB_Private\SendQueueSize

MSEXCH_DB_Public\SendQueueSize

Message Traffic - SMTP

MSEXCH_SMTP_Server\LocalQueueLength

Store

  • ExIPC
  • IS Private and Public

MSEXCH_ExIPC\ClientQueLen

MSEXCH_ExIPC\StoreQueLen

MSEXCH_DB_Private\SendQueueSize

MSEXCH_DB_Public\SendQueueSize

MSEXCH_DB_Private\MsgSentPerMin

MSEXCH_DB_Public\MsgSentPerMin

MSEXCH_DB_Private\RecvQueueSize

MSEXCH_DB_Public\RecvQueueSize

User and Client Connections

MSEXCH_IS\UserCount

MSEXCH_IS\ConnectCount

Perfmon:MSExchangeIS\RPC Requests

Perfmon:MSExchangeIS\RPC Operations/sec

Sources

Books
Online Books
Articles
Websites
Software
Online Help Systems

Helping You Maintain Advantage

BMC Software Education Services offers a strategic investment for your business, maximizing the value for your employees and Business Service Management initiatives. Education ensures successful product implementation, promoting mastery of all product capabilities and highest productivity with your BMC Software solutions. To explore our education offerings, visit our web page at http://www.bmc.com/bmceducation, or contact BMC Software Education Services by telephone or e-mail:

Copyright 2005 BMC Software, Inc., as an unpublished work. All rights reserved.

BMC Software, the BMC Software logos, and all other BMC Software product or service names are registered trademarks or trademarks of BMC Software, Inc.

IBM is a registered trademark of International Business Machines Corporation.

DB2 is a registered trademark of International Business Machines Corporation.

Oracle is a registered trademark, and the Oracle product names are registered trademarks or trademarks of Oracle Corporation.

All other trademarks belong to their respective companies.

August 15, 2005

About BMC Software

BMC Software, Inc. [NYSE:BMC], is a leading provider of enterprise management solutions that empower companies to manage their IT infrastructure from a business perspective. Delivering Business Service Management, BMC Software solutions span enterprise systems, applications, databases, and service management. Founded in 1980, BMC Software has offices worldwide and fiscal 2004 revenues of more than $1.4 billion. For more information about BMC Software, visit www.bmc.com.
