Introduction / background
“If you cannot measure it, you cannot improve it.” – Peter Drucker. This all-too-often misquoted phrase can be applied in many ways, but it is especially apt for infrastructure management. If we do not measure (monitor) the infrastructure, then we certainly cannot improve (manage) it effectively either.
Infrastructure monitoring is one of many components of infrastructure management
BusinessDictionary.com defines management as “the organization and coordination of the activities of a business in order to achieve defined objectives.” It also states that managers can be described as those “who have the power and responsibility to make decisions and oversee an enterprise.”
These basic definitions are especially relevant to infrastructure managers. IT infrastructures involve a number of complicated systems, equipment, and processes. In order for the infrastructure to achieve the objectives of an organization, this complexity must be organized and coordinated effectively.
Infrastructure monitoring gives the infrastructure manager the data required to understand the status of the infrastructure and to quantify progress toward organizational objectives. In its simplest description, infrastructure monitoring is the continual collection and review of meaningful data about the infrastructure, and it is this continual collection and review that makes effective infrastructure management possible. The data it yields gives the infrastructure manager the ability to manage other processes such as capacity management, availability management, service level management, and security.
Infrastructure monitoring is a vital mechanism for communication
Technology is inherently complicated. As with many other areas of the business, the infrastructure comes with its own terminology, slang, and unique personalities. Complexity in any field is often difficult to communicate to senior managers who are unfamiliar with the details. By gathering data about the infrastructure, the infrastructure manager can communicate in business terms that the rest of the organization readily understands.
To better understand how this works, let us look at the most common (and often the easiest) infrastructure metric to monitor: availability. Availability is the percentage of time that your infrastructure services are functioning as designed, or “available,” to the consumers of those services. For this example, we assume that services are expected to be available 24x7x365 and that a month is measured as 730 hours (8,760 hours per year divided by 12). During the first month of monitoring, the network is measured at 99.5% availability, which corresponds to 3.65 hours, or roughly 4 hours, of downtime.
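The arithmetic behind these availability figures is simple enough to sketch. The Python below is a minimal illustration, assuming the 730-hour month used above; the function names are hypothetical and not taken from any monitoring product.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12, as assumed in the text


def availability_pct(downtime_hours: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Percentage of the period during which the service was available."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def downtime_hours(availability: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime implied by an availability percentage."""
    if not 0 <= availability <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (100.0 - availability) / 100.0


if __name__ == "__main__":
    print(f"99.5% of a 730 h month leaves {downtime_hours(99.5):.2f} h of downtime")
    print(f"4 h of downtime gives {availability_pct(4):.2f}% availability")
```

Run as a script, it shows that 99.5% of a 730-hour month leaves 3.65 hours of downtime, while a full 4 hours of downtime works out to about 99.45%. The difference is small, but it is worth being precise once availability figures are written into an agreement.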
With this data in hand, an infrastructure manager can have conversations at all levels of the organization without getting into the complexities of the infrastructure itself. A few examples of conversations that this enables include:
- How did the 4 hours of downtime affect the organization?
- What would it cost to increase our network availability?
- Can our processes be optimized to reduce the amount of downtime we have?
Getting started with infrastructure monitoring and management
Infrastructure monitoring is most effective when done through automated monitoring tools. Many tools exist for monitoring the infrastructure. They range in complexity from simple availability monitoring to complex tools that can kick off automated processes when thresholds have been reached. Each organization’s monitoring requirements differ. However, most infrastructure managers will agree that it is best to start small and simple.
Our earlier example of measuring availability is a good first step in monitoring the infrastructure. Some organizations still make network availability the focus of their monitoring simply because it is well understood and easily quantified. However, organizations should move well beyond simple availability statistics as their management and monitoring practices mature.
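As a first step, availability can be estimated by probing a service at a fixed interval and counting how many probes succeed. The sketch below assumes that a successful TCP connection is a good enough proxy for “available” (real services often need an application-level check instead); `tcp_probe`, `poll`, and `sampled_availability` are hypothetical helpers, and the demo probes a throwaway local listener rather than a real service.

```python
import socket
import time
from typing import Callable, List


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def poll(probe: Callable[[], bool], samples: int, interval_s: float) -> List[bool]:
    """Run the probe `samples` times, sleeping `interval_s` seconds between runs."""
    results: List[bool] = []
    for i in range(samples):
        results.append(probe())
        if i < samples - 1:
            time.sleep(interval_s)
    return results


def sampled_availability(results: List[bool]) -> float:
    """Share of successful probes, as a percentage."""
    if not results:
        raise ValueError("no samples collected")
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    # Demo against a throwaway local listener; point tcp_probe at a real
    # host and port (and use a longer interval) to monitor an actual service.
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen()
        port = server.getsockname()[1]
        results = poll(lambda: tcp_probe("127.0.0.1", port), samples=3, interval_s=0.5)
    print(f"availability over {len(results)} samples: {sampled_availability(results):.1f}%")
```

Sampling only approximates time-based availability: an outage shorter than the polling interval can be missed entirely, so the interval should be short relative to the outages you care about.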
Perhaps the most common management challenge is ensuring that the infrastructure is managed at a level appropriate for the organization. Service Design and Service Level Agreements (SLAs) are outside the scope of this post, but they are vital in establishing an understanding between the infrastructure manager and the organization. They allow both parties to agree on whether 4 hours of downtime per month is acceptable. While every infrastructure manager would like 100% availability, greater reliability usually means greater cost. If the organization does not require the additional availability, pursuing 100% can hurt its profitability.
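An availability target can be translated into a monthly downtime budget, which makes the question of whether 4 hours of downtime is acceptable concrete. This is a minimal sketch assuming the same 730-hour month; the targets listed are illustrative, not recommendations, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def allowed_downtime_hours(target_pct: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Downtime budget implied by an availability target over one period."""
    return period_hours * (100.0 - target_pct) / 100.0


def meets_target(measured_downtime_h: float, target_pct: float,
                 period_hours: float = HOURS_PER_MONTH) -> bool:
    """True if the measured downtime fits within the target's budget."""
    return measured_downtime_h <= allowed_downtime_hours(target_pct, period_hours)


if __name__ == "__main__":
    measured = 4.0  # hours of downtime in the month, from the example
    for target in (99.0, 99.5, 99.9, 99.99):
        budget = allowed_downtime_hours(target)
        verdict = "meets" if meets_target(measured, target) else "misses"
        print(f"{target}% target: {budget:.2f} h ({budget * 60:.0f} min) allowed; "
              f"{measured} h of downtime {verdict} it")
```

Each additional “nine” shrinks the budget by a factor of ten (from 7.3 hours at 99% to about 44 minutes at 99.9%), which is one way to see why higher availability tends to cost more. Note that a strict 99.5% target allows only 3.65 hours, so a full 4 hours of downtime would narrowly miss it.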
Today’s organizations cannot operate without a finely tuned and efficient infrastructure. Virtually every business process (both internal and external) relies on the infrastructure. Even with organizations that have migrated to the cloud, there is still an infrastructure that must be effectively monitored and managed. Because of this, the infrastructure manager should give the same level of scrutiny to infrastructure monitoring tools as he/she does to any other component of the infrastructure. Some key things to consider in an infrastructure monitoring solution include:
- A single system for monitoring both on-premises and cloud-based infrastructure.
- Vendor-independent architecture. The manager should not be forced to select a monitoring system based on the equipment manufacturer, or vice versa.
- Intelligent prioritization of notifications based on user-defined parameters.
- The ability to set thresholds that trigger automated events. One example could be “when virtual server utilization reaches 80%, notify the manager and provision additional virtual machines.”
- Real-time information on both individual components and the aggregate systems they make up.
This is obviously not an exhaustive list, merely some suggestions for getting started.
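The threshold and notification-prioritization features in the list above can be illustrated with a small rule evaluator. This sketch assumes metrics arrive as a dictionary of current values and that a lower `priority` number means more urgent; `ThresholdRule`, `evaluate`, `notify_manager`, and `provision_vms` are hypothetical, and the two actions only record what a real notification system or provisioning API would be asked to do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ThresholdRule:
    """A user-defined rule: when `metric` reaches `threshold`, run `actions`."""
    metric: str
    threshold: float
    priority: int  # lower number = more urgent (an assumed convention)
    actions: List[Callable[[str, float], None]] = field(default_factory=list)


def evaluate(rules: List[ThresholdRule], metrics: Dict[str, float]) -> List[ThresholdRule]:
    """Run the actions of every triggered rule, most urgent first, and return those rules."""
    triggered = [r for r in rules
                 if r.metric in metrics and metrics[r.metric] >= r.threshold]
    triggered.sort(key=lambda r: r.priority)
    for rule in triggered:
        for action in rule.actions:
            action(rule.metric, metrics[rule.metric])
    return triggered


# Hypothetical actions: stand-ins that only log what would be requested.
log: List[str] = []


def notify_manager(metric: str, value: float) -> None:
    log.append(f"notify: {metric} at {value:.0f}%")


def provision_vms(metric: str, value: float) -> None:
    log.append("provision: request additional virtual machines")


rules = [
    ThresholdRule("vm_cpu_utilization_pct", 80.0, priority=1,
                  actions=[notify_manager, provision_vms]),
    ThresholdRule("disk_used_pct", 90.0, priority=2, actions=[notify_manager]),
]

if __name__ == "__main__":
    fired = evaluate(rules, {"vm_cpu_utilization_pct": 83.0, "disk_used_pct": 71.0})
    print([r.metric for r in fired])
    print(log)
```

With the sample readings, only the CPU rule fires, so the log shows one notification followed by a provisioning request. A production system would also need hysteresis or a cooldown period so that a metric hovering around 80% does not trigger provisioning on every evaluation.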