Capacity Planning Steps to Ride the Mainframe Storm

In my previous blog, I discussed three events creating a perfect storm for mainframe capacity planning:

  • Digital disruption
  • Increasingly complex hardware and software choices
  • Mainframe skills “brain drain”

As promised in that post, here are steps you can take to address each event and ensure accurate and timely capacity planning for mainframes.

Digital Disruption

Tightly align to business applications in planning and reporting. Replace plans and reports organized around service classes or technologies (such as "CICS Regions") with plans and reports organized around the applications that are critical to the business, such as "Online Banking," which will typically combine usage and consumption across several technology stacks (CICS, Db2, IMS, MQ).
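
As a rough illustration of that roll-up, the sketch below maps hypothetical technology components to the business applications they support and aggregates consumption by application; the component names and MSU figures are invented, not drawn from any real report.

```python
# Hypothetical mapping of technology components to the business
# applications they support; names and numbers are illustrative only.
component_to_app = {
    "CICSPROD1": "Online Banking",
    "DB2PRODA":  "Online Banking",
    "MQPROD1":   "Online Banking",
    "IMSCLM01":  "Claims Processing",
}

# Example per-component MSU consumption for a reporting interval.
msu_by_component = {"CICSPROD1": 310, "DB2PRODA": 240, "MQPROD1": 55, "IMSCLM01": 180}

# Roll technology-level consumption up to the business applications
# that planning and reporting should be framed around.
msu_by_app = {}
for component, msu in msu_by_component.items():
    app = component_to_app.get(component, "Unmapped")
    msu_by_app[app] = msu_by_app.get(app, 0) + msu

print(msu_by_app)  # e.g. {'Online Banking': 605, 'Claims Processing': 180}
```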

Recognize the difference between disruptive change and noise. Effective planning begins with a firm understanding of where you are. Determining that requires filtering out noise with machine learning and analytics.

Be prepared to rapidly evaluate the impact of changes and alternatives. Business events happen much more rapidly in today's environment, and even the pace of change is changing, according to the 2018 Mainframe Survey from BMC. A capacity plan that requires weeks or months to develop cannot support today's dynamic environment. Capacity planning should be able to respond quickly to any nuance of change.

Apply sensitivity analysis to manage uncertainty. The disruption hitting mainframes introduces a level of uncertainty that has not been seen before. One way to account for uncertainty in the plan is to run many different scenarios across the range of uncertainty, to identify “breaking points” in activity that require capacity adjustments.
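
A scenario sweep of this kind can be as simple as the following sketch, where the installed capacity, baseline peak, growth range and utilization threshold are all assumed placeholder values:

```python
# Sweep a range of workload-growth assumptions to find the "breaking
# point" at which projected demand exceeds a utilization threshold.
# All figures here are illustrative, not from any real plan.
installed_msu = 1200          # assumed installed capacity
baseline_peak_msu = 850       # assumed current peak demand
safe_utilization = 0.85       # planning threshold

for growth_pct in range(0, 55, 5):            # 0% .. 50% growth scenarios
    projected_peak = baseline_peak_msu * (1 + growth_pct / 100)
    utilization = projected_peak / installed_msu
    status = "OK" if utilization <= safe_utilization else "CAPACITY ACTION NEEDED"
    print(f"growth {growth_pct:2d}%  projected peak {projected_peak:7.1f} MSU  "
          f"utilization {utilization:5.1%}  {status}")
```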

Complex Hardware and Software Options

Rationalize hardware costs versus performance. While MIPS and CPU utilization have been the default metrics for expressing capacity needs, a demanding user population makes responsiveness and availability the top measures of success. Capacity plans must include the performance of specific business applications as key indicators of the plan's ability to deliver on business services.

With a responsiveness focus, use capacity analysis to identify underlying resource constraints, and then evaluate a range of hardware responses. In some cases, the underlying conditions may show that an assumed option (such as a CPU upgrade) will not fix the performance shortfall, whereas another option (such as a peripheral upgrade) will deliver the required performance.
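
The underlying idea can be sketched as a simple decision check; the metric names and thresholds below are invented stand-ins for whatever your monitoring actually reports:

```python
# Toy constraint check: decide which upgrade scenarios are worth modeling
# based on where the bottleneck actually is. Metrics and thresholds are
# illustrative assumptions, not product-specific values.
def candidate_upgrades(cpu_busy_pct, io_response_ms, avg_cpu_queue):
    candidates = []
    if cpu_busy_pct > 90 or avg_cpu_queue > 2.0:
        candidates.append("model CPU upgrade / additional engines")
    if io_response_ms > 5.0:
        candidates.append("model peripheral/storage upgrade")
    if not candidates:
        candidates.append("no hardware change indicated; revisit workload tuning")
    return candidates

# Example: CPU is comfortable but I/O response is poor, so a CPU upgrade
# alone would not fix the performance shortfall.
print(candidate_upgrades(cpu_busy_pct=72, io_response_ms=9.3, avg_cpu_queue=0.4))
```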

Right-size software cost options. An increasing menu of software licensing options creates a bewildering range of choices, from Country Multiplex Pricing to various forms of software containers. In many cases, the best way to determine the lowest ongoing cost and optimal capacity options is to analyze and optimize current costs and capacity first. Then evaluate how an assumed future state, such as a software container, would meet capacity requirements and affect other work.
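
For example, because sub-capacity MLC charges are generally tied to the peak rolling four-hour average (R4HA) of MSU consumption, a first-cut look at current usage might resemble the sketch below; the hourly samples and the candidate container size are invented.

```python
# Compute the peak rolling four-hour average (R4HA) from hourly MSU
# samples and compare it with a candidate container/cap size.
# The samples and the 380-MSU container are illustrative assumptions.
hourly_msu = [210, 230, 260, 310, 380, 420, 410, 390, 340, 300, 270, 240]

window = 4  # hours
r4ha = [sum(hourly_msu[i:i + window]) / window
        for i in range(len(hourly_msu) - window + 1)]
peak_r4ha = max(r4ha)

candidate_container_msu = 380
print(f"Peak R4HA: {peak_r4ha:.1f} MSU")
print("Container would constrain current peak" if peak_r4ha > candidate_container_msu
      else "Container covers current peak")
```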

Address volatility with rapid scenario planning. This is closely related to managing uncertainty, but the emphasis for volatility is on “rapid” evaluation as well as multiple scenarios. It is also possible to be proactive about volatility (even though it sounds like an oxymoron). Apply sensitivity analysis to plan inputs that might become highly volatile, and again, develop the range of potential performance results.

Support a wide range of what-if alternatives. To support a dynamic and volatile environment, capacity planning needs to accommodate a wide range of “what-if” scenarios, ranging from workload increases/decreases/changes, to CPU and peripheral options, to modifying configuration of existing resources, to adding/removing systems, LPARs and workloads.
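
One lightweight way to keep such varied scenarios comparable is to describe each as structured data and run them through a common evaluation; the sketch below uses invented fields, numbers and a deliberately trivial utilization calculation.

```python
# Describe heterogeneous "what-if" scenarios as data so they can be
# evaluated and compared consistently. Fields and numbers are invented.
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    workload_delta_pct: float   # +/- change in workload volume
    capacity_delta_msu: float   # added (or removed) capacity

BASE_PEAK_MSU = 850
BASE_CAPACITY_MSU = 1200

scenarios = [
    Scenario("10% online growth", +10, 0),
    Scenario("Offload eligible work", -5, 0),
    Scenario("CPU upgrade", 0, +300),
    Scenario("Consolidate two LPARs", +15, -100),
]

for s in scenarios:
    peak = BASE_PEAK_MSU * (1 + s.workload_delta_pct / 100)
    capacity = BASE_CAPACITY_MSU + s.capacity_delta_msu
    print(f"{s.name:24s} projected utilization {peak / capacity:5.1%}")
```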

Brain Drain

Adopt solutions with engineered-in SME (subject matter expert) intelligence. A skilled capacity planner takes certain actions when analyzing performance and capacity needs. As skilled capacity planners leave the workforce, a capacity planning solution with built-in capacity planning knowledge will help newer technicians develop answers and avoid major errors.

Automate, automate, automate. Capacity planning may be the best IT example of the "cobbler's children" dilemma: IT applies automation to so many aspects of IT Operations Management, yet capacity planners still spend substantial time on manual work. Automating standard tasks frees technicians for higher-value activities and mitigates the risks of manual effort. Even more importantly, automating actions such as creating models every day ensures that capacity planning can respond quickly to the disruptive events that may require adjustments to the plan.
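
In spirit, "create models every day" is just a scheduled job; the skeleton below is a generic illustration in which load_daily_measurements, build_capacity_model and publish_report are hypothetical placeholders, not a real product API.

```python
# Generic daily automation skeleton. The helper functions are
# placeholders for whatever your tooling provides; only the
# scheduling pattern is the point here.
import datetime

def load_daily_measurements(day):
    return {"day": day, "peak_msu": 830}         # placeholder data

def build_capacity_model(measurements):
    return {"model_for": measurements["day"]}    # placeholder model

def publish_report(model):
    print(f"Published capacity model for {model['model_for']}")

def run_daily():
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    model = build_capacity_model(load_daily_measurements(yesterday))
    publish_report(model)

if __name__ == "__main__":
    run_daily()   # in practice this would be driven by a scheduler
```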

Eliminate programming. In many organizations, capacity planning relies on home-grown tools built with SAS, SQL or Excel macros. Such programming consumes substantial time to create and maintain, introduces potential for errors, and cannot respond quickly enough to the volatility of modern environments. In addition, any of that programming that runs on the mainframe adds more load to the very resource you are trying to optimize.

Gain insights with machine learning analytics. Volatility makes it difficult to identify a basis for planning. Volatility is also changing the characteristics of what “normal” processing and usage look like. Machine learning and analytics can keep up with the volatility to provide a clearer understanding of “normal” and how much variability there is.
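
Even a naive statistical baseline illustrates the idea; the sketch below builds a "normal" band from invented hourly MSU samples and flags values that fall outside it, whereas a real solution would use far more robust analytics.

```python
# Build a naive "normal" band from historical hourly MSU samples and
# flag observations that fall outside it. Invented data; a production
# approach would use more robust statistics or machine learning.
import statistics

history = [300, 310, 295, 320, 305, 315, 298, 307, 312, 301]
mean = statistics.mean(history)
stdev = statistics.pstdev(history)
band = (mean - 2 * stdev, mean + 2 * stdev)

new_samples = [308, 470, 299]   # 470 would stand out as a real shift or anomaly
for sample in new_samples:
    label = ("within normal variability"
             if band[0] <= sample <= band[1] else "outside normal band")
    print(f"{sample:4d} MSU: {label}")
```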

What next? Evaluate your mainframe capacity planning approach and see how many of these steps you have in place. Identify shortfalls and create a plan to implement improvements. Remember the words of management guru Peter Drucker: "Long range planning does not deal with future decisions, but with the future of present decisions." Prepare your capacity planning by taking these storm-proofing steps, and you can have confidence in the future of the present decisions you will make.

Five Myths of Mainframe Capping


The mainframe operating system provides a mechanism that IT can use to limit the amount of resources that workloads use, and limit exposure to excessive IBM Monthly License Charge (MLC) costs. Defined Capacity, or capping, can be set at specific levels by IT, but the consequence is that when work reaches the cap, the operating system will not allow any additional MSUs to be used, so workloads will be delayed. The potential for creating service level impacts to critical business work has made many mainframe users reluctant to use capping. As a result, there are some misconceptions about the risks and rewards of using capping. This blog explores five of them.

  1. If I cap, I put business services at risk.

    An effective capping strategy does not have to put business services at risk. Caps that are set up correctly account for the different importance levels of workloads, ensure that high-importance work gets the resources it needs, and may restrict low-importance work where delays are acceptable. Workload volumes vary widely over time, so finding "the one right cap" that is safe for all time periods is simply not attainable with native tools.


    With the digitally-driven volatile and variable work impacting mainframe systems, an effective capping solution needs to dynamically and automatically adjust capacity limits across LPARs to ensure there is no risk to business services. The solution needs to evaluate workload priority, available capacity and the relative cost of the MLC products running on the various LPARs. This should be accomplished under the control of a policy you set, one that defines workload priorities, target MSU consumption and the cost of MLC products on each LPAR (a simplified sketch of such a policy appears after this list). With this approach, you can mitigate risk to critical work while achieving lower MSU consumption and lower MLC costs.


    For a more detailed discussion of how to cap without risk, view this webinar.
  2. Capping may avoid excess charges, but it cannot reduce my ongoing MLC costs.

    This is true of standard manual capping approaches. The volatility and variability of workloads dictate that caps be set so they avoid excess charges but do not constrain any priority work running on the systems. As a result, caps are usually set high to prevent excessive charges and avoid workload impacts. However, this eliminates opportunities to reduce costs. Standard capping mechanisms cannot address variability and volatility, differentiate workload priorities, or recognize excess capacity that may exist elsewhere in the environment.


    However, an automatic, dynamic capping adjustment approach protects against excessive usage and ensures that priority work is not resource constrained, while lowering total MSU consumption, and MLC costs. This approach dynamically adjusts caps to align priority workload requirements and available unused capacity on other LPARs. It moves cap space between them as the variability of the workloads dictates. In shifting excess capacity across LPARs, this capping approach also has to be aware of the relative cost of the MLC products which are running on each LPAR in order to avoid inadvertently increasing overall MLC costs.
  3. Capping doesn’t work with variable and volatile workloads.

    As discussed in Myth #2, a dynamic capping mechanism is the key to actually reducing MLC costs. An automatic, dynamic, workload-aware and MLC cost-aware approach is ideal for handling the variable and volatile workloads being driven by digital engagement. A capping approach that examines workload activity and priorities in real time can make cap adjustments that accommodate workload changes and balance required service levels, capacity and costs. This approach ensures service quality and cost optimization, even when workloads are highly variable.
  4. Effective capping takes a lot of knowledge, time and continuous effort.

    Capping is a complex activity involving the interaction of workload activity, workload priorities, available capacity, LPAR configurations, MLC licenses and MLC costs, to name a few factors. It can be daunting to try to develop a manual capping strategy and keep it up to date with constantly changing workloads. Using an appropriate automated capping solution will alleviate most of the knowledge, time and continuous effort requirements.


    An automated, dynamic capping engine can adapt to changes in workloads and capacity for you. It continuously makes adjustments to ensure service levels are met, and it can reduce total MSU consumption, thus reducing MLC costs. It can also provide observational information you can use to make decisions about prioritization and MSU target levels. In addition, if it simulates the capping actions (without actually taking them), you get a combined view of the various complex factors and the capping actions the solution would take for you. This further reduces the time and effort required to implement a capping strategy for risk mitigation and cost reduction.
  5. Automation is scary and I can't trust it to manage my critical workloads.

    Delivering availability and performance for critical work continues to be a high priority for mainframe shops, as reported in the recently released 2016 Annual Mainframe Research from BMC, which can be viewed here. On the other hand, mainframe IT has been at the forefront of using automation to make mainframes more available, higher performing and more cost-effective than other platforms. Mainframe IT uses automation to manage responses to problems, to manage critical databases and to manage the execution of thousands of jobs. Using automation to control caps should not be any different.


    The concern about putting critical business services at risk is valid, and there have been instances where setting an incorrect cap, or not adjusting cap settings as the workload changed, has created service-level failures. So some caution is warranted.
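
To make the user-defined policy and dynamic cap adjustment described in the myths above concrete, here is a deliberately simplified sketch of a policy structure and a single rebalancing pass; the field names, priorities and MSU figures are assumptions for illustration only, not how any particular product defines its policy.

```python
# Simplified illustration of a policy-driven cap adjustment pass.
# All field names and values are invented for the example.
policy = {
    "group_msu_target": 900,                       # total MSU budget across LPARs
    "lpars": {
        "PRODA": {"priority": 1, "min_cap": 300},  # highest-priority work
        "PRODB": {"priority": 2, "min_cap": 200},
        "TEST1": {"priority": 3, "min_cap": 50},
    },
}

# Current observed demand per LPAR (invented numbers).
demand = {"PRODA": 420, "PRODB": 260, "TEST1": 120}

def rebalance(policy, demand):
    """Hand out the MSU budget by priority: high-priority LPARs get their
    demand first, lower-priority LPARs get what is left (never below min_cap)."""
    remaining = policy["group_msu_target"]
    caps = {}
    for lpar, cfg in sorted(policy["lpars"].items(), key=lambda kv: kv[1]["priority"]):
        cap = max(cfg["min_cap"], min(demand[lpar], remaining))
        caps[lpar] = cap
        remaining = max(0, remaining - cap)
    return caps

print(rebalance(policy, demand))   # e.g. {'PRODA': 420, 'PRODB': 260, 'TEST1': 120}
```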

The keys to being comfortable with automated capping are:

  1. Make sure the capping approach recognizes workload importance in its capping decisions.
  2. Operate under a user-defined policy that aligns automated capping decisions with IT priorities and cost concerns.
  3. Require a solution with a simulation mode that displays the capping actions it would have taken (but did not), so you can become comfortable with how it will manage your workloads and environment; a minimal sketch of this dry-run pattern appears after this list.
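
The simulation mode in point 3 is essentially a dry run: compute the action, log it, but do not apply it. A minimal sketch of that pattern follows, with apply_cap standing in as a hypothetical placeholder for whatever interface actually changes a defined capacity setting.

```python
# Dry-run ("simulation") pattern for cap changes: the decision logic runs
# for real, but the action is only logged unless simulate is False.
# apply_cap is a placeholder, not a real API.
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

def apply_cap(lpar, new_cap):
    print(f"(would call the platform interface to set {lpar} cap to {new_cap})")

def set_cap(lpar, current_cap, proposed_cap, simulate=True):
    if proposed_cap == current_cap:
        return
    msg = f"{lpar}: cap {current_cap} -> {proposed_cap} MSU"
    if simulate:
        logging.info("SIMULATED %s", msg)      # visible, but nothing changes
    else:
        logging.info("APPLYING %s", msg)
        apply_cap(lpar, proposed_cap)

set_cap("PRODA", current_cap=400, proposed_cap=420, simulate=True)
```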

By taking a deliberate approach to implementing an automated capping capability, you can verify that its actions align with your goals and be assured that capping will benefit your workloads and your costs. More information on an approach to this can be found here.
