My previous installment in this series ranted about what batch is and why it remains an essential discipline in computing. This time, I want to focus on the value of a batch approach for “modern” technologies such as cloud, containers, microservices and serverless computing.
It may seem ironic, but the management approaches and best practices that evolved over the years for enterprise batch computing may be exactly what’s needed in today’s modern, elastic environments.
Managing Business Applications
As soon as you have more than one business function, or even a single function made up of multiple components, you have a management requirement. You need to be able to:
- Determine whether things that are running should be running
- See whether there are any things that should be running but are not
- Have visibility into whether what’s running is running on time and correctly (as opposed to just looping or waiting)
- Confirm that every failure triggers a notification and that the tools and facilities needed to analyze and correct the problem are mobilized
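Each of the four checks above boils down to comparing what was planned with what was observed. The following is a minimal sketch of that idea, not a description of any particular product: the `ExpectedJob` and `JobRun` records, the job names and the timestamps are all hypothetical, and each expected job is assumed to run at most once in the window being checked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical data model for illustration only; not taken from any real scheduler.
@dataclass
class ExpectedJob:
    name: str
    start_by: datetime        # latest acceptable start time
    max_runtime: timedelta    # running longer than this suggests looping or waiting

@dataclass
class JobRun:
    name: str
    started: datetime
    finished: Optional[datetime] = None  # None means still running
    failed: bool = False
    notified: bool = False               # was anyone told about the failure?

def check_runs(expected: List[ExpectedJob], runs: List[JobRun],
               now: datetime) -> List[Tuple[str, str]]:
    """Return (job name, problem) pairs for the four management checks."""
    findings = []
    expected_by_name = {job.name: job for job in expected}
    runs_by_name = {run.name: run for run in runs}  # assumes one run per job name

    # 1. Is something running that should not be running?
    for run in runs:
        if run.name not in expected_by_name:
            findings.append((run.name, "unexpected"))

    for job in expected:
        run = runs_by_name.get(job.name)
        # 2. Should something be running that is not?
        if run is None:
            if now > job.start_by:
                findings.append((job.name, "missing"))
            continue
        # 3. Is it running (or did it run) longer than expected?
        end = run.finished if run.finished is not None else now
        if end - run.started > job.max_runtime:
            findings.append((job.name, "overran"))
        # 4. Did a failure go unreported?
        if run.failed and not run.notified:
            findings.append((job.name, "failed without notification"))
    return findings

now = datetime(2024, 1, 1, 3, 0)
expected = [
    ExpectedJob("load_orders", datetime(2024, 1, 1, 1, 0), timedelta(minutes=30)),
    ExpectedJob("post_ledger", datetime(2024, 1, 1, 2, 0), timedelta(minutes=15)),
]
runs = [
    JobRun("load_orders", datetime(2024, 1, 1, 1, 0)),  # still running two hours later
    JobRun("ad_hoc_export", datetime(2024, 1, 1, 2, 30), datetime(2024, 1, 1, 2, 40)),
]
print(check_runs(expected, runs, now))
# [('ad_hoc_export', 'unexpected'), ('load_orders', 'overran'), ('post_ledger', 'missing')]
```

A real batch management tool tracks many runs per job and far richer state, but the principle is the same: you can only answer these questions if something keeps both the plan and the observed history.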
The challenge of meeting the above basic requirements increases with the number of components and technologies in use to implement a business function.
Modern applications may contain components that use traditional relational databases or newer ones like NoSQL, traditional file systems or the Hadoop Distributed File System (HDFS), ERPs, SaaS solutions, etc., etc.
Architectures may follow traditional hierarchical flows, be message-based with a “Publish/Subscribe” relationship among application components, or take a microservices approach.
For all these varieties of application structure, operational instrumentation remains a requirement that development, DevOps and IT Ops teams must address in order to meet the service delivery demands of the business.
The Value of Batch Management in a Modern Environment
In these evolved environments, applications still defer processing until some data has been aggregated or some other collection of events has occurred. Furthermore, beyond waiting for events, which are simply the modern equivalent of date and time triggers, processes have to connect to databases and other applications, logs must be captured, visibility of these relationships is required, etc., etc. All these actions are part of what “automation” means, and in the absence of any other way to accomplish it, developers frequently fall back on scripting.
If anyone doubts the need for such management, just examine EVERY modern environment, whether it’s the leading ERP, database, Big Data or cloud providers. ALL of these environments keep re-inventing a batch management solution. If this were not a fundamental requirement, we would not continue to see SM36/SM37, SQL Server Agent, Oozie, Airflow, Azkaban, Luigi, Chronos, Azure Batch and, most recently, AWS Batch, AWS Step Functions and AWS Blox joining AWS SWF and AWS Data Pipeline.
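To make the deferral pattern concrete, here is a minimal sketch of work that waits until a set of events has arrived and a time condition holds, treating the clock as just one more condition. The `DeferredJob` class, its methods and the event names are hypothetical, invented for illustration; they do not correspond to any of the schedulers listed above.

```python
from datetime import datetime
from typing import Callable, Iterable, Optional

class DeferredJob:
    """Hypothetical sketch: run `action` once, after every required event has
    arrived and the optional not-before time has passed."""

    def __init__(self, name: str, required_events: Iterable[str],
                 action: Callable[[], str],
                 not_before: Optional[datetime] = None) -> None:
        self.name = name
        self.required = set(required_events)
        self.seen = set()
        self.action = action
        self.not_before = not_before
        self.log = []        # captured output, kept for later analysis
        self.done = False

    def on_event(self, event: str, now: datetime) -> bool:
        """Record an event; return True if this call caused the job to run."""
        if event in self.required:
            self.seen.add(event)
        return self._maybe_run(now)

    def tick(self, now: datetime) -> bool:
        """Re-check conditions when only the clock has changed."""
        return self._maybe_run(now)

    def _maybe_run(self, now: datetime) -> bool:
        if self.done or self.seen != self.required:
            return False  # already ran, or still waiting on events
        if self.not_before is not None and now < self.not_before:
            return False  # events are in, but it is too early
        self.log.append(f"{now.isoformat()} {self.name}: {self.action()}")
        self.done = True
        return True

job = DeferredJob("nightly_rollup", {"orders_loaded", "payments_loaded"},
                  action=lambda: "rolled up",
                  not_before=datetime(2024, 1, 1, 2, 0))
print(job.on_event("orders_loaded", datetime(2024, 1, 1, 1, 0)))    # False
print(job.on_event("payments_loaded", datetime(2024, 1, 1, 1, 30)))  # False
print(job.tick(datetime(2024, 1, 1, 2, 0)))                         # True
print(job.log)  # ['2024-01-01T02:00:00 nightly_rollup: rolled up']
```

Even this toy version has to hold state, capture output and decide when to re-evaluate its conditions; that is the kind of plumbing that ends up in ad hoc scripts when no management layer provides it.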
The problems with all these tools are:
- They take a “head in the sand” position that their application or ecosystem is the only one you will ever use, so they don’t even acknowledge the existence of other environments
- Building a comprehensive, sophisticated solution is hard and takes a long time; so far, precious few solutions (I would argue only one) have been able to meet the diverse requirements I’ve been discussing here
Recently, yet another force has emerged that makes all of the above even harder: digital transformation. The pressure to accelerate delivery of new business capabilities that run reliably and meet stringent risk and governance requirements is immense.
Many organizations have turned to DevOps as one of the techniques to enable them to meet the delivery expectations of the business. Although DevOps enables faster delivery, it does little to improve operational tools that are deficient in the level of visibility and manageability they offer in the production environment. Requirements for a DevOps toolset to help address those issues include:
- Support highly heterogeneous collections of platforms and applications
- Understand business service levels
- Provide a way to visualize connections among components across that heterogeneity
- Support quick access to debugging and problem-analysis data, and offer business users insight into their workloads
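Visualizing connections across heterogeneous components, and relating a failure to the business service it affects, both start from a dependency model that spans platforms. The sketch below assumes a hypothetical five-step sales workflow (the job names and platform labels are invented for illustration) and uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library to derive one valid run order; a simple traversal then finds everything downstream of a failed step.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow spanning several platforms (illustrative names only).
PLATFORM = {
    "erp_extract": "ERP",
    "cloud_upload": "cloud storage",
    "spark_aggregate": "Hadoop",
    "warehouse_load": "relational DB",
    "daily_sales_report": "SaaS BI",
}
# Each job maps to the set of jobs it depends on.
DEPENDS_ON = {
    "cloud_upload": {"erp_extract"},
    "spark_aggregate": {"cloud_upload"},
    "warehouse_load": {"spark_aggregate"},
    "daily_sales_report": {"warehouse_load"},
}

def run_order(depends_on):
    """One valid execution order across all platforms (raises CycleError on cycles)."""
    return list(TopologicalSorter(depends_on).static_order())

def downstream_of(failed, depends_on):
    """Every job that cannot complete because `failed` did not."""
    impacted, frontier = set(), [failed]
    while frontier:
        current = frontier.pop()
        for job, deps in depends_on.items():
            if current in deps and job not in impacted:
                impacted.add(job)
                frontier.append(job)
    return impacted

for job in run_order(DEPENDS_ON):
    print(f"{job} [{PLATFORM[job]}]")
print(sorted(downstream_of("cloud_upload", DEPENDS_ON)))
# ['daily_sales_report', 'spark_aggregate', 'warehouse_load']
```

Because the model is just nodes and edges, the same structure could feed a visual map of the workflow or show which business report a failed upload will delay, regardless of which platform each step runs on.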
The answer may lie in working backwards. Find a solution that provides the functionality you need to run your production environment, not your development environment. After all, that’s where the rubber meets the road, so to speak.
You won’t find many toolsets that can give you the broad functionality you require, but there are some: at least one. Then find out which of those solutions has been DevOps-enabled. That will be an even smaller subset, but again there will be at least one.
This is the third blog in the 4-part series on Modern Batch. Stay tuned for the fourth and final blog, “Modern Batch: The Ops in DevOps fully evolved – Jobs as Code!”