Joe Goldberg – BMC Software | Blogs

Let Machine Learning Pipelines Benefit from Automation
https://s7280.pcdn.co/machine-learning-pipelines-automation/ | February 15, 2023

A recent book published by O’Reilly, Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow, provides practical, useful guidance on how to create machine learning (ML) pipelines, something many of the BMC customers I talk with are heavily engaged in. At 367 pages, with step-by-step guidance on how to set up a Kubernetes cluster on Google Cloud and tips for running Kubeflow pipelines, this book is not a light read. We’ll spare you the technical details here, but we do want to highlight the principles of effective ML development efforts laid out by authors Hannes Hapke and Catherine Nelson.

  • According to the authors, “[As enterprises try] to be more intelligent and make more use of AI, machine learning pipelines need to become more like CI/CD [continuous integration and continuous deployment] pipelines.” Becoming more like modern CI/CD pipelines includes avoiding or at least minimizing custom coding or custom processes for each iteration of the pipeline. Organizations need to avoid doing things differently at different stages every time something new comes through the pipeline. As the authors note, “The connections between tasks are often brittle and cause the pipelines to fail.”
  • ML pipelines need to be automated, scalable and reproducible. This can’t be accomplished if custom code is used extensively or processes are tweaked each time they run.
  • Achieving these principles for automated, scalable ML pipeline development requires modern approaches. In their attempt to streamline development, enterprises have allowed, and maybe even encouraged, tooling to proliferate, in the belief that you can never have too much automation. For example, the 2021 State of ModelOps Report found that 81 percent of financial services companies were using multiple ML ModelOps solutions, and 42 percent had more than five.

The more tools that are used, the greater the challenge of integrating them. Scaling and automation become very difficult if tools can’t work together, which is one reason why so many ML models never make it all the way to production. To address this problem, organizations are turning to orchestrators such as Apache Airflow, Apache Beam, and Kubeflow Pipelines. The book describes how these orchestrators can be highly effective for automating tasks in their target environment. However, their emergence has created a modern data management problem: how to integrate the multiple, domain-specific data and ML orchestrators now being used across the enterprise for its different development and data infrastructure environments.

Not having integration into a common environment greatly limits the ability to automate, troubleshoot, and scale pipelines. The integrated environment needs to provide visibility into dependencies, including those involving traditional business applications that may be generating input to or consuming output from these emerging ML applications.

It can be done. In the Control-M world, ML development is already like modern CI/CD because Control-M natively supports multiple cloud and ML environments and tools, including Apache Airflow.

ML orchestration needs to be comprehensive because it touches on the three core principles of becoming more like CI/CD pipelines—applying CI/CD principles, minimizing custom code, and pursuing automation and scalability. Let’s look at these principles more closely.

Applying CI/CD Principles to ML

There are many domain-specific tools for almost every conceivable segment of technology, and there is ample justification for almost all of them. This book makes clear that no one would consider using CI/CD tools to run ML pipelines, even though at some level the two appear quite similar. Multiple tools are necessary, so organizations need a way to see across all these domains and the relationships among them.

Regardless of the tools used, or how many are used, organizations need a way to bring their workflows into a common environment where they can work together and be managed. That need reflects one of the book’s fundamental principles for effective ML pipeline development. Luckily, there is an example that organizations can look to for guidance. As Hapke and Nelson note, “With machine learning pipelines becoming more streamlined in the coming months, we will see machine learning pipelines moving toward more complete CI/CD workflows. As data scientists and machine learning engineers, we can learn from software engineering workflows.”

One thing many organizations have learned about software workflows is that Control-M enables automation and orchestration starting in development and continuing through each stage of the lifecycle. It provides a well-proven, tool-agnostic environment that supports many development environments and business application workflows, and it extends those capabilities into the ML world. Control-M has native integration for Apache Airflow plus more than 90 other integrations. Now you can use Control-M to manage specialized ML code and models just like any other job. And you can do it in the way that works best for you: jobs-as-code using JSON or Python, doing it all in Control-M, or using a combination through integration with your tool of choice.
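For readers unfamiliar with the jobs-as-code approach, here is a minimal, hypothetical sketch of a Control-M job definition in JSON. The folder, job, host, and script names are placeholders invented for illustration, not definitions from a real environment.

{
  "ml-retraining-folder": {
    "Type": "Folder",
    "Comment": "Hypothetical sketch: nightly model retraining and publishing",
    "retrain-model": {
      "Type": "Job:Command",
      "Command": "python retrain.py",
      "RunAs": "mluser",
      "Host": "ml-agent-host"
    },
    "publish-model": {
      "Type": "Job:Command",
      "Command": "python publish_model.py",
      "RunAs": "mluser",
      "Host": "ml-agent-host"
    },
    "flow": {"Type": "Flow", "Sequence": ["retrain-model", "publish-model"]}
  }
}

Because a definition like this is plain text, it can live in version control and move through build, test, and deploy stages with the rest of the codebase, which is what makes the CI/CD comparison concrete.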

Minimizing Custom Code

Custom code usually does a good job with tasks like executing a data import from one system to another. The trouble often comes when the environment needs to scale. Developing and maintaining multiple custom integrations becomes too time-consuming as ML programs grow. Automation is essential for scalability, and custom, non-standard code is an obstacle to automation. As the authors state: “Automated pipelines allow data scientists to develop new models, the fun part of their job. Ultimately, this will lead to higher job satisfaction and retention in a competitive job market.”

Creating ML Models that are Automated, Scalable, and Reproducible

Through automation, organizations can remove the need for one-off, self-developed integrations that consume a lot of developer time, create a dependency on individual developers, and limit scalability. The beauty of Control-M is that it allows data science, development, and operations professionals to work with their familiar tools, then brings everything together through a single interface. To learn more, see our white paper.

Control-M not only automates workflow execution; it can also automate data ingestion, including file transfers. Embedding jobs-as-code makes workflows self-executing, which saves time in promoting to production. Control-M also has specific features that simplify working with big data environments. Users in multi-tool environments appreciate its core functionality, which provides a clear, end-to-end view across very complex and diverse technology and application landscapes.

Users can also act on this visibility, with Control-M’s native functionality for pausing, killing, and restarting processes as needed, providing business-oriented and technical views of workflows, guidance for site standards, granular security, reporting, and workflow analytics for continuous improvement. These and other actions can be automated based on events, conditions, and other triggers that you define.

To recap, the people who literally wrote the book on how to develop ML pipelines and get them into production more quickly and effectively cite the need to use specialized tools and automation while minimizing custom code. Historically, the complication has been that integrating specialty tools and automating their interactions has required custom code. Control-M solves that for ML and other environments, orchestrating across tools, clouds, and other components so pipeline development and execution can be automated. We’ll publish a follow-up blog with more specifics on how Control-M enhances Apache Airflow. Until then, you can get a complimentary copy of the book here.

Accelerate Business Transformation with Mainframe Modernization
https://www.bmc.com/blogs/accelerate-business-transformation-with-mainframe-modernization/ | December 15, 2022

Introducing a new Control-M and AWS integration

Business modernization is a strategic priority for almost every company that wasn’t born digital-native. As executives look to transform their IT environments to deliver better customer and employee experiences, cloud adoption is soaring. The benefits are well documented and clear. However, many organizations have heavily invested in large mainframe environments with critical systems of record. According to Precisely, mainframes handle 68 percent of the world’s production IT workloads. Migrating application and data workflows across complex hybrid-cloud environments isn’t simple.

Because mainframes are deeply embedded into organizations’ tech stacks, changes require extensive examination of many objects. This must be carried out very carefully, generally over long periods of time. That puts companies in the difficult position of having to run old and new workflows simultaneously. Many organizations that use mainframes also operate in heavily regulated industries and are subject to strict governance. That means many, if not all, tasks must be checked, tested, and re-checked to ensure compliance and mitigate risk. And finally, mainframe environments have often been in production for many years. Lots of institutional processes and knowledge have built up over time. Mainframe modernization requires both procedural and cultural shifts. However, companies are not alone on this journey.

BMC works with organizations around the world to address these challenges and successfully deliver on their modernization projects. As an enabler of the Autonomous Digital Enterprise (ADE) framework, our solutions make it easier for companies to manage their constantly evolving tech stacks in an agile fashion to better support stakeholders. Control-M, BMC’s market-leading application and data workflow orchestration platform, gives developers, data engineers, and business users the freedom to innovate across mainframe and cloud environments within a secure orchestration framework.

To help organizations better address their mainframe modernization challenges, we’ve partnered with Amazon to launch a new Control-M integration with the AWS Mainframe Modernization Service.

Workflow orchestration meets mainframe modernization

Companies can leverage Control-M’s deep operational capabilities alongside the AWS Mainframe Modernization Service to preserve the continuity of mission-critical business outcomes, automating application and data workflows in production across distributed, hybrid, and mainframe environments. Control-M easily integrates, automates, and orchestrates new applications and technologies, with interfaces for IT operations, data engineers, developers, and business users across their on-premises and AWS platforms.

Gur Steif, president of digital business automation at BMC, summed it up well: “Our Control-M customers are focused on driving modernization initiatives and delivering transformative digital experiences for their external and internal customers. We are proud to collaborate with AWS on this very important leg of that modernization journey.  With this new integration, our customers can modernize their mainframes and utilize the power of AWS to give stakeholders freedom to collaborate, to turn data into actionable insights faster, and to leverage the latest AWS services using the Control-M platform.”

The AWS Mainframe Modernization Service helps modernize mainframe applications to AWS cloud-native managed runtime environments. It provides tools and resources to help plan and implement migration and modernization. With the integration, job creation and monitoring can be performed entirely via any Control-M interface, which enables organizations to manage AWS Mainframe Modernization Service jobs just like any other Control-M workload.

In addition, users can submit or cancel batch jobs and review the details of batch job runs. Each time a user submits a batch job, the AWS Mainframe Modernization Service creates a separate batch job run, which can easily be monitored. Using the AWS Mainframe Modernization Service web console, users can search for batch jobs by name and provide job control language (JCL), script files, and parameters to batch jobs.
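As a rough, hypothetical illustration of what this can look like in jobs-as-code form, the sketch below shows a Control-M JSON definition for a re-platformed batch job. The job type string, the AI-prefixed parameters, the connection profile, and the host are assumptions made up for this example; the actual field names are defined by the integration and should be taken from its documentation.

{
  "aws-mm-folder": {
    "Type": "Folder",
    "Comment": "Hypothetical sketch only; the job type and parameter names below are assumptions, not the documented schema",
    "run-nightly-batch": {
      "Type": "Job:ApplicationIntegrator:AI AWS Mainframe Modernization",
      "ConnectionProfile": "AWS-MM-PROFILE",
      "AI-Application ID": "replatformed-billing-app",
      "AI-JCL Name": "NIGHTLYRUN",
      "Host": "awsagents"
    },
    "flow": {"Type": "Flow", "Sequence": ["run-nightly-batch"]}
  }
}

Defined this way, the batch job participates in the same scheduling, dependency, and monitoring model as every other Control-M workload.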

The integration also helps companies:

  • Reduce talent gaps
  • Support rapid innovation with an agile DevOps approach
  • Provide easier access to applications and data without significant changes
  • Optimize the costs of running or extending applications
  • Maximize business agility

Partnering for success

BMC’s Global Outsourcer System Integrator organization and global partner network will collaborate with companies to help them build a solid mainframe modernization strategy, and to determine the right type of migration for each organization. They’ll also help companies best leverage Control-M throughout the journey.

I recently spoke to Raul Ah Chu, BMC’s Global VP of Sales for the Global Outsourcer System Integrator organization. He is very excited to bring the power of our GSI partner community and their business modernization expertise to our customers.  BMC and AWS are working closely with our GSI partners to ensure that companies wishing to take advantage of the AWS Mainframe Modernization Service can do so while maintaining all the benefits of their Control-M platform.

In addition to our AWS partnership, companies will be able to leverage AWS Prescriptive Guidance, which includes time-tested strategies, best practices, and guidance to help accelerate cloud migration, modernization, and optimization projects. In it, experts from AWS and its partners share practical, real-world experience. The guide will help companies navigate the complex cloud landscape with specific approaches through how-to guides, step-by-step tasks, architectures, and code.

The path forward

With Control-M and the AWS Mainframe Modernization Service, companies have the right tools and partners to navigate the migration path at the speed their business requires, whether replatforming, refactoring, or both are needed. Control-M will continue to manage all critical business workflows with all the deep operational capabilities organizations depend on. And once they start their cloud migration journey, Control-M will enable them to seamlessly manage all business workflows on a single pane of glass, mainframe to cloud. The result is a single strategic platform that gives all stakeholders the freedom to drive business outcomes faster within a secure orchestration framework.

Additional Control-M integrations with AWS

This adds to an already robust list of Control-M integrations available for the AWS ecosystem, including AWS Lambda, AWS Step Functions, AWS Batch, Amazon S3 buckets, AWS Glue, and Databricks on AWS. Companies that are re-platforming mainframe applications using the AWS Mainframe Modernization Service can seamlessly orchestrate application and data workflows running in the re-platformed environment while managing dependencies with applications that may still be on-premises.

For more information about this integration, please visit the AWS Prescriptive Guidance: Using Control-M workflow orchestrator integration with AWS Mainframe Modernization.

Orchestrate and Automate to Make DataOps Successful
https://www.bmc.com/blogs/dataops-orchestration/ | October 3, 2022

DataOps is intended to smooth the path to becoming a data-driven enterprise, but some roadblocks remain. This year, according to a new IDC InfoBrief sponsored by BMC, DataOps professionals reported that on average, only 58 percent of the data they need to support analytics and decision making is available. How much better would decision-making be, and how much business value would be created, if the other 42 percent of the data could be factored into decisions as intended? It seems logical to assume it would be almost twice as good!

That raises another question: Why can’t organizations get the data that they already have where they need it, when they need it? In most cases, the answer comes down to complexity.

A previous blog by my colleague, Basil Faruqui, introduced why DataOps is important. This one follows up to highlight what is needed. Spoiler alert: The ability to orchestrate multiple data inputs and outputs is a key requirement.

The need to manage data isn’t new, but the challenge of managing data to meet business needs is changing very fast. Organizations now rely on more data sources than ever before, along with the technology infrastructure to acquire, process, analyze, communicate, and store the data. The complexity of creating, managing, and quality-assuring a single workload increases exponentially as more data sources, data consumers (both applications and people), and destinations (cloud, on-premises, mobile devices, and other endpoints) are included.

DataOps is helping manage these pathways, but it is also proving to have some limitations. The IDC InfoBrief found that integration complexity is the leading obstacle to operationalizing and scaling DataOps and data pipeline orchestration. Other obstacles include a lack of internal skills and time to solve data orchestration challenges, and difficulty using the available tooling. That means that for complex workloads like these, organizations can’t fully automate the planning, scheduling, execution, and monitoring, because the complexity causes gaps, which in turn cause delays. This results in decisions being made based on incomplete or stale data, limiting business value and hampering efforts to become a data-driven enterprise.

Complexity is a big problem. It is also a solvable one. Orchestration, and more specifically automated orchestration, is essential to reducing complexity and enabling scalability in a way that scripting and other workarounds are not. Visibility into processes, self-healing capabilities, and user-friendly tools also make complexity manageable. As IDC notes in its InfoBrief, “Using a consistent orchestration platform across applications, analytics, and data pipelines speeds end-to-end business process execution and improves time to completion.”

Some of the most important functionality that is needed to achieve orchestration includes:

  • Built-in connectors and/or integration support for a wide range of data sources and environments
  • Support for an as-code approach so automation can be embedded into the deployment pipelines
  • Complete workflow visibility across a highly diverse technology stack
  • Native ability to identify problems and remediate them when things go wrong

Tooling that is specific to a software product, development environment, or hyperscale platform may provide some of that functionality, but typically isn’t comprehensive enough to cover all the systems and sources the workflow will touch. That’s one reason so many DataOps professionals report that tooling complexity hinders their efforts.

Control-M can simplify DataOps because it works across and automates all elements of the data pipeline, including extract, transform, load (ETL), file transfer, and downstream workflows (a minimal sketch of such a pipeline definition follows the list below). Control-M is also a great asset for DataOps orchestration because:

  • It eliminates the need to use multiple file transfer systems and schedulers.
  • It automatically manages dependencies across sources and systems and provides automatic quality checks and notifications, which prevents delays from turning into major logjams and job failures further downstream.
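As a minimal, hypothetical sketch of what such a pipeline can look like as code, the JSON below chains a file transfer into a database load and sends an email if the load fails. The connection profiles, hosts, file paths, query, and email address are placeholders, not values from a real environment.

{
  "dataops-pipeline": {
    "Type": "Folder",
    "Comment": "Hypothetical sketch: ingest a file, then load it into the warehouse",
    "ingest-sales-file": {
      "Type": "Job:FileTransfer",
      "Host": "ftpagents",
      "ConnectionProfileSrc": "SRC-SFTP",
      "ConnectionProfileDest": "DEST-STORAGE",
      "FileTransfers": [
        {"TransferType": "Binary", "Src": "/incoming/sales.csv", "Dest": "/landing/sales.csv"}
      ]
    },
    "load-sales-staging": {
      "Type": "Job:Database:EmbeddedQuery",
      "ConnectionProfile": "DWH-DB",
      "Host": "dbagents",
      "Query": "CALL load_sales_staging()",
      "notifyOnFailure": {
        "Type": "If",
        "CompletionStatus": "NOTOK",
        "mailTeam": {"Type": "Mail", "Message": "Sales staging load failed", "To": "dataops@example.com"}
      }
    },
    "flow": {"Type": "Flow", "Sequence": ["ingest-sales-file", "load-sales-staging"]}
  }
}

Because the dependency between the two steps is declared in the flow, a late or failed transfer is visible immediately instead of surfacing downstream as missing data.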

Here are a couple of quotes from Control-M users that illustrate its value. A professional at a healthcare company said, “Control-M has also helped to make it easier to create, integrate, and automate data pipelines across on-premises and cloud technologies. It’s due to the ability to orchestrate between workflows that are running in the cloud and workflows that are running on-prem. It gives us the ability to have end-to-end workflows, no matter where they’re running.”

Another user, Railinc, said, “The order in which we bring in data and integrate it is key. If we had to orchestrate the interdependencies without a tool like Control-M, we would have to do a lot of custom work, a lot of managing. Control-M makes sure that the applications have all the data they need.” You can see the full case study here.

These customers are among the many organizations that have reduced the complexity of their DataOps through automation. The IDC InfoBrief compares enterprises that excel at DataOps orchestration to those that don’t and found advantages for the leaders in multiple areas, including compliance, faster decision-making and time-to-innovation, cost savings, and more.

Can you orchestrate similar results at your organization? Learn more about Control-M for Data Pipeline Orchestration here and register for a free trial.

 

How to orchestrate a data pipeline on Google Cloud with Control-M from BMC
https://www.bmc.com/blogs/orchestrate-a-data-pipeline/ | September 22, 2022

The Google Cloud Platform is designed to accommodate organizations at a variety of points along their cloud services journey, from large-scale machine learning (ML) and data analysis, to services tailored to SMBs, to hybrid-cloud solutions for customers that want to use services from more than one cloud provider. When BMC was migrating our Control-M application to this cloud ecosystem, we had to be very thoughtful about how we managed this change. The SADA engineering team worked alongside the BMC team to ensure that we had a seamless integration for our customers.

SADA supported this project by providing an inventory of the Google Cloud configuration options, decisions, and recommendations needed for the data platform foundation deployment, collaborating with BMC on implementation planning, providing automation templates, and designing the Google Cloud architecture for the relevant managed services on the Google Cloud Platform.

In this article, we will discuss the end result of this work and look at an example using a credit-card fraud detection process to show how you can use Control-M to orchestrate a data pipeline seamlessly in Google Cloud.

Five orchestration challenges

There are five primary challenges to consider when streamlining the orchestration of an ML data pipeline:

  • Understand the workflow. Examine all dependencies and any decision trees. For example, if data ingestion is successful, then proceed down this path; if it is not successful, proceed down that path.
  • Understand the teams. If multiple teams are involved in the workflow, each needs to have a way to define their workflow using a standard interface, and to be able to merge their workflows to make up the pipeline.
  • Follow standards. Teams should use repeatable standards and conventions when building workflows. This avoids having multiple jobs with identical names. Each step should also have a meaningful description to help clarify its purpose in the event of a failure.
  • Minimize the number of tools required. Use a single tool for visualization and interaction with the pipeline (and dependencies). Visualization is important during the definition stage since it’s hard to manage something that you can’t see. This is even more important when the pipeline is running.
  • Include built-in error handling capabilities in the orchestration engine. It’s important to understand how errors can impact downstream jobs in the workflow or the business service level agreement (SLA). At the same time, failure of a single job should not halt the pipeline altogether and require human interaction. Criteria can be used to determine whether a failed job can be restarted automatically or whether a human must be contacted to evaluate the failure, for instance if there are a certain number of failures involving the same error (a hedged configuration sketch follows this list).
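To ground the error-handling point, here is a small, hypothetical sketch of how recovery criteria can be attached to a single job in Control-M’s JSON format. The rerun action name, retry behavior, and notification address are assumptions for illustration, to be checked against the Control-M Automation API documentation rather than read as the exact schema.

"ingest-transactions": {
  "Type": "Job:Command",
  "Command": "python ingest_transactions.py",
  "RunAs": "dataeng",
  "Host": "dataagents",
  "Description": "Hypothetical sketch; the Action:Rerun type below is an assumption",
  "retryOrEscalate": {
    "Type": "If",
    "CompletionStatus": "NOTOK",
    "tryAgain": {"Type": "Action:Rerun"},
    "tellHumans": {"Type": "Mail", "Message": "Ingest failed; automatic rerun attempted", "To": "oncall@example.com"}
  }
}

The point is less the exact syntax than the principle: retry and escalation criteria live in the workflow definition instead of being hand-coded into each script.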

Meeting the challenge

Meeting these orchestration challenges required a solid foundation and also presented opportunities for collaboration. BMC and SADA aligned using the SADA POWER line of services to establish the data platform foundation. Some notable elements in this technical alignment included work by SADA to:

  • Apply industry expertise to expedite BMC’s development efforts.
  • Establish a best practices baseline around data pipelines and the tools to orchestrate them.
  • Conduct collaborative sessions in order to understand BMC’s technical needs and provide solutions that the BMC team could integrate and then expand upon.

SADA’s Data Platform Foundation provided opportunities to leverage Google Cloud services to accomplish the complex analytics required of an application like Control-M. The BMC and SADA teams worked together to establish a strong foundation for a robust and resilient solution through:

  • Selecting data and storage locations in Google Cloud Storage.
  • Utilizing the advantages provided by Pub/Sub to streamline the analytics and data integration pipelines.
  • Having thorough discussions around the extract, transform, and load (ETL) processes to truly understand the end state of the data.
  • Using BigQuery and writing analytic queries.
  • Understanding the importance of automation, replicability of processes, and monitoring performance in establishing a system that is scalable and flexible.
  • Using Data Studio to create a visualization dashboard to provide the necessary business insights.

Real-world example

Digital transactions have been increasing steadily for many years, but that trend is now coupled with a permanent decline in the use of cash as people and businesses practice physical distancing. The adoption of digital payments for businesses and consumers has consequently grown at a much higher rate than previously anticipated, leading to increased fraud and operational risks.

With fraudsters improving their techniques, companies are relying on ML to build resilient and efficient fraud detection systems.

Since fraud constantly evolves, detection systems must be able to identify new types of fraud by detecting anomalies that are seen for the first time. Therefore, detecting fraud is a perpetual task that requires constant diligence and innovation.

Common types of financial fraud that customers work to prevent with this application include:

  • Stolen/fake credit card fraud: Transactions made using fake cards, or cards belonging to someone else.
  • ATM fraud: Cash withdrawals using someone else’s card.

Fraud detection is composed of both real-time and batch processes. The real-time process is responsible for denying a transaction and possibly placing a hold on an account or credit card, thus preventing the fraud from occurring. It must respond quickly, sometimes at the cost of reduced accuracy.

To minimize false positives, which may upset or inconvenience customers, a batch phase is used to continuously fine-tune the detection model. After transactions are confirmed as valid or fraudulent, all recent events are input to the batch process on a regular cadence. This batch process then updates the training and scoring of the real-time model to keep real-time detection operating at peak accuracy. This batch process is the focus of this article.

Use our demo system

SADA and BMC created a demonstration version of our solution so you can experiment with it on Google Cloud. You can find all of our code, plus sample data, on GitHub.

Resources included are:

  • Kaggle datasets of transaction data, fraud status, and demographics
  • Queries
  • Schema
  • User-defined functions (UDFs)

How it works

For each region in which the organization operates, transaction data is collected daily. Details collected include (but are not limited to):

  • Transaction details. Describes each transaction, including the amount, item code, location, method of payment, and so on.
  • Personal details. Describes the name, address, age, and other details about the purchaser.

This information is pulled from corporate data based on credit card information and real-time fraud detection that identifies which transactions were flagged as fraudulent.

New data arrives either as batch feeds or is dropped into Cloud Storage by Pub/Sub. This new data is then loaded into BigQuery by Dataflow jobs. Normalization and some data enrichment is performed by UDFs during the load process.

Once all the data preparation is complete, analytics are run against the combined new and historical data to test and rank fraud detection performance. The results are displayed in Data Studio dashboards.

Control-M orchestration

Figure 1: Control-M orchestration

Google Cloud services in the pipeline

Cloud Storage provides a common landing zone for all incoming data and a consistent input for downstream processing. Dataflow is Google Cloud’s primary data integration tool.

SADA and BMC selected BigQuery for data processing. Earlier versions of this application used Hadoop, but while working with the team at SADA, we converted to BigQuery, as this is the strategy Google recommends for sophisticated data warehouse or data lake applications. This choice also simplified setup by providing out-of-the-box integration with Cloud Dataflow. UDFs provide a simple mechanism for manipulating data during the load process.

Two ways to define pipeline workflows

You can use Control-M to define your workflow in two ways:

  • Using a graphical editor. This provides the option of dragging and dropping the workflow steps into a workspace and connecting them.
  • Using RESTful APIs. Define the workflows using a jobs-as-code method, then use JSON to integrate into a continuous integration/continuous delivery (CI/CD) toolchain. This method improves workflow management by flowing jobs through a pipeline of automated building, testing, and release. Google Cloud provides a number of developer tools for CI/CD, including Cloud Build and Cloud Deploy.

Defining jobs in the pipeline

The basic Control-M execution unit is referred to as a job. There are a number of attributes for every job, defined in JSON:

  • Job type. Options include script, command, file transfer, Dataflow, or BigQuery.
  • Run location. For instance, which host is running the job?
  • Identity. For example, is the job being “run as…” or run using a connection profile?
  • Schedule. Determines when to run the job and identifies relevant scheduling criteria.
  • Dependencies. This could be things like whether the job must finish by a certain time or output must arrive by a certain time or date.

Jobs are stored in folders and the attributes discussed above, along with any other instructions, are applied to all jobs in that folder.

The code sample below shows the JSON that describes the workflow used in the fraud detection model ranking application. You can find the full JSON in the Control-M Automation API Community Solutions GitHub repo, along with other solutions, the Control-M Automation API guide, and additional code samples.

{
"Defaults" : {
},
"jog-mc-gcp-fraud-detection": {"Type": "Folder",
"Comment" : "Update fraud history, run, train and score models",
"jog-gcs-download" : {"Type" : "Job:FileTransfer",…},
"jog-dflow-gcs-to-bq-fraud": {"Type": "Job:Google DataFlow",…},
"jog-dflow-gcs-to-bq-transactions": {"Type": “Job:Google DataFlow",…},
"jog-dflow-gcs-to-bq-personal": {"Type": "Job:Google DataFlow",…},
"jog-mc-bq-query": {"Type": "Job:Database:EmbeddedQuery", …},
"jog-mc-fm-service": {"Type": "Job:SLAManagement",…},
},
"flow00": {"Type":"Flow", "Sequence":[
"jog-gcs-download",
"jog-dflow-gcs-to-bq-fraud",
"jog-mc-bq-query",
"jog-mc-fm-service"]},
"flow01": {"Type":"Flow", "Sequence":[
"jog-gcs-download",
"jog-dflow-gcs-to-bq-transactions",
"jog-mc-bq-query", "jog-mc-fm-service"]},
"flow02": {"Type":"Flow", "Sequence":[
"jog-gcs-download",
"jog-dflow-gcs-to-bq-personal",
"jog-mc-bq-query",
"jog-mc-fm-service"]}

}

The jobs shown in this workflow correspond directly with the steps illustrated previously in Figure 1.

The workflow contains three fundamental sections:

  • Defaults. These are the functions that apply to the workflow. This could include details such as who to contact for job failures or standards for job naming or structure.
{  "Defaults" : {"RunAs" : "ctmagent", "OrderMethod": "Manual", "Application" : 
       "multicloud", "SubApplication" : "jog-mc-fraud-modeling", 
      "Job" : {"SemQR": { "Type": "Resource:Semaphore", Quantity": "1"},
      "actionIfError" : {"Type": "If", "CompletionStatus":"NOTOK", "mailTeam": 
          {"Type": "Mail", "Message": "Job %%JOBNAME failed", "Subject": 
                 "Error occurred", "To": deng_support@bmc.com}}}
    }, 

  • Job definitions. This is where individual jobs are specified and listed. See below for descriptions of each job in the flow.
  • Flow statements. These define the relationships of the job, both upstream and downstream.
"flow00": {"Type":"Flow", "Sequence":["jog-gcs-download", 
           "jog-dflow-gcs-to-bq-fraud", "jog-mc-bq-query", 
           "jog-mc-fm-service"]},
"flow01": {"Type":"Flow", "Sequence":["jog-gcs-download", 
           "jog-dflow-gcs-to-bq-transactions", 
           "jog-mc-bq-query", "jog-mc-fm-service"]},
"flow02": {"Type":"Flow", "Sequence":["jog-gcs-download", 
           "jog-dflow-gcs-to-bq-personal", "jog-mc-bq-query", 
           "jog-mc-fm-service"]} 

Scheduling pipeline workflows

Control-M uses a server-and-agent model. The server is the central engine that manages workflow scheduling and submission to agents, which are lightweight workers. In the demo described in this article, the Control-M server and agent are both running on Google Compute Engine VM instances.

Workflows are most commonly launched in response to various events, such as data arrival, but may also be executed automatically based on a predefined schedule. Schedules are very flexible and can refer to business calendars; specify different days of the week, month, or quarter; define cyclic execution, which runs workflows intermittently or every “n” hours or minutes; and so on.
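As a rough illustration, scheduling criteria are declared on the job or folder itself. The sketch below is hypothetical; the job name, host, and time window are placeholders, and the exact attribute set should be confirmed against the Control-M Automation API reference.

"jog-nightly-model-scoring": {
  "Type": "Job:Command",
  "Command": "python score_models.py",
  "RunAs": "ctmagent",
  "Host": "gcpagents",
  "Description": "Hypothetical sketch of a weekday evening schedule window",
  "When": {
    "WeekDays": ["MON", "TUE", "WED", "THU", "FRI"],
    "FromTime": "2000",
    "ToTime": "2300"
  }
}

Cyclic execution and calendar-based rules are expressed in the same declarative style, so the schedule travels with the workflow definition rather than living in a separate scheduler.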

Processing the data

File Transfer job type

Looking at the first job in the workflow (Figure 1), called jog-gcs-download, we can see that this job, of the type Job:FileTransfer, transfers files from a conventional file system described by ConnectionProfileSrc to Google Cloud Storage described by ConnectionProfileDest.

The File Transfer job type can watch for data-related events (file watching) as a prerequisite for data transfer, as well as perform pre/post actions such as deletion of the source after a successful transfer, renaming, source and destination comparison, and restart from the point of failure in the event of an interruption. In the example, this job moves several files from a Linux® host and drops them into Google Cloud Storage buckets.

"jog-gcs-download" : {"Type" : "Job:FileTransfer",
        "Host" : "ftpagents",
        "ConnectionProfileSrc" : "smprodMFT",
        "ConnectionProfileDest" : "joggcp",
        "S3BucketName" : "prj1968-bmc-data-platform-foundation",
        "Description" : "First data ingest that triggers downstream applications",
        "FileTransfers" : [
          {
            "TransferType" : "Binary",
            "TransferOption" : "SrcToDestFileWatcher",
            "Src" : "/bmc_personal_details.csv",
            "Dest" : "/bmc_personal_details.csv"
          },
          {
            "TransferType" : "Binary",
            "TransferOption" : "SrcToDestFileWatcher",
            "Src" : "/bmc_fraud_details.csv",
            "Dest" : "/bmc_fraud_details.csv"
          },
          {
            "TransferType" : "Binary",
            "TransferOption" : "SrcToDestFileWatcher",
            "Src" : "/bmc_transaction_details.csv",
            "Dest" : "/bmc_transaction_details.csv"
          } 
        ]
      }, 

Dataflow

Dataflow jobs are executed to push the newly arrived data into BigQuery. The jobs appear complex, but Google Cloud provides an easy-to-use process to make the definitions simple.

Go to the Dataflow Jobs page (Figure 2). If you have an existing job, choose to Clone it; otherwise, choose Create Job from Template. Once you’ve provided the desired parameters, click Equivalent REST at the bottom to get this information (Figure 3), which you can cut and paste directly into the job’s Parameters section.

Figure 2: Dataflow Jobs page

Figure 3: Cut and paste into job Parameters section

"jog-dflow-gcs-to-bq-fraud": {"Type": "Job:ApplicationIntegrator:AI Google DataFlow",
        "AI-Location": "us-central1",
        "AI-Parameters (JSON Format)": "{\"jobName\": \"jog-dflow-gcs-to-bq-fraud\",
        \"environment\": {        \"bypassTempDirValidation\": false,
        \"tempLocation\": \"gs://prj1968-bmc-data-platform-foundation/bmc_fraud_details/temp\",
        \"ipConfiguration\": \"WORKER_IP_UNSPECIFIED\",
        \"additionalExperiments\": []    },    
        \"parameters\": {
        \"javascriptTextTransformGcsPath\": \"gs://prj1968-bmc-data-platform-foundation/bmc_fraud_details/bmc_fraud_details_transform.js\", 
        \"JSONPath\": \"gs://prj1968-bmc-data-platform-foundation/bmc_fraud_details/bmc_fraud_details_schema.json\",
        \"javascriptTextTransformFunctionName\": \"transform\",
        \"outputTable\": \"sso-gcp-dba-ctm4-pub-cc10274:bmc_dataplatform_foundation.bmc_fraud_details_V2\",
        \"inputFilePattern\": \"gs://prj1968-bmc-data-platform-foundation/bmc_fraud_details/bmc_fraud_details.csv\", 
        \"bigQueryLoadingTemporaryDirectory\": \"gs://prj1968-bmc-data-platform-foundation/bmc_fraud_details/tmpbq\"    }}",
        "AI-Log Level": "INFO",
        "AI-Template Location (gs://)": "gs://dataflow-templates-us-central1/latest/GCS_Text_to_BigQuery",
        "AI-Project ID": "sso-gcp-dba-ctm4-pub-cc10274",
        "AI-Template Type": "Classic Template",
        "ConnectionProfile": "JOG-DFLOW-MIDENTITY",
        "Host": "gcpagents"
      }, 

SLA management

This job defines the SLA completion criteria and instructs Control-M to monitor the entire workflow as a single business entity.

"jog-mc-fm-service": {"Type": "Job:SLAManagement",
	 "ServiceName": "Model testing and scoring for fraud detection",
	 "ServicePriority": "3",
	 "JobRunsDeviationsTolerance": "3",
	 "CompleteIn": {
	    "Time": "20:00"
	  }
	},

The ServiceName specifies a business-relevant name that will appear in notifications or service incidents, as well as in displays for non-technical users, to make it clear which business service may be impacted. It is important to note that Control-M uses statistics collected from previous executions to automatically compute the expected completion time so that any deviation can be detected and reported at the earliest possible moment. This gives monitoring teams the maximum opportunity to course-correct before business services are impacted.

Examining the state of the pipeline

Now that you have an idea of how jobs are defined, let’s take a look at what the pipeline looks like when it’s running.

Control-M provides a user interface for monitoring workflows (Figure 4). In the screenshot below, the first job has completed successfully and is green; the next three jobs are executing and are depicted in yellow. Jobs that are waiting to run are shown in gray.

Figure 4: Control-M Monitoring Domain

You can access the output and logs of every job from the pane on the right-hand side. This capability is vital during daily operations. To monitor those operations more easily, Control-M provides a single pane to view the output of jobs running on disparate systems without having to connect to each application’s console.

Control-M also allows you to perform several actions on the jobs in the pipeline, such as hold, rerun, and kill. You sometimes need to perform these actions when troubleshooting a failure or skipping a job, for example.

All of the functions discussed here are also available from a REST-based API or a CLI.

Conclusion

In spite of the rich set of ML tools that Google Cloud provides, coordinating and monitoring workflows across an ML pipeline remains a complex task.

Anytime you need to orchestrate a business process that combines file transfers, applications, data sources, or infrastructure, Control-M can simplify your workflow orchestration. It integrates, automates, and orchestrates application workflows whether on-premises, on the Google Cloud, or in a hybrid environment.

Introducing Control-M Python Client and Integrations
https://www.bmc.com/blogs/control-m-python-client-and-integrations/ | October 27, 2021

Every business wants to be data-driven. With the staggering amount of data available to organizations that want to make informed business decisions, those that don’t properly utilize it will be quickly left behind. With companies around the world harnessing the power of data to drive their business forward, becoming a data-driven business is critical to digital transformation.

As an application and data workflow automation and orchestration platform, Control-M has a long history of supporting customers through their digital transformations. That’s why I’m excited to make two announcements that will continue to help Control-M customers with those initiatives.

Control-M Python Client

Many companies struggle to operationalize their data-centric applications. The data scientists and data engineers responsible for these applications are forced to spend too much of their time trying to wrangle data from multiple sources with disconnected tools. When those applications get to production, IT operations (ITOps) teams must manage those same disconnected tools to support the new business service.

This is an inefficient and time-consuming approach. Even if it seems to be working okay now, consider the impact to your time-to-market when you need to really scale application updates in production. Why not empower your data teams with Control-M, which is already part of your toolkit?

The new Control-M Python Client allows data engineers and data scientists to use Python programming to interact seamlessly with Control-M. They can easily build, test, and promote data workflows in the coding language they already use, integrated with Control-M through the Control-M Automation API. By putting Control-M to work for both data and operations teams, you can ensure visibility, improve service level agreements (SLAs), and deliver data-driven outcomes faster and at scale across hybrid and multi-cloud environments.

To learn more about the Control-M Python client, check out our Control-M Integrations datasheet.

Data Ecosystem and Cloud Services Integrations

With the increasing adoption of cloud services, cloud providers have made a significant investment in expanding their capabilities to ensure that cloud environments can support their users’ growing data processing needs. That said, connecting the data that resides across multiple clouds and on-premises data storage requires more tools than any single cloud provider can support. To properly orchestrate application and data workflows for all that data, organizations must use a platform that seamlessly supports multiple cloud vendors and enables end-to-end observability.

BMC can help. We’re extending our support for cloud services with four new Control-M cloud integrations that simplify workflow orchestration across platform-as-a-service (PaaS) offerings from the three leading cloud platforms—Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. The latest integrations include:

In addition, we’re releasing other data ecosystem integrations, including:

These integrations are available as open-source downloads from GitHub. The AWS Glue and Azure Data Factory integrations are BMC-supported, and the others are community-supported.

And that’s only the beginning. Considering how quickly the cloud and data space is evolving, we plan to continue a strong release cadence for additional integrations. Look for many more updates and announcements in the future.

For more information about these new integrations, check out our updated Control-M Cloud Datasheet or visit bmc.com/it-solutions/control-m-integrations.

12 Best Practices for Implementing Application Workflow Orchestration
https://www.bmc.com/blogs/awo-implementation-best-practices/ | August 16, 2021

Companies across many industries have embraced application workflow orchestration as a way to drive digital modernization forward. From streamlining targeted advertising campaigns to automating predictive maintenance programs, application workflow orchestration platforms like Control-M are playing a critical role in helping businesses deliver better customer experiences.


If you’re ready to start your application workflow orchestration journey, here are 12 best practices to follow.

 

  1. Support an “as-code” approach

Regardless of whether workflows are authored via some graphical interface or written directly in code, version control is mandatory. Of course, to enable modern deployment pipelines, your platform should allow you to store and manage workflows in some text or code-like format.

  2. Think in microservices

Avoid monoliths. This applies to workflows just as it does to applications. Identify functional components or services. Use an “API-like” approach for workflow components to make it easy to connect, re-use, and combine them, like this:

Service (Flow) A:
  Do something1, emit "something1 done"
  Emit "something2 running," Do something2, Emit "something2 done"
  Emit "Service A" done

Service B:
  Wait for "Service A" done
  Do BThing, emit BThing done

Service C:
  DO NOT run while something2 is running
  Wait for BThing done

Etc.
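In Control-M’s JSON format, the “emit” and “wait” steps above map naturally onto events. The sketch below is a hypothetical translation of Services A and B only; the job names, commands, hosts, and event names are placeholders, so treat it as an illustration of the pattern rather than a finished definition.

{
  "service-a": {
    "Type": "Folder",
    "Comment": "Hypothetical sketch of Service A emitting events",
    "do-something1": {
      "Type": "Job:Command",
      "Command": "run_something1.sh",
      "RunAs": "svc",
      "Host": "agent-a",
      "eventsToAdd": {"Type": "AddEvents", "Events": [{"Event": "something1-done"}]}
    },
    "do-something2": {
      "Type": "Job:Command",
      "Command": "run_something2.sh",
      "RunAs": "svc",
      "Host": "agent-a",
      "eventsToAdd": {"Type": "AddEvents", "Events": [{"Event": "service-a-done"}]}
    },
    "flow": {"Type": "Flow", "Sequence": ["do-something1", "do-something2"]}
  },
  "service-b": {
    "Type": "Folder",
    "Comment": "Hypothetical sketch of Service B waiting on Service A",
    "do-bthing": {
      "Type": "Job:Command",
      "Command": "run_bthing.sh",
      "RunAs": "svc",
      "Host": "agent-b",
      "eventsToWaitFor": {"Type": "WaitForEvents", "Events": [{"Event": "service-a-done"}]},
      "eventsToAdd": {"Type": "AddEvents", "Events": [{"Event": "bthing-done"}]}
    }
  }
}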

  3. Don’t reinvent the wheel

If you have a common function, create a single workflow “class” that can be instantiated as frequently as required, yet maintained only once. Instead of creating multiple versions of a service, use variables or parameters that can accommodate the variety.
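A hypothetical sketch of that idea in Control-M JSON: a single job definition parameterized with a variable instead of being cloned per region. The variable syntax and names shown here are assumptions to verify against the Automation API reference.

"load-region-data": {
  "Type": "Job:Command",
  "Command": "python load_data.py --region %%REGION",
  "RunAs": "dataeng",
  "Host": "agent-1",
  "Description": "Hypothetical sketch; variable syntax is an assumption",
  "Variables": [{"REGION": "emea"}]
}

Ordering the same definition with REGION set to another value reuses one maintained workflow instead of several divergent copies.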

  4. Process lineage

Data lineage is frequently cited as a major requirement in complex flows to support problem analysis. Process lineage is just as important and a mandatory requirement for effective data lineage. Without the ability to track the sequence of processing that brought a flow to a specific point, it is very difficult to analyze problems. The need for process lineage arises quickly when a problem occurs in a pub/sub or “launch-and-forget” approach used in triggering workflows.

  5. Make the work visible

Process relationships should be visible. Have you ever encountered a situation where everything appears perfectly normal, but nothing is running? That’s when visualization is particularly valuable. Having a clear line of sight between a watcher or sensor that is waiting for an event and the downstream process that wasn’t triggered because the event did not occur can be extremely valuable.

  6. Codify SLAs

The best way to define a non-event as an error is by defining an “expectation,” commonly called a service level. At its most basic, an unmet service level agreement (SLA) is identified as an error. For example, we expect a file to arrive between 4 PM and 6 PM. It takes approximately 15 minutes to cleanse and enrich the file and another 30 minutes to process it. So, we can set the SLA to be 6:45 PM. If the flow hasn’t completed by then, because processing is running late or hasn’t started, the error is recognized at 6:45 PM.

A more sophisticated approach is to use trending data to predict an SLA error as early as possible. We know the cleanse step runs approximately 15 minutes because we collect the actual execution time for the last “n” occurrences. The same is true for the processing step. If the cleanse step hasn’t finished by 6:15, or the processing step hasn’t started by 6:15, we know we’ll be late. We can generate alerts and notifications as soon as we know, so that we have the maximum time to react and possibly rectify the problem.

A final enhancement is providing “slack time” to inform humans how much time remains for course correction. In the above scenario, if the cleanse step doesn’t start on time, at 6 PM, there are 45 minutes available to fix the problem before the SLA is breached.
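Here is a minimal, hypothetical sketch of codifying the 6:45 PM expectation as an SLA job in Control-M’s JSON format; the service name, priority, and deadline attribute are placeholders to validate against the Automation API reference.

"file-processing-sla": {
  "Type": "Job:SLAManagement",
  "ServiceName": "Daily file cleanse and processing",
  "ServicePriority": "3",
  "CompleteBy": {"Time": "18:45", "Days": "0"}
}

Placed at the end of the flow, a job like this lets the scheduler raise the alert as soon as the expectation can no longer be met, rather than at 6:45 PM when it has already been missed.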

  7. Categorize

As you turn your workflow “microservices” and connecting tasks into process flows, make sure you tag objects with meaningful values that will help you identify relationships, ownerships, and other attributes that are important to your organization.

  8. Use coding conventions

Imagine creating an API for credit card authorization and calling it “Validate.” While it makes sense to you, it may be too vague. Consider qualifiers that carry more meaning, such as “CreditCardValidation.” It is important to keep this in mind when you are naming workflows. It may be fine to call a workflow “MyDataPipeLine” when you are experimenting on your own machine, but that gets pretty confusing even for yourself, never mind for the dozens or hundreds of others, once you start running in a multi-user environment.

  9. Think of others

You may be in the relatively unique position of being the only person running your workflow. More likely, that won’t be the case. But even if it is, you don’t want to have to re-learn each workflow every time you need to modify or enhance it or analyze a problem. Include comments or descriptions on your workflows, or if it’s really complicated, add some documentation. And remember to revise them together with the workflow.

  10. Keep track

Inquiring minds want to know…everything. Who built the workflow, who ran it, was it killed or paused, who did it and why? Did it run successfully, or did it fail? If so, when and why? How was it fixed? And so on. Basically, when it comes to workflows for important applications, you can never have too much information. Make sure your tool can collect everything you need.

  11. Prepare for the worst

You know tasks will fail. Make sure you collect the data required to fix the problem and keep it around for a while. That way, you not only meet the “Keep track” requirement, but when problems occur, you can compare the new failure to previous failures or successes to help determine the problem.

  12. Harness intelligent self-healing

Finally, look for flexibility in determining what is success and what is failure. It’s correct and proper to expect good code, but we have all seen code that issues catastrophic error messages even though the task completes with an exit code of zero. You should be able to define what is and isn’t an error as well as the automated recovery actions for each specific situation.

Next Steps

Ready to see what application workflow orchestration can do for your business? Try Control-M free today!

Workflow Orchestration vs. Continuous Integration: What’s the Difference?
https://www.bmc.com/blogs/workflow-orchestration-vs-continuous-integration-whats-the-difference/ | June 9, 2020

For most companies, Digital Transformation is no longer just a buzzword but a very real shift in how every aspect of the organization operates. That is certainly true for IT, and one of the somewhat ironic characteristics of this shift is a huge emphasis on automation. Ironic, because you would have thought that Information “Technology” was always about automation, which is pretty much a synonym for computing. Well, better late than never, I guess.

One of the results of this renewed focus on automation is increased scrutiny of tooling, a desire to understand the similarities and differences of the vast array of available tools and which tool is best for which function.

I’m choosing proxies for these two categories: Control-M, the product I work on and the leading application workflow orchestration tool (per the Enterprise Management Associates (EMA) Radar Report for Workload Automation), and Jenkins, the most popular Continuous Integration (CI) tool.

Control-M According to Control-M

The home page states “Control-M simplifies application workflow orchestration, making it easy to define, schedule, manage and monitor application workflows, ensuring visibility and reliability, and improving SLAs.” The primary Control-M focus is operating business applications in production.

Jenkins

From jenkins.io, the subhead on the site is very instructive and captures the essence of the comparison here; it states, “Build great things at any scale.” The emphasis is on building software. The Press page stresses this focus with the following blurb: “Jenkins is … supported by … people interested in continuous integration, continuous delivery and modern software delivery practices. Built on the Java Virtual Machine (JVM), it provides more than 1,500 plugins that extend Jenkins to automate with practically any technology software delivery teams use. … ”

Hear from Users

When evaluating tools, user experiences can be helpful, and both websites contain several testimonials. Here are a few selections.

Jenkins

T-Mobile:  “We … support their internal and external customers by adopting robust and intelligent practices that speed up the CI/CD cycle.”

“The real magic happens when our developer teams take ownership of the simplified CI/CD pipelines.”


Morningstar:  “Morningstar practices continuous integration (CI) with the … Jenkins Platform to improve consistency and increase automation—vital steps along the organization’s path to continuous delivery (CD) and a DevOps culture. “

Control-M

Hershey’s: Todd Lightner from The Hershey Company describes in a blog post how Hershey’s uses Control-M to help keep inventory stocked at stores. Included below is his description of their use case:

“The data center operations group runs thousands of jobs each day. These jobs manage the digital interactions that are necessary to run our business—not just manufacturing, supply planning, supply chain, warehousing, and distribution but also finance, payroll, costing, human resources, marketing, and sales. We handle many of these functions within our complex SAP® environment. BMC’s Control-M solution automates most of these jobs and processes. … So, when anyone asks me what Control-M does at The Hershey Company, I tell them that it literally runs our business.”

A case study of Raymond James Financial on BMC’s website describes their use:

“Control-M manages jobs across complex interdependencies among hundreds of applications that access the company’s data warehouse and consolidated data store. Nightly processing ensures that senior management and financial advisors have the data they need to help clients with investment decisions.”

And Analysts

Analyst coverage of Jenkins has been sparse, possibly due to its independent open-source status, though it has appeared in an IDC Innovators report. Control-M has historically been covered under the Workload Automation category by Gartner and EMA; the most recent Workload Automation report is an EMA Radar report that can be found on BMC’s website.

So, What IS the Difference?

There is general agreement that the Software Development Life Cycle (SDLC) includes Build, Deploy, Operate and Monitor phases. In this discussion, Jenkins and Continuous Integration tools are aimed at the “Build” phase. This focus is emphasized by the features provided by Jenkins, the design of its user interface and the integrations that have been built by the community.

Once an application reaches the Operate and/or Monitor phases, frequently referred to as “production”, application workflow orchestration provides visualization and management of the processes that execute key business services like customer billing, inventory management, training of recommendation and maintenance models and business analytics.

When choosing between these tool categories, identify what you need to automate, who will create and then operate the automation, and what impact those choices may have on your organization.

]]>
Bring Kubernetes to the Serverless Party https://www.bmc.com/blogs/bring-kubernetes-to-the-serverless-party/ Mon, 30 Mar 2020 00:00:18 +0000 https://www.bmc.com/blogs/?p=16843 Nirvana for application developers is to be able to focus 100% on building the best functionality that will delight users and drive great outcomes for business. Having to worry about compute resources, network configuration and all the other intricacies of infrastructure is, at best, an annoyance. Serverless is one approach that holds out the promise […]]]>

Nirvana for application developers is to be able to focus 100% on building the best functionality that will delight users and drive great outcomes for business. Having to worry about compute resources, network configuration and all the other intricacies of infrastructure is, at best, an annoyance. Serverless is one approach that holds out the promise of freeing developers from that drudgery.

However, when serverless is mentioned, typically public cloud vendors and their “Function as a Service” (FaaS) offerings come to mind. In this article, I want to discuss some other options.

Containerization is another technology that has gotten a lot of attention because of its potential to significantly decouple building business logic from worrying about the infrastructure in which that code runs.

Kubernetes is one of the most popular open-source projects ever and was the inaugural project of the Cloud Native Computing Foundation (CNCF). In a 2018 blog celebrating Kubernetes’ graduation, the CNCF wrote: “Compared to the 1.5 million projects on GitHub, Kubernetes is No. 9 for commits and No. 2 for authors/issues, second only to Linux”.

The pace of innovation in the Kubernetes community has been dizzying, with over one million contributions from more than 2,000 companies. With all that work, the level of functionality is incredibly rich, but accessing that power comes with a “complexity” surcharge. If you have ever built a YAML manifest, you know there are tons of options available for networking and storage, bordering on the very complex; hardly the blissful ignorance hoped for with FaaS.

What you may not know is that several options bring Function-as-a-Service to Kubernetes, including Knative, Kubeless, OpenFaaS, Apache OpenWhisk, and others. An interesting CNCF survey identifies Knative as the most popular, used by 34% of survey respondents who have installed FaaS on Kubernetes.

Knative consists of two major components: Eventing and Serving. Eventing provides facilities for generating and consuming events. Built-in event sources include Kubernetes object events; messages published to Google Pub/Sub, AWS SQS, and Kafka topics; and events related to select storage and logging facilities. You can also write your own source.

Knative Serving lets you deploy containers in a serverless fashion with Knative handling auto-scaling, routing and networking, and versioning.

In browsing the Knative documentation, it appears there is a fair amount of work to get to the “less” part, but once configured, you can launch containers as services in Knative about as easily as the equivalent activity on Google, AWS, or Azure.

The choices of “Serving” and “Eventing” as component names may lead you to believe that functions are invoked exclusively as either long-running services or in response to events. While that may be true for much of the FaaS workload, there is also a significant requirement for managing workload that runs to completion, commonly known as jobs or batch. In fact, the CNCF Serverless Working Group has launched a subgroup focusing specifically on workflow, and these requirements are reflected in a design document.
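
To make “run to completion” concrete, here is a minimal Python sketch of submitting a batch Job with the official Kubernetes Python client. It assumes a reachable cluster and a local kubeconfig, and the job name and image are hypothetical placeholders:

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() when running inside a pod

    # A run-to-completion workload: it starts, does its work, and exits.
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="nightly-etl"),            # hypothetical name
        spec=client.V1JobSpec(
            backoff_limit=2,                                          # retry failed pods up to twice
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="etl",
                        image="registry.example.com/etl:latest",      # hypothetical image
                    )],
                )
            ),
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

Unlike a long-running service, a Job like this has a defined end, which is exactly the kind of workload that sequencing, dependencies, and SLAs apply to.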

The importance of application workflow orchestration is further demonstrated by offerings such as AWS Step Functions, Azure Logic Apps, and Google Cloud Composer. Knative Eventing also supports “CloudSchedulerSource” and “CronJobSource” sources, but there seem to be several gaps, not only in Knative, which still hasn’t reached its 1.0 version (v0.13.0 at the time of writing), but also in tools for orchestrating serverless in general.

There are plenty of blogs discussing the importance and value of orchestrating serverless workflows. Many of the requirements they describe go beyond what the CNCF and cloud vendors are working on and include (a minimal sketch of a few of them follows the list):

  • Sequence of functions
  • Visibility of relationships (predecessor/successor)
  • Relationships with non-FaaS components such as traditional infrastructure and human actions
  • Branching depending on success/failure or other indicators
  • Parallelism
  • Graphical tools for building and authoring flows
  • Error recovery such as retry
  • Easy access to logs and output
  • Business SLAs
  • Audit trail

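To give a feel for the first few items, here is a minimal, tool-neutral Python sketch of sequence, branching, parallelism, and retry across a couple of functions. The function names and payloads are hypothetical stand-ins, not any real product’s API:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def invoke(fn, *args, retries=2, delay=5):
        # "Error recovery such as retry": re-call the function a few times before giving up.
        for attempt in range(retries + 1):
            try:
                return fn(*args)
            except Exception:
                if attempt == retries:
                    raise
                time.sleep(delay)

    # Hypothetical stand-ins for two FaaS functions and two downstream steps.
    def enrich(payload):
        return {**payload, "enriched": True}

    def score(payload):
        return {**payload, "risk": 0.4}

    def notify(result):
        print("notify:", result)       # could be a non-FaaS component, e.g. a ticketing system

    def archive(result):
        print("archive:", result)

    def pipeline(payload):
        enriched = invoke(enrich, payload)       # sequence: enrich must complete before score
        result = invoke(score, enriched)
        if result["risk"] > 0.8:                 # branching on an indicator
            invoke(notify, result)
        with ThreadPoolExecutor() as pool:       # parallelism for independent final steps
            pool.submit(invoke, archive, result)
            pool.submit(invoke, notify, result)

    pipeline({"vehicle_id": "truck-17"})

Even this toy version leaves out visibility, logs, business SLAs, and an audit trail from the list above, which is where the real effort goes.
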
So if you need to orchestrate a combination of serverless and something else, even objects like Kubernetes Jobs, never mind functions in public clouds or traditional applications in on-premises environments, you will have to do a lot of very heavy lifting on your own to gain any measure of integration. In the interest of full disclosure, I work for a commercial vendor that provides very interesting capabilities in this area. You may wish to check us out.

]]>
Sophisticated Automation helps Raymond James Drive Continuous Innovation https://www.bmc.com/blogs/raymond-james-leverages-sophisticated-automation-to-improve-availability/ Wed, 30 Oct 2019 00:00:01 +0000 https://www.bmc.com/blogs/?p=15786 In this Run and Reinvent podcast I chat with Chris Haynes, Manager of IT Workload Engineering for Raymond James, about how his company is leveraging Control-M to improve availability through sophisticated automation. Raymond James is a full-service financial investment company with over 8,000 financial advisors across 2,600 locations worldwide. Below is a condensed transcript of […]]]>

In this Run and Reinvent podcast I chat with Chris Haynes, Manager of IT Workload Engineering for Raymond James, about how his company is leveraging Control-M to improve availability through sophisticated automation. Raymond James is a full-service financial investment company with over 8,000 financial advisors across 2,600 locations worldwide. Below is a condensed transcript of our conversation.

Joe Goldberg: Chris, let’s just get right into it – can you tell everyone listening a little bit about Raymond James and about your role there?

Chris Haynes: Absolutely. Raymond James is a full-service financial investment company. We have nearly 8,000 financial advisors administering over $790 billion in client assets across 2,600 locations globally. I manage the Workload Engineering Team, which is part of our service delivery and support organization. My team is actually a by-product of the positive influence of Control-M on our group: we were able to pull out some resources and reallocate them to more sophisticated things, because we’re able to really streamline and understand all of our workflows and business processes better after implementing Control-M. That’s been a great part of our journey, and we continue to leverage that and push those opportunities to take resources and build towards things like continuous improvement.

Joe: So, I understand that you had some interesting work in that area in the last year? Maybe you could tell us a little bit more about that.

Chris: Right. So, to build on that: with my team, we’ve really focused on leveraging all the analytics and data we’re pulling out of Control-M to understand where the long poles and problems are in our workflows. A big focus has been continuous improvement. With that, we’re leveraging analytics for lots of different proactive looks, leveraging forecasting and using every module of Control-M to identify gaps in processing, and streamlining workflows to better support our business partners.

Over the last year, a lot of that work has really paid dividends, and we have been able to improve on meeting our critical business application SLAs, going from 87 percent to 98 percent over the last year. That’s obviously a big win and a nice star by our name. So, we’re looking for every opportunity. We’re also trying to get buy-in across the enterprise, from everybody in the IT organization, to leverage data analytics from Control-M. We have pulled data out and created a lot of dashboarding off of those analytics. Some folks now have a real-time view of where they’re at and understand trends that are occurring with their business applications. That’s really been, from our perspective, my internal evangelizing of Control-M and its capabilities to get folks to buy in and see where they can make improvements.

Joe: Okay. So, that’s some really interesting stuff about what’s been going on and the history and then some of the things that you’re doing, I think, more recently. But I understand that you’ve been using Control-M, or that Raymond James has been using Control-M, to help run critical parts of the business for quite a long time. And so, perhaps you can expand a little bit about that?

Chris: Right. So, a couple of things, just to understand how we’ve grown in that space. We’re probably about seven years into using Control-M at Raymond James. I have been here through the whole growth of it and the implementation, so I’ve had a great view of our partnership there. We’ve more than doubled our workload while increasing our performance, and we’ve taken on more sophisticated applications. Part of the evangelizing is that we’re able to support applications on any platform and in multiple environments at a very high rate. So, we really have been able to continue to streamline and support our business.

Really, our advisors are looking to make sure their applications are up and available. The greatest ability to them is availability, and they want to make sure it’s up. That’s why meeting our SLAs was a critical thing. And part of that, as we grow and move forward with all this added sophistication, are the additional modules that Control-M continues to add. A big piece that’s been important over the last year is the Automation API that BMC introduced for Control-M. That’s really helped us engage with our software engineering teams as they’re developing new applications for our business partners, to really improve their ability to code and test and get that to production in a much more streamlined fashion. It also empowers them in their ability to work with the tool, so it’s helped with that buy-in. And we’re continuing to grow that across the enterprise.

I’ll tell you, when we first started using Control-M six or seven years ago, as with any change to an enterprise tool, it’s never the case that everybody is automatically on board. Initially we had about 30 or 40 folks that were using the tool and understanding the value. But now I have over 500 folks across our IT enterprise using the tool and taking advantage of all the analytics and understanding the functionality it offers. So, that continues to be a part of our future, especially as we engage our DevOps team; in the last couple of months a big thing for us has been getting our DevOps folks to start leveraging Control-M as part of their process. We’re in the beginning stages of that, so that’s very exciting.

Joe: So, being in financial services, certainly everyone I think expects that things like audit and compliance or regulatory requirements would be a big deal for you as well. Has Control-M helped you in that area?

Chris: Absolutely. A couple of things. It’s actually enabled us to move forward with a couple of new applications because of the compliance capabilities. And from an audit perspective, it’s cut our time by 96 percent. We’re audited not quarterly or monthly but routinely throughout the year. It used to take tons of man-hours and lots of resources to pull together reports that took a huge amount of time to collect. Now we’re doing this in minutes. The output is in a format that’s easily viewable, and our auditors are much happier because we’re able to give clear, concise reporting in a much better time frame.

Listen to the full episode from SoundCloud or Apple Podcasts to hear the rest of this interview.

]]>
Workflow Orchestration: An Introduction https://www.bmc.com/blogs/workflow-orchestration/ Tue, 15 Oct 2019 00:00:11 +0000 https://www.bmc.com/blogs/?p=13460 The Wikipedia definition of “workflow” contains the mysterious warning “This article may be too technical for most readers to understand”. So, I’m avoiding the technique of quoting from that source. It’s mysterious because it seems pretty simple what workflow is: Some series of activities (work) performed in a reasonable sequence (flow). What may account for […]]]>

The Wikipedia definition of “workflow” contains the mysterious warning “This article may be too technical for most readers to understand”. So, I’m going to avoid quoting from that source. The warning is mysterious because workflow seems pretty simple:

Some series of activities (work) performed in a reasonable sequence (flow).

What may account for the perceived complexity is the number of wildly different industries and segments the term is used in. When it comes to IT, both workflow and orchestration, individually and as a phrase, are used across a number of disciplines, several of which are described below.

This article will focus on using workflow orchestration to run stateful business applications in production, including:

  • Why that is now called application workflow orchestration
  • How it’s unique and differs from other disciplines
  • What to think about when choosing such a tool

One final note of introduction: there is some discussion regarding the differences between orchestration and automation. In this discussion, the distinction is negligible, so the terms are used interchangeably, mainly to avoid excessive repetition.

What are some types of orchestration?

Here are some common orchestration types:

Cloud orchestration

Cloud orchestration is typically used to:

  • Provision, start, or decommission servers
  • Allocate storage
  • Configure networking
  • Enable applications to use cloud services like databases

This set of activities is the modern equivalent of infrastructure setup initially performed manually by system administrators and later with provisioning tools like Chef, Puppet, and others.

Service orchestration

Service orchestration takes a broader approach and seeks to provide a complete end-to-end solution for delivering a “service”.

In an ideal world, this set of activities would include everything from designing an application according to the business requirements, all the way through to running it in production. However, frequently, the tasks included in a service are more a function of the capabilities provided by a particular set of tools from a specific supplier rather than being a holistic approach encompassing all the required tasks.

When used by providers of Service Desk and similar tools, the orchestration may be limited to tracking the status of tasks and approvals, with less focus on the actual execution of the tasks themselves.

Release orchestration

This is the definition according to Gartner: “Release Orchestration tools provide a combination of deployment automation, pipeline and environment management, and release orchestration capabilities to simultaneously improve the quality, velocity and governance of application releases.

ARO tools enable enterprises to scale release activities across multiple, diverse and multigenerational teams (e.g., DevOps), technologies, development methodologies (agile, etc.), delivery patterns (e.g., continuous), pipelines, processes and their supporting toolchains”.

Application workflow orchestration

Now let’s focus on our topic by looking at an example of a connected vehicle application. One of its goals is to reduce vehicle downtime by collecting and monitoring the telematics data collected from sensors on the vehicle.

Extensive data is generated about every aspect of vehicle operation. That data is ingested and analyzed by machine learning algorithms to predict potential failure. If a problem is anticipated, the application correlates vehicle location to service depots with parts availability, directing drivers to complete the preventative repair en route rather than face a roadside repair.

The workflow includes:

  • Watch for telematics data arriving from a third-party provider
  • Move the data on a regular basis from its cloud-based landing location to a Hadoop cluster
  • Enrich the sensor data with vehicle history, fleet ownership and warranty data by pulling those data sets from internal systems of record
  • Run the analytics
  • Select a service depot based on the vehicle location and parts availability
  • Order a part if none are available and replenish inventory if this repair reduces the on-hand amount below a threshold
  • Book a service appointment
  • Notify the driver and other interested parties

Note that in this example, the orchestration tool is moving data and invoking application components to accomplish the desired business outcome. The responsibility of the orchestrator (a simple sketch follows the list) is to:

  • Invoke the right process at the right time
  • Ensure that one process completes successfully before the next one starts
  • Provide visualization and management of the workflow
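
To make those responsibilities concrete, here is a tool-neutral Python sketch of the flow above expressed as a dependency graph. The step names mirror the bullets; the implementations are deliberately left out, and the conditional logic (e.g. only ordering a part when stock is short) is simplified:

    # Each step lists the steps that must finish before it can run.
    workflow = {
        "watch_telematics":   [],
        "move_to_hadoop":     ["watch_telematics"],
        "enrich_sensor_data": ["move_to_hadoop"],
        "run_analytics":      ["enrich_sensor_data"],
        "select_depot":       ["run_analytics"],
        "order_part":         ["select_depot"],
        "book_appointment":   ["select_depot", "order_part"],
        "notify_driver":      ["book_appointment"],
    }

    def run_order(graph):
        # A topological sort: "invoke the right process at the right time" and
        # "ensure one process completes before the next starts", in miniature.
        done, order = set(), []
        while len(order) < len(graph):
            ready = [step for step, deps in graph.items()
                     if step not in done and all(d in done for d in deps)]
            if not ready:
                raise ValueError("cycle or unsatisfiable dependency")
            for step in ready:
                order.append(step)
                done.add(step)
        return order

    print(run_order(workflow))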

Additional requirements are less obvious but just as critical.

We need a way to build the flow and, if errors occur, to send notifications to interested parties. There must be a way to examine the details of each workflow step to determine what is being done, what the result was, and what messages the processes may have generated; to monitor status and progress and display that information; and, of course, to log and track everything for auditing and governance purposes.

Last but certainly not least, we need a way to assign business priorities to these tasks and some completion/service level rules to ensure the workflow operates within some agreed-to quality of service definition.

Why application workflow orchestration?

If you examine the steps described above, you may recognize similarities to other processes and tools called by different names, like data pipelines or schedules or even batch jobs.

A major reason for this new term is to make it very clear that new platforms, technologies, and data sources are very much a part of orchestration. Whether it is cloud, containerization, data and analytics, or streaming and microservices architectures, they are all an integral part of application workflow orchestration.

Workflow orchestration best practices

Some of the many use cases in which application workflow orchestration plays a significant role include:

  • Orchestrating data pipelines
  • Training ML models
  • Detecting fraud in AML flows
  • Applying preventive maintenance analytics to maximize oil well production

Generally, IT focuses on the code being run by application workflow orchestration tools but rarely allocates the same level of attention to the design and maintenance of the workflows themselves. Since workflows are code, whether written in JSON, XML, Python, Perl or Bash, they should be treated like code.

Here are some recommendations for standard practices to consider adopting and capabilities to look for in selecting an application workflow orchestration tool.

Support an “as-code” approach

Whether the workflows are authored via some graphical interface or written directly in code, version control is mandatory. Of course, in order to enable modern deployment pipelines, it should be possible to store and manage workflows in some text or code-like format.
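
As an illustration only (not any particular product’s format), a workflow kept “as code” can be as simple as a small definition that lives in the same repository as the application it orchestrates; the path and field names here are hypothetical:

    # flows/order_to_cash.py : versioned in Git alongside the application code
    ORDER_TO_CASH = {
        "name": "order-to-cash",
        "schedule": "daily at 02:00",
        "steps": [
            {"name": "extract_orders", "run": "jobs/extract_orders.sh"},
            {"name": "load_warehouse", "run": "jobs/load_warehouse.sh",
             "depends_on": ["extract_orders"]},
        ],
        "on_failure": "notify:data-ops",
    }

Because the definition is plain text, any change to the flow shows up as a reviewable diff, can be tagged with a release, and can be promoted through environments by the same pipeline that deploys the code it orchestrates.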

Think in microservices

Avoid monoliths. This applies to workflows just as it does to application code. Identify functional components or services. Use an “API-like” approach for workflow components to make it easy to connect, re-use and combine them, like this:

Service (Flow) A:
  Do something1, emit “something1 done”
  Emit “something2 running”, do something2, emit “something2 done”
  Emit “Service A done”

Service B:
  Wait for “Service A done”
  Do B thing, emit “BThing done”

Service C:
  DO NOT run while something2 is running
  Wait for “BThing done”
  Etc.
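
Translated into runnable, if deliberately simplified, Python, that emit/wait coupling might look like the sketch below; the in-memory event registry is a stand-in for whatever eventing or condition mechanism your orchestration tool actually provides:

    import threading

    events = {name: threading.Event() for name in
              ("something1 done", "something2 running", "something2 done",
               "Service A done", "BThing done")}

    def service_a():
        # do something1 ...
        events["something1 done"].set()
        events["something2 running"].set()
        # do something2 ...
        events["something2 running"].clear()
        events["something2 done"].set()
        events["Service A done"].set()

    def service_b():
        events["Service A done"].wait()   # only the published "API" couples B to A
        # do the B thing ...
        events["BThing done"].set()

    def service_c():
        events["BThing done"].wait()
        assert not events["something2 running"].is_set(), "must not run while something2 runs"
        # do the C thing ...

    for service in (service_b, service_c, service_a):
        threading.Thread(target=service).start()

The point is that each service knows only about named events, not about the internals of the others, so flows can be connected, re-used, and recombined without rewiring them.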

Don’t reinvent the wheel

If you have a common function, create a single workflow “class” that can be “instantiated” as frequently as required, yet maintained only once.

Instead of creating multiple versions of a service, use variables or parameters that can accommodate the variety.
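
For example, rather than cloning a “load sales data” flow once per region, a single parameterized definition can be instantiated as often as needed while being maintained in one place; the names and buckets below are purely illustrative:

    def make_load_flow(region, source_bucket, target_table):
        # One maintained workflow "class"; each call is an "instance" for a region.
        return {
            "name": f"load-sales-{region}",
            "steps": [
                {"name": "ingest", "run": f"ingest.sh {source_bucket}"},
                {"name": "load", "run": f"load.sh {target_table}",
                 "depends_on": ["ingest"]},
            ],
        }

    flows = [make_load_flow(region, f"s3://sales-{region}", f"warehouse.sales_{region}")
             for region in ("emea", "apac", "amer")]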

Process lineage

Data lineage is frequently cited as a major requirement in complex flows to support problem analysis.

Process lineage is just as important, and it is a prerequisite for effective data lineage. Without the ability to track the sequence of processing that brought a flow to a specific point, it is very difficult to analyze problems.

The need for process lineage arises quickly when a problem occurs in a pub/sub or “launch-and-forget” approach used in triggering workflows.
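
A minimal, tool-agnostic way to keep process lineage is to record, for every run, which run triggered it; with that one field you can walk backwards from any failure to its origin. The sketch below keeps the records in memory purely for illustration:

    import uuid
    from datetime import datetime, timezone

    runs = {}  # run_id -> record; in practice this lives in your orchestrator or a database

    def record_run(step, triggered_by=None):
        run_id = str(uuid.uuid4())
        runs[run_id] = {"step": step, "triggered_by": triggered_by,
                        "started": datetime.now(timezone.utc).isoformat()}
        return run_id

    def lineage(run_id):
        # Walk the chain of triggering runs back to the origin.
        chain = []
        while run_id is not None:
            record = runs[run_id]
            chain.append(record["step"])
            run_id = record["triggered_by"]
        return list(reversed(chain))

    ingest = record_run("ingest_file")
    cleanse = record_run("cleanse", triggered_by=ingest)
    load = record_run("load_warehouse", triggered_by=cleanse)
    print(lineage(load))   # ['ingest_file', 'cleanse', 'load_warehouse']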

Make the work visible

Process relationships should be visible.

One scenario where such visualization is particularly valuable is when everything appears perfectly normal, but nothing is running. Having a clear line of sight between a watcher or sensor that is waiting for an event and the downstream process that wasn’t triggered because the event did not occur can be extremely valuable.

Codify SLAs

The best way to define a non-event as an error is by defining an “expectation”, commonly called a service level.

At its most basic, an unmet service level agreement (SLA) is identified as an error. For example, we expect a file to arrive between 4:00 PM and 6:00 PM. It takes approximately 15 minutes to cleanse and enrich the file and another 30 minutes to process that file. So, we can set the SLA to be 6:45 PM. If the processing hasn’t completed by then, whether it’s running late or hasn’t even started yet, the error can be recognized at 6:45 PM if the flow hasn’t completed.

A more sophisticated approach is to use trending data to predict an SLA error as early as possible. We know the cleanse step runs approximately 15 minutes because we collect the actual execution time for the last ‘n’ occurrences. The same is true for the processing step. If the cleanse step hasn’t finished by 6:15, or if the processing step hasn’t started by 6:15, we know we’ll be late. We can generate alerts and notifications as soon as we know, so that we have the maximum time to react and possibly rectify the problem.

A final enhancement is providing “slack time” to inform humans how much time remains for course correction. In the above scenario, if the cleanse step doesn’t start on time at 6:00 PM, there are 45 minutes available to react before the SLA is breached.
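
The arithmetic above is straightforward to automate. Here is a small Python sketch that, given the observed average duration of each remaining step, predicts the finish time and the slack left against the 6:45 PM deadline; the dates and durations are the illustrative ones from the example:

    from datetime import datetime, timedelta

    sla_deadline = datetime(2019, 10, 15, 18, 45)                 # 6:45 PM
    average_duration = {"cleanse": timedelta(minutes=15),         # averages from the last 'n' runs
                        "process": timedelta(minutes=30)}

    def sla_outlook(now, completed_steps):
        remaining = sum((d for step, d in average_duration.items()
                         if step not in completed_steps), timedelta())
        predicted_finish = now + remaining
        slack = sla_deadline - predicted_finish                   # time left for course correction
        return predicted_finish, slack

    # At 6:15 PM the cleanse step still hasn't finished:
    finish, slack = sla_outlook(datetime(2019, 10, 15, 18, 15), completed_steps=set())
    print(finish, slack)   # predicts 7:00 PM, i.e. negative slack: alert now, not at 6:45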

Categorize

As you are designing your workflow “microservices” and connecting tasks into process flows, make sure you tag objects with meaningful values that will help you identify relationships, ownerships and other attributes that are important to your organization.

Use coding conventions

Imagine creating an API for credit card authorization and calling it “Validate”. If your response is “sounds good”, this blog probably isn’t for you. I’m hoping most will think the name should be more like “CreditCardValidation” or something similarly meaningful.

This point is simply to think about the workflows you create in a similar way. It may be great to call a workflow “MyDataPipeLine” when you are experimenting on your own machine but that gets pretty confusing even for yourself, never mind the dozens or hundreds of other folks, once you start running in a multi-user environment.

Think of others

You may be in the relatively unique position of being the only person running your workflow. More likely, that won’t be the case.

But even if it is, you may have a bunch of workflows and you don’t want to have to re-learn every time you need to analyze a problem or when you modify or enhance it.

Include comments or descriptions or, if it’s really complicated, some documentation. And remember to rev that documentation together with the workflow.

Keep track

Inquiring minds want to know… everything. Who built the workflow, who ran it, if it was killed or paused, who did it and why? Did it run successfully, or did it fail? If so, when and why? How was it fixed?

And so on, and so on.

Basically, when it comes to workflows for important applications, you can never have too much information. Make sure your tool can collect everything you need.
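
In practice, “collect everything” comes down to writing an audit event for every action taken on a workflow. Here is a bare-bones sketch of such a record; the field names and the flat file are illustrative, and a real tool would keep this in its own store:

    import getpass
    import json
    from datetime import datetime, timezone

    def audit_event(workflow, action, reason=None, outcome=None):
        # One append-only record per action: who, what, when, why, and how it ended.
        return {
            "workflow": workflow,
            "action": action,              # e.g. "created", "ran", "killed", "paused", "fixed"
            "user": getpass.getuser(),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "reason": reason,
            "outcome": outcome,            # e.g. "success" or "failed: exit code 8"
        }

    with open("workflow_audit.log", "a") as log:
        log.write(json.dumps(audit_event("order-to-cash", "killed",
                                         reason="blocking month-end close")) + "\n")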

Prepare for the worst

You know tasks will fail. Make sure you collect the data that will be needed to fix the problem and that you keep it around for a while. That way, not only can you meet the “Keep Track” requirement, but when problems occur, you can compare this failure to past failures or to previous successes to help determine the problem.

Harness intelligent self-healing

Finally, look for flexibility in determining what is success and what is failure. It’s correct and proper to expect good code, but we have all seen code that issues catastrophic error messages even though the task completes with an exit code of zero.

You should be able to define what is and is not an error and, accordingly, to define automated recovery actions for each specific situation.
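
As a hedged illustration of both halves of that flexibility, the sketch below refuses to trust the exit code alone: it scans the task’s output for known failure patterns and maps each one to a recovery action. The patterns, actions, and command are invented for the example:

    import re
    import subprocess
    import sys

    # What counts as a failure, and what to do about it, regardless of exit code.
    rules = [
        (re.compile(r"deadlock|ORA-\d+", re.I), "rerun"),            # transient database error
        (re.compile(r"file not found", re.I), "wait_and_rerun"),     # upstream feed is late
        (re.compile(r"checksum mismatch", re.I), "page_owner"),      # needs a human
    ]

    def run_with_healing(cmd, max_reruns=1):
        for attempt in range(max_reruns + 1):
            result = subprocess.run(cmd, capture_output=True, text=True)
            output = result.stdout + result.stderr
            action = next((a for pattern, a in rules if pattern.search(output)), None)
            if result.returncode == 0 and action is None:
                return "success"                                     # exit 0 AND clean output
            if action in ("rerun", "wait_and_rerun") and attempt < max_reruns:
                continue                                             # automated recovery: retry
            return f"failed ({action or 'nonzero exit code'})"       # escalate to a human

    # A task that exits 0 yet prints a catastrophic message is still treated as a failure:
    print(run_with_healing([sys.executable, "-c", "print('checksum mismatch')"]))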

What do you think?

Application workflow orchestration, in one way or another, is almost universally used but rarely discussed. It would be great to add lots of voices to this discussion.

Do you have some practices or requirements you would add or remove? Is there another point of view you would like to put forward?


]]>