Happy Data’s Here Again – Simplify Your Workflow Orchestration

Let’s talk about data. By now, we understand that data is coming at us in different shapes and formats, at a pace we have never experienced before. More importantly, we now recognize just how valuable data is. Knowledge is power. Data brings you and your business the power to flourish and succeed.

Yet, success is not only a function of the amount of data you gather, or even its quality. Winemakers know that having great vineyards means nothing if you don’t know how to produce good wine. So, you may be sitting on a thousand “barrels” of data collected from the best “vineyards” and still not get the desired outcome – insights for the business.

Another important aspect is, of course, having the right tools in place. Running big data projects means taking advantage of a whole network of technologies to help you gather the data, store it, process it and, finally, run the analytics. These are the four major steps of every big data project, and each step introduces growing levels of complexity.
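To make those four steps concrete, here is a minimal sketch in Python. The file name, the “amount” field and the function bodies are all invented for illustration; a real project would use dedicated ingestion, storage, processing and analytics technologies at each stage rather than the standard library.

```python
# A minimal sketch of the four stages as plain Python functions.
# "orders.csv" and the "amount" field are invented for illustration.
import csv
import statistics
from pathlib import Path

def ingest(source: Path) -> list[dict]:
    """Gather raw records, e.g. from a CSV export."""
    with source.open() as f:
        return list(csv.DictReader(f))

def store(records: list[dict], target: Path) -> None:
    """Persist the raw records so later stages can re-read them."""
    with target.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

def process(records: list[dict]) -> list[float]:
    """Clean the data: keep only rows with a parseable numeric amount."""
    values = []
    for rec in records:
        try:
            values.append(float(rec["amount"]))
        except (KeyError, ValueError):
            continue  # skip malformed rows
    return values

def analyze(values: list[float]) -> dict:
    """Run the analytics and produce business-facing numbers."""
    return {"count": len(values), "mean": statistics.mean(values)}

# Each stage feeds the next; if any one breaks, the pipeline breaks.
raw = ingest(Path("orders.csv"))
store(raw, Path("orders_raw.csv"))
print(analyze(process(raw)))
```

Even in this toy version, the stages only produce anything useful when they run in the right order with the right handoffs – which is exactly where orchestration comes in.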

This complexity only increases with the introduction of the cloud. Naturally, with the large amounts of data generated in the digital era, you want to look at cheaper options rather than keep buying more and more hardware.

To make it even more appealing, all major cloud vendors offer a comprehensive, rich set of data services for ingesting, storing, processing and analyzing data. Most new data-driven applications are developed from day one in the cloud, and solutions such as Amazon EMR, Azure HDInsight and other cloud-based data services are becoming very popular.

There is a growing number of tools and processing elements out there, and you need them all to be well connected and running in harmony to ensure your data pipeline doesn’t break.

How do you do that? Simplify your application workflow orchestration

You can try to solve the complexity with scripting. Many organizations spend a lot of time and resources writing and maintaining scripts to integrate it all. However, do you want your expensive data engineers spending time on operational plumbing? How scalable is this solution? Can you really guarantee that scripting will hold your data pipeline together?
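To see why that question matters, consider a hedged sketch of the kind of glue code that tends to accumulate. Every command in it is a placeholder, but the shape is familiar: most of the script is sequencing and retry logic, not data logic.

```python
# Hypothetical hand-rolled glue: run each pipeline step as a shell
# command and retry on failure. Every command here is a placeholder.
# None of this code moves or analyzes data -- it is pure plumbing.
import subprocess
import time

STEPS = [
    "python ingest.py",
    "python load_to_warehouse.py",
    "spark-submit process.py",
    "python run_analytics.py",
]

def run_with_retries(cmd: str, attempts: int = 3, wait_seconds: int = 60) -> None:
    for attempt in range(1, attempts + 1):
        if subprocess.run(cmd, shell=True).returncode == 0:
            return
        print(f"step '{cmd}' failed (attempt {attempt}), retrying...")
        time.sleep(wait_seconds)
    raise RuntimeError(f"step '{cmd}' failed after {attempts} attempts")

for step in STEPS:
    run_with_retries(step)  # one failure here stalls everything after it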

Speaking to many customers, I have learned that scripting is just not good enough. It is expensive and even worse – risky. What organizations really need is a reliable product that can orchestrate their entire data pipeline – regardless of which technologies they use. No one wants automation silos and you want to be able to get end-to-end visibility of the data pipeline across disparate sources of data.

Also, as we know, nothing is more constant than change, and this is very much true of data-driven projects. The various elements of a data pipeline often change, and you need to be ready for that change with an orchestration solution that can handle whatever you throw at it.

BMC will be at the Strata Data conference in New York City on September 23rd through the 26th. Come and talk to us at booth 1439 to learn how to simplify your application workflow orchestration.

The Cloud: Here, there and everywhere

In this interview, I talk with Alon Lebenthal, a Senior Solutions Marketing Manager at BMC, about what the cloud hype is all about; it seems that it is all we hear about in the tech and business world.

Q. It seems that the cloud is everywhere these days and all organizations are either on the cloud or in the process of moving to the cloud – Is that true? What’s driving this?

Alon Lebenthal: Yes, the cloud is being adopted faster than ever, and that includes both public and private clouds, and sometimes a combination of both. In fact, nowadays an increasing number of organizations are using more than one cloud platform. The reasons for cloud adoption are clear: the cloud allows companies to scale with business demand in a very flexible way, and as a result the cloud has become the platform of choice for many newly developed applications.

So, as organizations move to the cloud, they are also leveraging the extensive set of services provided by the cloud vendors, as well as the overall growing cloud ecosystem, including infrastructure and solutions for big data, machine learning and more. Another thing to keep in mind is that the richness of the cloud ecosystem introduces new levels of complexity, as companies run their applications on diverse infrastructure and take advantage of new technologies.

Q. You mentioned the word “complexity” – why is it so complex?

Alon: First, the cloud is introducing new technologies daily. But more than that, these technologies usually don’t replace everything that was running in the organization’s environment before. The result is a need to run application workflows across a web of diverse infrastructures and a growing number of technologies and applications. Orchestrating these diverse workflows is becoming more and more complex, and while there are multiple automation tools, experience shows that integrating all these moving pieces is quite challenging. In fact, developers find themselves spending too much precious time on scripting and error handling – what can be described as operational plumbing – instead of being productive and spending their time and effort on developing the application itself. Naturally, this complexity and the associated scripting may delay moving applications to production or even put the entire project at risk.

Q. How can you manage this complexity and ensure that business applications are well orchestrated and integrated?

Alon: Our experience is that if you want to accelerate application delivery in the cloud, you need a true application workflow orchestration product that can simplify the automation of business processes across diverse infrastructure, data and applications. It needs to be platform agnostic, able to orchestrate workflows wherever they run, and able to absorb new technologies. That way, whatever technology the organization decides to use in the future as part of an application workflow, you will be able to orchestrate it as well and integrate it with the other parts of the business. It’s all about end-to-end orchestration of the application workflows, as well as end-to-end visibility into them.
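To give a rough feel for what “platform agnostic” means in practice, here is a sketch that describes a workflow as declarative data, so the same definition can span on-premises systems and multiple clouds. The schema and step names are invented for this example; they are not Control-M’s actual job format.

```python
# An invented, generic workflow definition: steps, where they run, and
# what they depend on. An orchestration engine would read a spec like
# this and execute it end to end, regardless of platform.
workflow = {
    "name": "daily-sales-pipeline",
    "steps": [
        {"id": "ingest",  "runs_on": "on-prem-agent", "depends_on": []},
        {"id": "store",   "runs_on": "aws",           "depends_on": ["ingest"]},
        {"id": "process", "runs_on": "aws-emr",       "depends_on": ["store"]},
        {"id": "report",  "runs_on": "azure",         "depends_on": ["process"]},
    ],
}

def execution_order(spec: dict) -> list[str]:
    """Topologically sort the steps so dependencies always run first."""
    done: list[str] = []
    pending = list(spec["steps"])
    while pending:
        for step in pending:
            if all(dep in done for dep in step["depends_on"]):
                done.append(step["id"])
                pending.remove(step)
                break
        else:
            raise ValueError("dependency cycle in workflow")
    return done

print(execution_order(workflow))  # ['ingest', 'store', 'process', 'report']
```

Because the workflow is data rather than code, pointing a step at a different platform is a one-line change instead of a rewrite.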

Q. Can you give me a typical use case for application workflow orchestration?

Alon: Let’s talk about big data, or data pipelines. Many of the organizations implementing big data these days are doing it in the cloud, and the typical big data process usually consists of a few steps. You need to ingest the data, store that data in the cloud, process the data and, finally, analyze the data and provide insights to the business. Each step may involve different technologies and a different set of tools.

In fact, cloud vendors like MS Azure, AWS and others offer a long list of good, efficient solutions on their platforms to support these big data processes: storing the data, processing data, analyzing data and so on. To take advantage of all those offerings, customers move data from on premises to the cloud and then leverage the storage, processing and analytics capabilities of these services. It’s complex and difficult to manage, so what you really need is an application workflow orchestration tool that can simplify this process, run the workflows end-to-end and ensure that service levels are met.
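As one concrete, hedged example of that pattern on AWS: the sketch below uses boto3 to upload a local file to S3 and then submit a Spark step to an existing EMR cluster. The bucket name, cluster ID and script path are placeholders, and a production workflow would wrap each call with error handling, monitoring and SLA tracking.

```python
# Sketch: move data from on premises to the cloud, then process it
# there. Bucket, cluster id and paths are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file("daily_orders.csv", "my-data-bucket", "raw/daily_orders.csv")

emr = boto3.client("emr")
emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # an existing EMR cluster (placeholder)
    Steps=[{
        "Name": "process-daily-orders",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-data-bucket/scripts/process.py"],
        },
    }],
)
```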

Q. So is it mostly about running your workflows anywhere? On-premises, on public or private cloud or any combination of those?

Alon: It’s about being able to orchestrate workflows wherever they run, and being flexible about it, so that if there are changes in the future, you can easily adjust to them. But more than that, it’s also about simplifying the orchestration process, making it reliable and ensuring that all the workflows run on time, every time. That includes not only managing the workflows, but also ensuring that SLAs are met, taking care of auditing and governance of the application workflows and more.

Q. Are there data security concerns with using public cloud or a combination of public and private cloud services?

Alon: Well, obviously these days cloud vendors provide a secure way to manage data on their platforms. But if we look at it from an application workflow orchestration standpoint, then naturally, when you move data from one place to another, you need to ensure that the data you are transferring is well monitored and well managed. As I mentioned, auditing and governance are extremely important, especially when it comes to transferring data between platforms and to the cloud. You need to be able to ensure that the data you have is the right data, and also to have a good understanding of who touched the data during the process and how it was managed.
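To make the auditing point concrete, here is a minimal sketch of the lowest-level version of “ensuring the data you have is the right data”: record a checksum before the transfer and verify it after, keeping an audit trail of who moved what and when. The local copy stands in for the real transfer step; managed file transfer tools do this, plus governance reporting, at scale.

```python
# Sketch: checksum a file before and after a transfer and keep an
# audit trail. shutil.copy stands in for the real transfer step.
import hashlib
import json
import shutil
import time

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audited_transfer(local_path: str, remote_path: str) -> None:
    before = sha256_of(local_path)
    shutil.copy(local_path, remote_path)  # placeholder for the real move
    after = sha256_of(remote_path)
    record = {
        "file": local_path,
        "destination": remote_path,
        "sha256": before,
        "verified": before == after,
        "initiated_by": "batch-service-account",  # who touched the data
        "timestamp": time.time(),
    }
    with open("transfer_audit.log", "a") as log:
        log.write(json.dumps(record) + "\n")
    if before != after:
        raise RuntimeError(f"checksum mismatch transferring {local_path}")

# Example call (paths are placeholders):
# audited_transfer("daily_orders.csv", "/mnt/cloud-share/daily_orders.csv")
```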

To learn more, check out Control-M on the Cloud

We will also be at the Cloud Expo in Santa Clara from June 24th to the 26th. We hope to see you there at booth #400! Also, don’t miss my June 25th presentation, Application Workflow Orchestration in a Multi Cloud World.

Who Cares How Big the Data Is? It Doesn’t Really Matter

Only a couple of years ago, while everyone was talking about big data, the impression was that very few people knew what it meant. Different people had different interpretations of the term. Yes, almost everyone in the industry could name the Volume – Variety – Velocity trio, but the market was still immature, and the majority of big data projects were in their infancy.

Only a few years have passed and while big data is now considered mainstream, and the big data market is flourishing, we also hear more and more that “Big data is dead”. A bit confusing, I admit.

Let’s try to clear up this confusion by understanding what has changed in the big data market and the key trends driving those changes, including cloud, Machine Learning and Artificial Intelligence (AI).

The cloud. Who doesn’t have something “moving to the cloud”? With that in mind, many of the new big data applications are being developed to run in the cloud. Given some of the clear advantages that cloud can offer for managing and “crunching” a lot of data, it has become the platform of choice for many big data projects. The major cloud vendors (AWS, Azure and Google Cloud) are offering a comprehensive, rich list of big data related services; they’ll move your data, store your data, process your data and, of course, analyze your data.

Machine Learning and AI. Expecting systems to self-improve means learning from experience, and that requires being able to use large amounts of data. Machine Learning algorithms rely on data – on its quantity, but no less on its quality.

Nobody is questioning that there is an enormous amount of data available and more being collected every day, hour, minute, even by the microsecond. And there are reports out there that estimate how much data is produced today and how much will be produced by some date years down the road. How big is big then? Does it really matter?

Think about it this way. Organizations are sitting on goldmines of data and they care only about one thing – how to make the most out of this data and how to provide insights to the business, not only to remain competitive and increase profits, but also to thrive, not just survive, in the market.

Organizations implementing big data need to adopt new technologies and new platforms such as Hadoop, Spark, Mesos and, of course, the many solutions provided by cloud vendors. They will be ingesting high volumes of data from multiple sources and processing this data before making it available for analytics.

And finally, organizations use many, many tools and spend lots of time and much of their valuable talent developing scripts to integrate it all. Yet integrations are not easy, and manual scripting doesn’t easily deliver scalable results. This is where many organizations struggle:

  • How do I successfully orchestrate my big data workflows?
  • How do I ensure my SLAs are met?
  • How do I have my data engineers focused on actionable data rather than spending precious time on operational plumbing?

Here are a few tips for orchestrating your big data workflows:

  • Start early – Orchestration of the data pipelines is critical to the success of the project, and delaying it until the application is ready to move to production may result in unnecessary errors and delivery delays.
  • Think big – You will be using multiple technologies for the various steps of your big data implementation, and those technologies will change often. Consider an application workflow orchestration solution that can cope with this diversity.
  • Avoid automation silos – You want end-to-end visibility of the data pipeline across disparate sources of data.
  • Developers are expensive – Get them to focus on the data itself rather than building scripts for operational plumbing.

Want to learn more? Check out Control-M for Big Data

Data Architecture vs Information Architecture: What’s The Difference?

There’s a well-known argument around data architecture versus information architecture. And the question often asked is: Are they the same thing?

Enterprise architect and Microsoft blog contributor Nick Malik recognized the inherent confusion when he was part of a group working to clean up the Wikipedia entries on the subjects. His team believed the entries should be combined. However, when he polled the IT community in 2014, he discovered a split audience: about half of all survey participants believed the two should remain separate.

Let’s take a look at the differences between data and information and the key considerations your enterprise organization needs to understand.

Data Architecture vs Information Architecture

This author agrees that information architecture and data architecture represent two distinctly different entities. There are a couple of reasons for this as described below:

Distinction in Data vs Information

Simply put, data refers to raw, unorganized facts. Think of data as bundles of bulk entries gathered and stored without context. Once context has been attributed to the data by stringing two or more pieces together in a meaningful way, it becomes information.
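A tiny example makes the distinction tangible. The raw entries below are data; only once they are strung together into revenue per customer do they become information. The records themselves are invented for illustration.

```python
# Data: raw, unorganized facts with no context.
data = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": 75.5},
    {"customer": "A", "amount": 40.0},
]

# Information: the same facts strung together in a meaningful way.
revenue_by_customer: dict[str, float] = {}
for row in data:
    revenue_by_customer[row["customer"]] = (
        revenue_by_customer.get(row["customer"], 0.0) + row["amount"]
    )

print(revenue_by_customer)  # {'A': 160.0, 'B': 75.5}
```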

Similarly, it’s also important to understand the difference as it relates to infrastructure:

  • Information architecture refers to the development of programs designed to input, store, and analyze meaningful information.
  • Data architecture is the development of programs that interpret and store data.

Distinction in Architecture

Since we’ve established that data and information are not the same, it stands to reason that they can’t be treated the same way in their architecture platforms.

Data architecture is foundational. It looks at incoming data and determines how it’s captured, stored, and integrated into other platforms. One such platform is likely a piece of information architecture, like a CRM, that uses raw customer data to draw meaningful connections about sales and sales processes.

The CRM is the information architecture in this example because it specializes in taking raw data and transforming it into something useful.

That’s the clear distinction between data architecture and information architecture. Data architecture defines the collection, storage and movement of data across an organization while information architecture interprets the individual data points into meaningful, useable information.

An “information asset” is the name given to data that has been converted into information. And creating information assets is the driving purpose of information architecture. Information assets can exist in one of several categories:

  • Catalogues
  • Dashboards
  • Documents
  • Ontologies
  • Schedules
  • Taxonomies
  • Templates
  • Terminologies

Each category suggests the conversion of data into something that is helpful for business initiatives, whether it be a grouping of like data or a visual representation that can offer a meaningful snapshot of data to stakeholders.

Data and Information Lifecycle Management

Another distinction relates to requirements from a lifecycle management perspective. Besides the obvious difference between data and information, each has a unique lifecycle and best practices for managing it within an organization. Similar to how data infrastructure is at the foundation of solid information infrastructure, proper data lifecycle management will be a key driver of the information lifecycle management process.

Now, let’s dive into some more definitions.

Data lifecycle management refers to the automated processes that push data from one stage to the next throughout its useful life, until it ultimately becomes obsolete and is deleted from a database. On the other hand, information lifecycle management looks at questions like whether a piece of data is useful, and if so, how. In a nutshell, information lifecycle management seeks to take raw data and implement it in a relevant way to form information assets.
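As a hedged sketch of what automated data lifecycle management can look like, the snippet below ages records through named storage stages until they fall past every retention window and are deleted. The stage names and retention periods are invented for illustration.

```python
# Invented lifecycle: records move from hot to warm to cold storage as
# they age, and are deleted once they are past every retention window.
from datetime import datetime, timedelta, timezone

STAGES = [  # (stage name, retention window for that stage)
    ("hot", timedelta(days=30)),
    ("warm", timedelta(days=180)),
    ("cold", timedelta(days=365 * 3)),
]

def lifecycle_stage(created_at: datetime) -> str | None:
    """Return the stage a record belongs in, or None if it should be deleted."""
    age = datetime.now(timezone.utc) - created_at
    cutoff = timedelta(0)
    for stage, window in STAGES:
        cutoff += window
        if age <= cutoff:
            return stage
    return None  # past every retention window: delete

record_created = datetime.now(timezone.utc) - timedelta(days=200)
print(lifecycle_stage(record_created))  # 'warm'
```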

In addition, information assets have their own lifecycle and value, which are determined by the quality and usefulness of data involved as well as the type of asset as described above. Part of the information lifecycle process requires developers to consider future state implementations.

For instance, they might recommend that a piece of data be implemented as a dashboard or document attachment. This may be required to improve the overall consumption of knowledge throughout an organization, democratize information or create more meaningful insights.

Data-Driven Business Models

More and more, IT departments are becoming an integral part of the enterprise business model. Gone are the days when IT departments were ancillary to process. Now, the vast majority of departments and processes are powered by IT innovation.

A study by the University of Cambridge suggests that businesses are increasingly creating new models to accommodate a commitment to data and information. And results show that this approach is paying off, offering increases in productivity over competitors.

The report suggests that when coming up with a new business model, enterprise business leaders ask themselves these questions:

  • What is our target outcome for a data-driven business model?
  • What would we like to offer our target market?
  • What software, hardware and services do we require to deliver on this model?
  • Where are we going to acquire these resources?
  • How will collected data be used?
  • How can this be monetized to support a revenue model?
  • What challenges will we face in accomplishing these goals?

But even after a data-driven model has been created, some companies fail because they don’t understand the importance of a workflow that pushes data through the lifecycle and through the process of becoming an information asset.

Establishing best practices and a workflow in your data and information life cycles provides the following benefits:

  • Improves overall speed to market
  • Greatly reduces the complexity between all cloud environments
  • Readily scalable
  • Helps mitigate risk
  • Improves integration

In order to achieve this, companies should look at how they can integrate, automate, and orchestrate these workflows. Application workflow orchestration solutions such as Control-M help organizations abstract the complexity involved with numerous data sources, multiple applications and diverse infrastructure. They help organizations focus on creating new information assets and delivering insights to the business, rather than spending precious time and effort on fixing broken workflows.

Still, with all things considered, enterprise businesses must have the right IT employees in place to create a functional business model. Below is an employee snapshot created for both information architecture and data architecture.

Employee Snapshots: Information vs Data

At the heart of a well-functioning enterprise business is an IT department with the right people in place to manage their information and data architectures. In the following text, we will look at positions that may be necessary for data architecture, information architecture or both.

Chief Information Officer (CIO)

The CIO of an enterprise organization makes important decisions about technology and innovation, and is central to any digital transformation or shift toward IT in the enterprise business model.

Some responsibilities in this role include innovating, integrating cloud environments, motivating the IT department and establishing an IT budget based on projected needs. The CIO will make decisions regarding both data and information architecture. As it regards data architecture, one of the big considerations will be deciding between a data lake and a data warehouse. More on these points later.

(Compare CIOs to CTOs.)

Information Architect

The information architect is integral to information architecture and automated lifecycle management processes. He or she will implement information structure, features, functionality, UI and more. The primary role of the information architect is to focus on structural design and implementation of an infrastructure for processing information assets.

Data Architect

Like an information architect, data architects work on the structural design of an infrastructure but in this case it’s specific to collecting data, pulling it through a lifecycle and pushing it into other meaningful systems.

Data Analyst

The data analyst’s typical day involves the gathering, retrieval and organization of data from various sources to create valuable information assets. This is someone who likely works in systems built on both data architecture and information architecture.

More and more, some functions of the data analyst are being automated, but even with automation, analysts remain important to the creation of future information states.

Information Analyst

Information analysts specialize in the extraction and analysis of information assets.

A quick note: Data lakes vs data warehouses

Data lakes have been rising in popularity these days but are still often confused with data warehouses. However, it’s important to realize that the two have unique differences and are used in different ways. A data warehouse refers to a large store of data accumulated from a wide range of sources within an organization.

  • A warehouse is used to guide management decisions.
  • A data lake is a storage repository or a storage bank that holds a huge amount of raw (unstructured) data in its original form until it’s needed.

(Read more about the differences in data lakes & warehouses.)
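A quick, invented illustration of the difference: the same customer event might land in a lake exactly as it arrived, while the warehouse keeps only the cleaned, typed version built from it. Both records below are made up for this example.

```python
# Data lake: the raw event, kept in its original form, quirks and all.
lake_object = {
    "key": "events/2018/10/05/clickstream-00042.json",
    "body": '{"ts":"2018-10-05T09:14:02Z","usr":"A-113","act":"checkout","amt":"120.00"}',
}

# Data warehouse: the curated, typed row built from that event,
# ready to guide management decisions.
warehouse_row = {
    "event_date": "2018-10-05",
    "customer_id": "A-113",
    "event_type": "checkout",
    "amount_usd": 120.00,
}
```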

Final Thoughts: Data Architecture vs Information Architecture

Hopefully by now, it’s clear why information and data architecture are two different things. If not, here’s a quick recap.

Data and information architecture have distinctly different qualities:

  • They work with different assets: data assets vs information assets
  • They yield different results
  • They have distinctly unique life cycles
  • They require different things from an architecture perspective
  • They require roles with different specialties to be part of an enterprise organization

Although data and information architecture are unique, an important takeaway is that they rely on each other in order for enterprise organizations to gain the insights they need to make the most informed business decisions.


Five Reasons You Need a Step-by-Step Approach to Workflow Orchestration for Big Data

Is your organization struggling to keep up with the demands of Big Data and under pressure to prove quick results? If so, you’re not alone. According to analysts, up to 60% of Big Data projects are failing because they can’t scale at the enterprise level. Fortunately, taking a step-by-step approach to application workflow orchestration can help you succeed. It begins with assessing the various technologies for supporting multiple Big Data projects that relate to these four steps:

  • Ingesting data
  • Storing the data
  • Processing it
  • Making data available for analytics

The approach also requires a reliable application workflow orchestration tool that simplifies the complexity of Big Data workflows, avoids automation silos, connects processes, and manages workflows from a single point. This gives you end-to-end automation, integration and orchestration of your Big Data processes, ensuring that everything runs successfully, meets all SLAs, and delivers insights to business users on time.

Cobbling together disparate automation and orchestration tools that don’t scale may cause delays and put the entire project at risk. Here are some of the benefits of beginning your Big Data project with application workflow orchestration in mind and using a tool that supports these steps:

1. Improve quality, speed, and time to market

Many Big Data projects drag on or fail. If developers don’t have the tools to properly scale their efforts, they may either write numerous, hard-to-manage scripts or rely on limited-functionality scheduling tools. Their tools may not integrate well with other processes, such as file transfers. With a workload orchestration solution, you can implement Big Data projects quickly, helping you retain your customer base and maintain a competitive edge.

2. Reduce complexity in all environments – on premises, hybrid, and multi-cloud

A Big Data workflow usually consists of various steps with multiple technologies and many moving parts. You need to simplify workflows to deliver Big Data projects successfully and on time, especially in the cloud, which is the platform of choice for most Big Data projects. The cloud, however, adds to the complexity, so your orchestration solution needs to be platform agnostic, supporting both on-premises and multi-cloud environments.

An orchestration tool that can automate, schedule, and manage processes across the different components of a Big Data project reduces this complexity. It can manage the main steps of ingesting the data, storing the data, processing the data, and finally the analytics. It should also provide a holistic view of the different components and of the technologies used to orchestrate those workflows.

3. Ensure scalability and reduce risk

As I mentioned earlier, Big Data projects must be able to scale, especially when you start moving from the pilot phase to production. Processes for developing and deploying Big Data jobs need to be automated and repeatable. Once the pilot runs successfully, other parts of the business will look into taking advantage of Big Data projects as well. Your workload orchestration solution should make it easy to scale and support growing business demands.

4. Achieve better integration

Open source Big Data automation solutions generally have limited capabilities and lack essential management features. More than that, they tend to be limited to a specific environment (i.e., Hadoop), but keep in mind that Big Data is not an island. It often needs to integrate with other parts of the business. So, your Big Data projects should be connected with upstream and downstream applications, platforms and data sources (e.g., ERP systems, EDWs), and your Big Data orchestration solution should provide this capability.

5. Improve reliability

It’s important to run Big Data workflows successfully to minimize service interruptions. Using a patchwork of tools and processes makes it hard to identify issues and understand root causes, putting SLAs at risk. If you can manage your entire Big Data workflow from A to Z, then when something goes wrong in the process, you’ll see it immediately and know where it happened and what happened. Using one solution to orchestrate your entire process and manage it from a single pane of glass simplifies managing your services and ensuring they run successfully.

Looking ahead

Taking a step-by-step approach to application workflow orchestration simplifies the complexity of your Big Data workflows. It avoids automation silos and helps ensure you meet SLAs and deliver insights to business users on time. Discover how Control-M provides the capabilities your organization needs to follow this approach and how it easily integrates with your existing technologies to support Big Data projects.
