What Happens When Big Data Becomes Bad Data?

Most competitive enterprises now use big data for business intelligence activities. For example, Facebook leverages its big data sets in real time to determine what ads to place in your sidebar while you’re checking your social feeds or updating your status. In another example, you probably receive emails from your service provider, such as AT&T or Verizon, about your data plan usage to prompt you to upgrade your service.

But what happens when these big data use cases go bad, even for seconds? If the timely placement of ads gets stalled or a promotional email is sent at the wrong time, you’ve missed a revenue opportunity or frustrated a customer. When you consider these missteps on a large scale, for thousands or even millions of transactions, the loss in revenue or added expense from customer complaint logs can be devastating.

Just recently, the bike share company in my community sent me an email message asking me to update my subscription after I had already done so. I called them with my concern and they said it was a system-wide error. That made me wonder how many other subscribers had called and tied up the support lines. This company, like many others that rely on big data, had to spend extra time and resources to correct the situation with a follow-up apology email.

Enterprises like this bike share business take advantage of big data to make informed decisions from both structured and unstructured data. The bike share is a new business with a fully integrated digital footprint: the locations of all the bikes, who’s using them, at what times, and in which neighborhoods. Structured data includes content from relational databases and XML schemas, where “getting” the data you need is fairly straightforward. Unstructured data can be content from web logs, comments, blog posts, email messages, or any text document, as well as audio, video, or image files.

The strategies to tap into unstructured data usually require more sophisticated algorithms to parse through the content and find specific data. For example, a big data strategy could identify which users are booking trips to specific destinations and then what their internet surfing habits are once they confirm a trip. A user books a trip to Seattle, checks the weather forecasts, and then shops for rain coats and umbrellas.
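The trip example can be sketched in a few lines of Python. This is only an illustration: the log format, the field names, and the function `activity_after_booking` are assumptions made up for the sketch, and real web logs are far messier and far larger.

```python
import re
from collections import defaultdict

# Hypothetical log lines in chronological order; the format is an assumption.
LOG_LINES = [
    "2024-03-01T09:00 user=ana action=booked_trip destination=Seattle",
    "2024-03-01T09:05 user=ana action=viewed page=weather-forecast",
    "2024-03-01T09:12 user=ana action=searched query=rain coats",
    "2024-03-01T09:20 user=ben action=searched query=umbrellas",
    "2024-03-01T09:30 user=ben action=booked_trip destination=Phoenix",
    "2024-03-01T09:45 user=ben action=searched query=sunscreen",
]

# Pull the user, the action, and an optional trailing key=value detail from a line.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) user=(?P<user>\S+) action=(?P<action>\S+)"
    r"(?: (?P<key>\w+)=(?P<value>.+))?$"
)


def activity_after_booking(lines, destination):
    """Collect what each user did after booking a trip to `destination`."""
    booked = set()
    followups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        user, action, value = match["user"], match["action"], match["value"]
        if action == "booked_trip":
            if value == destination:
                booked.add(user)
        elif user in booked:
            # Only activity that comes after the booking is recorded.
            followups[user].append((action, value))
    return dict(followups)


if __name__ == "__main__":
    print(activity_after_booking(LOG_LINES, "Seattle"))
```

In this sample data, the umbrella search by `ben` is ignored for Seattle because he never booked a trip there, which is the kind of filtering the paragraph above describes.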

One of the most common ways for enterprises to handle big data sets is to deploy Hadoop, an open-source framework that provides distributed storage and processing of data sets on computer clusters built from relatively simple and inexpensive hardware. The most commonly used Hadoop distributions include the following:

  • Apache Hadoop
  • IBM InfoSphere BigInsights
  • Cloudera
  • Hortonworks
  • MapR
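To make “distributed processing” a little more concrete, here is a minimal, single-machine sketch of the map, shuffle, and reduce steps that Hadoop’s MapReduce model is built around. It does not use Hadoop at all; the ride records and the function names are invented for illustration, and on a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict

# Hypothetical bike share records: (bike_id, neighborhood where the ride started).
RIDE_RECORDS = [
    ("b1", "Downtown"),
    ("b2", "Harbor"),
    ("b3", "Downtown"),
    ("b4", "Midtown"),
    ("b5", "Downtown"),
]


def map_phase(chunk):
    """Emit a (neighborhood, 1) pair for every ride in one chunk of records."""
    return [(neighborhood, 1) for _bike_id, neighborhood in chunk]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values to get one count per neighborhood."""
    return {key: sum(values) for key, values in grouped.items()}


def rides_per_neighborhood(records, chunk_size=2):
    # Split the input into chunks; here they are simply processed one after another.
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(rides_per_neighborhood(RIDE_RECORDS))
```

The point of the split is that the map step needs no knowledge of other chunks, so the work can be spread across many inexpensive machines.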

Effectively monitoring Hadoop big data processes helps you proactively diagnose performance and availability issues so that your business-critical big data analytics don’t come to a halt at the wrong time. Behavioral learning capabilities can detect when Hadoop clusters are struggling to keep pace with the business, and monitoring tools can provide real-time visibility into Hadoop environments to help pinpoint issues and optimize the infrastructure.
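One simple way to picture behavioral learning is a baseline check: learn what “normal” looks like from recent job runs, then flag a run that strays far from it. The sketch below is an assumption-laden illustration, not the method used by any particular monitoring product; the durations, the window size, the threshold, and the name `find_slow_runs` are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical durations, in seconds, of the same recurring Hadoop job.
JOB_DURATIONS_SEC = [60, 62, 59, 61, 60, 63, 61, 140, 62, 60]


def find_slow_runs(durations, window=5, threshold=3.0):
    """Return the indexes of runs that are far slower than the recent baseline.

    The baseline is the mean and sample standard deviation of the previous
    `window` runs. A run is flagged when it is more than `threshold`
    standard deviations above that mean.
    """
    flagged = []
    for i in range(window, len(durations)):
        baseline = durations[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # no variation in the baseline, so no score can be computed
        if (durations[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    for index in find_slow_runs(JOB_DURATIONS_SEC):
        print(f"run {index} took {JOB_DURATIONS_SEC[index]}s, well above baseline")
```

A fixed alert threshold (say, “anything over 120 seconds”) would also catch this run, but a learned baseline adapts when the normal duration of a job drifts over time.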

Think about it. When IT staff can prioritize and troubleshoot performance issues impacting Hadoop processes and quickly fix them, they’ll have more time to get to the business of coming up with strategic big data analytics that help grow the business and increase revenue. It’s that simple—really.

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

Patrick Campbell

Patrick T. Campbell has spent his 20+ year career equally between Application and Network Performance Management and K-12 Education. As a Technical Marketing Engineer, he began his career in IT at InfoVista as a Technical Trainer, followed by Raytheon Solipsys, OPNET Technologies (Riverbed Technology), and now BMC Software. In K-12 Education, he taught mathematics at Drew College Preparatory School for seven years and then worked at the University of Maryland Baltimore County (UMBC) as a Mathematics and Science Professional Development Program Co-Director for International Teacher-Scholars from Egypt for another two. Passionate about learning, he has presented at OPNETWORK and at NAIS Teacher Conferences. Patrick received a B.S. in Industrial and Management Systems Engineering from Penn State, and has a Master’s Degree in Human Resource and Behavioral Science from Johns Hopkins University.