Machine Learning & Big Data Blog

Data Quality: Top Concepts & Best Practices for Enterprise IT

4 minute read
Muhammad Raza

Data drives the business decisions that determine how well organizations perform in the real world. Vast volumes of data are generated every day, but not all of it is reliable enough in its raw form to drive mission-critical business decisions.

Today, data has a credibility problem. Business leaders and decision makers need to understand the impact of data quality, especially within their own organizations.

In this article, we will discuss what data quality means, particularly in the world of enterprise IT. Then we'll look at some best practices that help ensure and maximize data quality.

What is data quality?

Data quality refers to the utility of data as a function of attributes that determine its fitness and reliability to satisfy the intended use. These attributes (metrics, KPIs, and any other qualitative or quantitative requirements) may be subjective and justifiable only for a particular set of use cases and contexts.

If that feels unclear, that's because a single formal definition of data quality doesn't exist. (The way you define a quality dinner, for instance, may differ from how a Michelin-starred chef defines one.) Instead, data quality is perceived differently depending on the perspective:

  • Consumer
  • Business
  • Scientific
  • Standards
  • And others

To gauge the quality of a dataset, a good place to start is the degree to which it matches a desired state.

For example, a dataset free of errors, consistent in its format, and complete in its features may meet all the requirements or expectations that determine data quality.
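As a minimal illustration of comparing data to a desired state, here is a short Python sketch that checks a toy dataset against three hypothetical expectations: required columns are present, no values are missing, and dates follow a consistent format. The column names and rules here are assumptions, chosen for illustration only.

```python
import re

# Hypothetical "desired state" for a customer dataset:
# required columns, no missing values, ISO-8601 dates.
REQUIRED_COLUMNS = {"customer_id", "signup_date", "email"}
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

records = [
    {"customer_id": 1, "signup_date": "2023-04-01", "email": "a@example.com"},
    {"customer_id": 2, "signup_date": "04/02/2023", "email": None},  # two violations
]

def check_against_desired_state(rows):
    """Return (row index, description) pairs for every deviation found."""
    issues = []
    for i, row in enumerate(rows):
        missing_cols = REQUIRED_COLUMNS - row.keys()
        if missing_cols:
            issues.append((i, f"missing columns: {missing_cols}"))
        for col, value in row.items():
            if value is None:
                issues.append((i, f"missing value in '{col}'"))
        date = row.get("signup_date")
        if date and not DATE_PATTERN.match(date):
            issues.append((i, f"inconsistent date format: {date!r}"))
    return issues

for row_index, issue in check_against_desired_state(records):
    print(f"row {row_index}: {issue}")
```

A dataset that produces no issues against checks like these would match its desired state; every issue is a measurable gap in quality.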

(Understand how data quality compares to data integrity.)

Defining data quality in enterprise IT

Now let's discuss data quality from a standards perspective, which is widely used in enterprise IT.

First, consider the definition of 'quality' according to the ISO 9000:2015 standard:

Quality is the degree to which inherent characteristics of an object meet requirements.

We can apply this definition to data and the way it is used in the IT industry. In the domain of database management, the term ‘dimensions’ describes the characteristics or measurable features of a dataset.

The quality of data is also subject to extrinsic factors, such as availability and compliance. So, here's a holistic, standards-based definition of data quality for big data applications:

Data quality is the degree to which dimensions of data meet requirements.

It's important to note that the term dimensions does not refer to the categories used in datasets. Instead, it refers to the measurable features that describe particular characteristics of the dataset. By comparing these features to the desired state of the data, you can understand and quantify data quality in measurable terms.

For instance, some of the common dimensions of data quality are listed below (followed by a short sketch showing how a few of them can be measured):

  • Accuracy. The degree of closeness to real data.
  • Availability. The degree to which the data can be accessed by users or systems.
  • Completeness. The degree to which all data attributes, records, files, values, and metadata are present and described.
  • Compliance. The degree to which data complies with applicable laws.
  • Consistency. The degree to which data across multiple datasets or ranges complies with defined rules.
  • Integrity. The degree of absence of corruption, manipulation, loss, leakage or unauthorized access to the dataset.
  • Latency. The delay in production and availability of data.
  • Objectivity. The degree to which data is created and can be evaluated without bias.
  • Plausibility. The degree to which a dataset is relevant to real-world scenarios.
  • Redundancy. The presence of logically identical information in the data.
  • Traceability. The ability to verify the lineage of data.
  • Validity. The degree to which data complies with existing rules.
  • Volatility. The degree to which dataset values change over time.
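To make a few of these dimensions concrete, here is a minimal Python sketch. The dataset, the validity rule, and the function names are hypothetical, chosen only for illustration; each dimension is scored as a simple ratio between 0 and 1.

```python
# Hypothetical dataset: each record should have a non-empty name
# and an age between 0 and 120.
records = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": -5},    # invalid age
    {"name": None, "age": 51},     # incomplete record
    {"name": "Alice", "age": 34},  # redundant duplicate
]

def completeness(rows):
    """Fraction of all fields that are present (non-None)."""
    values = [v for row in rows for v in row.values()]
    return sum(v is not None for v in values) / len(values)

def validity(rows):
    """Fraction of rows whose age satisfies the business rule."""
    return sum(
        row["age"] is not None and 0 <= row["age"] <= 120 for row in rows
    ) / len(rows)

def redundancy(rows):
    """Fraction of rows that exactly duplicate an earlier row."""
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        duplicates += key in seen
        seen.add(key)
    return duplicates / len(rows)

print(f"completeness: {completeness(records):.2f}")  # 0.88
print(f"validity:     {validity(records):.2f}")      # 0.75
print(f"redundancy:   {redundancy(records):.2f}")    # 0.25
```

Scoring dimensions as ratios like this makes it easy to compare a dataset against its desired state, and against targets, in measurable terms.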

DAMA-NL provides a detailed list of 60 Data Quality Dimensions, available in PDF.

Best practices for data quality

Data quality can be improved in many ways.

First and foremost, data quality depends on how you’ve selected, defined, and measured the quality attributes and dimensions.

In a business setting, there are many ways to measure and enforce data quality. IT organizations can take the following steps to ensure that data quality is objectively high and that data is used to train models that produce profitable business impact:

  • Find the most appropriate data quality dimensions from a business, operational, and user perspective. Not all 60 data quality dimensions are necessary for every use case. Even the 13 included above are likely too many for a single use case.
  • Relate each data quality dimension to a broader objective or goal. This goal can be intangible, like user satisfaction or brand loyalty. A single dimension may be correlated with several objectives; IT should determine how to optimize each dimension in order to maximize the larger set of objectives.
  • Establish the right KPIs, metrics, and indicators to accurately measure each data quality dimension. Choose the right metrics, and understand how to benchmark them properly (the sketch after this list shows one way to encode such benchmarks).
  • Improve data quality at the source. Enforce data cleanup practices at the edge of the network where data is generated (if possible).
  • Eliminate the root causes that introduce errors and lapses in data quality. You might take a shortcut when you find a bad data point, correcting it manually, but that means you haven’t prevented what caused the issue in the first place. Root cause analysis is a necessary and worthwhile practice for data.
  • Communicate with the stakeholders and partners involved in supplying data. Data cleanup may require a shift in responsibility at the source that may be external to the organization. By getting the right messages across to data creators, organizations can find ways to source high quality data that favors everyone in the data supply pipeline.
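As one way to encode the benchmarking step above, the sketch below pairs each chosen dimension with a target threshold and reports whether the latest measured scores meet it. The dimension names, targets, and measured values are hypothetical, not prescribed benchmarks.

```python
# Hypothetical KPI table: each selected data quality dimension gets a
# target threshold for its score in [0, 1].
KPIS = {
    "completeness": {"target": 0.99},
    "validity":     {"target": 0.95},
    "timeliness":   {"target": 0.90},
}

# Hypothetical measured scores for the latest batch of data.
measured = {"completeness": 0.997, "validity": 0.91, "timeliness": 0.95}

def benchmark(kpis, scores):
    """Compare each measured dimension score to its target threshold."""
    results = {}
    for dimension, kpi in kpis.items():
        score = scores[dimension]
        results[dimension] = {
            "score": score,
            "target": kpi["target"],
            "met": score >= kpi["target"],
        }
    return results

for dimension, result in benchmark(KPIS, measured).items():
    status = "OK  " if result["met"] else "FAIL"
    print(f"{status} {dimension}: {result['score']:.3f} (target {result['target']:.2f})")
```

A failing dimension here (validity, in this toy run) is exactly the kind of signal that should trigger the root cause analysis and upstream communication described above, rather than a one-off manual fix.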

Finally, identify and understand the patterns, insights, and abstractions hidden within the data, instead of deploying models that churn raw data into predefined features with limited relevance to real-world business objectives.

It's easy to use SaaS options that come with predefined data features, but doing so can hinder a full and deep understanding of your data and your business.

Related reading

e-book: Choosing the Right Metrics for Enterprise IT

Every business and organization can take advantage of vast volumes and variety of data to make well informed strategic decisions — that’s where metrics come in. This e-book introduces metrics in enterprise IT. Organizations of all shapes and sizes can use any number of metrics. In this e-book, we’ll look at four areas where metrics are vital to enterprise IT.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

See an error or have a suggestion? Please let us know by emailing blogs@bmc.com.

BMC Bring the A-Game

From core to cloud to edge, BMC delivers the software and services that enable nearly 10,000 global customers, including 84% of the Forbes Global 100, to thrive in their ongoing evolution to an Autonomous Digital Enterprise.
Learn more about BMC ›

About the author

Muhammad Raza

Muhammad Raza is a Stockholm-based technology consultant working with leading startups and Fortune 500 firms on thought leadership branding projects across DevOps, Cloud, Security and IoT.