Taking the time now to formulate what data quality means to your company or organization can create a ripple effect of improved customer service, a better customer experience, higher conversion rates and longer customer retention – and I am sure you will agree those are the kinds of returns on investment that any business will wholeheartedly embrace!
Do you know what good quality is and how you would measure and improve it?
One of the biggest myths about data quality is that all your data has to be completely error-free. With websites and other campaigns collecting so much data, achieving zero errors is next to impossible and prohibitively expensive. Instead, the data only needs to conform to the standards that have been set for it. There are exceptions, however: fields such as postcode and first line of address need to be 100% accurate if you are using them for mailing or deliveries.
What does Data Quality Management (DQM) mean?
Data quality refers to the condition of a set of values of qualitative or quantitative variables. There are many definitions of data quality but data is generally considered high quality if it is “fit for [its] intended uses in operations, decision making and planning” source: en.wikipedia.org/wiki/Data_quality
Outdated or incorrect data can lead to major blunders in business decisions and negatively impact customer relationships.
A focused approach to data quality management has far-reaching benefits. The key is to be proactive in controlling, monitoring and driving data quality, rather than reacting to data failures or addressing anomalies only after they are detected.
4 Step Plan to Data Quality
Here are my thoughts on the best way to achieve good consistent data quality.
Data Quality Assessment
Do you know where your data is stored, and how you can inspect it to ascertain the data quality issues? Which issues are hampering business goals, and with what standards or rules should the data comply?
Understanding where you are now with your data quality provides a reference point to baseline and plan data quality improvements. This also enables you to measure the outcomes of successive improvements.
Take a top-down approach, understanding how critically poor quality data impacts the business.
This approach can be complemented by bottom-up activity of data profiling, which will identify anomalies such as outliers in the data.
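The bottom-up profiling activity can be sketched in a few lines of code. The snippet below is a minimal illustration, assuming the data arrives as a list of records; the field names are hypothetical, and real profiling tools go much further (type inference, pattern analysis, outlier statistics).

```python
from collections import Counter

def profile(records, field):
    """Summarise a single field: fill rate, distinct count and most common values."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v not in (None, "")]
    return {
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "distinct": len(set(filled)),
        "top_values": Counter(filled).most_common(3),
    }

# Illustrative records - a blank postcode is exactly the kind of anomaly
# profiling surfaces for the business-impact mapping in the next step.
customers = [
    {"postcode": "GL1 2AB"},
    {"postcode": "GL1 2AB"},
    {"postcode": ""},
]
print(profile(customers, "postcode"))
```

Even a simple summary like this gives you concrete numbers (a 67% fill rate on a mandatory field, say) to carry into the business-impact discussion.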
Map these anomalies to the potential impact on business goals. This correlation provides a basis for justification of the data quality activity and its linkage to impact on the business for the business case.
Record the results from this step, listing the findings and their impact in financial terms. The report can then be circulated amongst stakeholders and decision makers to drive data quality improvement actions and their prioritisation.
Data Quality Measurement & Reporting
The next step is to narrow down the scope to identify critical data elements to help with prioritisation of activity. You cannot solve all the data issues at once due to the potential size of the work and budget required.
During your assessment activity you may have started to identify certain attributes and rules about the data – for example, some fields will always be number strings, whereas others will only be an option from a list. Document this information; it forms the start of your Data Dictionary. https://en.wikipedia.org/wiki/Data_dictionary
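A data dictionary entry can also be captured in machine-readable form so the same rules drive validation later. This is a sketch only; the field names and rules below are illustrative, not taken from any particular system.

```python
import re

# Each field maps to a validation rule, mirroring the two rule types
# mentioned above: a fixed-format number string and a pick-from-a-list value.
DATA_DICTIONARY = {
    "account_number": lambda v: bool(re.fullmatch(r"\d{8}", v)),        # number string
    "status":         lambda v: v in {"active", "closed", "pending"},   # allowed list
}

def validate_field(field, value):
    rule = DATA_DICTIONARY.get(field)
    return rule(value) if rule else True  # fields without a rule pass by default

print(validate_field("account_number", "12345678"))  # conforms to the format
print(validate_field("status", "archived"))          # not in the allowed list
```

Keeping the dictionary in one place means the documentation and the validation logic cannot drift apart.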
Next, identify the dimensions which help validate the quality of the data. These, along with thresholds and tolerances for acceptability, will start to form your measurement criteria for creating dashboards to monitor progress towards better quality data.
Data Quality Dimensions
What is a Data Quality Dimension?
A Data Quality Dimension is a term used to describe a data quality measure that can relate to multiple data elements including attribute, record, table, system or more abstract groupings such as business unit, company or product range.
Why are Data Quality Dimensions useful?
Defining data dimensions and rules provides an input to deciding which tools and techniques should be deployed to achieve the desired levels of quality. This helps to embed data controls into the functions that acquire or modify the data within the data lifecycle.
There is no widespread agreement on a definitive list of Data Quality Dimensions, but most practitioners recognise the importance of 6 core dimensions:
- Accuracy
- Completeness
- Uniqueness
- Timeliness (often referred to as Currency)
- Validity (sometimes referred to as Conformity)
- Consistency
How do you use Data Quality Dimensions?
A Data Quality Dimension is typically presented as a percentage or a total count. For example, 97% of equipment codes were valid or 123,722 patient records were incomplete.
A single Data Quality Dimension may require several data quality rules to be created in order for a measure to be processed.
‘Missing values’, for instance, may require a further set of data quality rules to execute a comprehensive measure. Someone may type in ‘N/A’ or ‘Unknown’, which still equates to a missing value, so we would need a processing rule to discover these ‘hidden blanks’ within an attribute.
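A hidden-blanks rule and the resulting percentage measure can be sketched as follows. The list of placeholder tokens is an assumption; extend it to match whatever your own users actually type.

```python
# Values that are technically present but carry no information
# are treated as missing (the 'hidden blanks' described above).
HIDDEN_BLANKS = {"", "n/a", "na", "unknown", "none", "-"}

def is_missing(value):
    return value is None or str(value).strip().lower() in HIDDEN_BLANKS

def completeness_pct(values):
    """The dimension expressed as a percentage, e.g. '40% complete'."""
    present = sum(1 for v in values if not is_missing(v))
    return 100.0 * present / len(values) if values else 0.0

middle_names = ["James", "N/A", "", "Unknown", "Rose"]
print(f"{completeness_pct(middle_names):.0f}% complete")
```

A naive blank check on the same data would report 80% complete; the hidden-blanks rule reveals the true figure.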
Due to the complexity and processing logic required to manage and control the usage of data quality dimensions, most organisations rely on data quality management software. This allows complex data quality rules to be consolidated into data quality dimensions. These can then be reused and applied across the whole organisation.
Overview of the Dimensions
1. Accuracy – the degree to which data correctly reflects the real-world object, person or event being described. Examples:
- Sales of the business unit are the real value.
- Address of an employee in the employee database is the real address.
- Are there incorrect spellings of product or person names? These issues can impact operational and analytical applications.
To achieve accuracy you need agreement and business rules that are clear and consistent, with good definitions.
2. Completeness – defined as expected comprehensiveness. Data can be complete even if optional data is missing. As long as the data meets the expectations, then the data is considered complete.
For example, a customer’s first name and last name are mandatory but middle name is optional; so a record can be considered complete even if a middle name is not available.
Questions you can ask yourself: Is all the requisite information available? Do any data values have missing elements? Or are they in an unusable state?
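The mandatory-versus-optional rule from the name example above can be sketched as a record-level check. Field names are illustrative.

```python
# First and last name are mandatory; middle name is optional,
# so its absence does not make the record incomplete.
MANDATORY = ("first_name", "last_name")

def is_complete(record):
    return all(record.get(f) not in (None, "") for f in MANDATORY)

print(is_complete({"first_name": "Neil", "last_name": "Scott"}))  # complete
print(is_complete({"first_name": "Neil", "middle_name": "A"}))    # missing last name
```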
3. Uniqueness – when measured against other data sets, there is only one entry of its kind, i.e. only one Mr N Scott at Gloucester Place, London – de-duplicate your data.
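A minimal de-duplication sketch, keying records on a normalised name-and-address pair and keeping the first occurrence. Real matching usually needs fuzzier logic (initials, abbreviations, typos), so treat this as a starting point only.

```python
def dedupe(records):
    """Keep the first record for each normalised (name, address) pair."""
    seen, unique = set(), []
    for r in records:
        key = (r["name"].strip().lower(), r["address"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

# Case and whitespace differences collapse to a single Mr N Scott.
customers = [
    {"name": "Mr N Scott", "address": "Gloucester Place, London"},
    {"name": "mr n scott", "address": "Gloucester Place, London "},
]
print(len(dedupe(customers)))  # 1
```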
4. Timeliness – how much of an impact do date and time have on the data? This could be previous sales, product launches or any information that is relied on over a period of time to be accurate. It also covers whether information is available when it is expected and needed.
Timeliness of data can be very important. This is reflected in:
- Companies that are required to publish their quarterly results within a given timeframe
- Customer service providing up-to date information to the customers
- Credit system checking in real-time on the credit card account activity
Timeliness depends on user expectation. Immediate online availability of data could be required for a room allocation system in hospitality, while nightly data could be perfectly acceptable for a billing system.
5. Validity – does the data conform to the respective standards (format, type, range) set for it?
- Validity at data item level: type and severity of hearing loss should be chosen from a given list of allowable values
- Validity at record level: for any patient, the date/time of hearing screening should be after the date/time of birth.
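Both validity levels from the hearing-screening example can be expressed as simple rules. The allowed-values list and field names are illustrative assumptions.

```python
from datetime import datetime

# Item-level rule: severity must come from the allowable list.
ALLOWED_SEVERITY = {"mild", "moderate", "severe", "profound"}

def valid_item(severity):
    return severity in ALLOWED_SEVERITY

# Record-level rule: screening must take place after birth.
def valid_record(patient):
    return patient["screened_at"] > patient["born_at"]

patient = {
    "born_at": datetime(2024, 3, 1, 9, 30),
    "screened_at": datetime(2024, 3, 2, 14, 0),
}
print(valid_item("moderate"), valid_record(patient))
```

Note that record-level rules need more than one field at a time, which is why they are worth tracking separately from item-level checks.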
6. Consistency – how well does the data align with a preconceived pattern? Dates share a common consistency issue: in the U.S. the standard is MM/DD/YYYY, whereas in Europe and other areas DD/MM/YYYY is standard.
Consistency means data across all systems reflects the same information and is in sync across the enterprise. Examples:
- A business unit status is closed but there are sales for that business unit.
- Employee status is terminated but pay status is active.
Consistency requires that a reported metric has the same value for the same parameter regardless of who pulls it and where it is used.
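A cross-system consistency rule matching the examples above can be sketched like this. The system structures and unit names are hypothetical.

```python
def inconsistent_units(statuses, sales):
    """Return unit ids that are closed but still have sales recorded."""
    return [unit for unit, status in statuses.items()
            if status == "closed" and sales.get(unit, 0) > 0]

# Two systems that should agree: the unit register and the sales ledger.
statuses = {"north": "open", "south": "closed"}
sales = {"north": 120_000, "south": 4_500}
print(inconsistent_units(statuses, sales))  # ['south']
```

The same pattern (join two systems on a key, flag contradictory combinations) covers the employee status versus pay status example as well.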
Defining Acceptable Thresholds
Now that you have the dimensions, attributes and rules your data should comply with, the next step is to interview the business users to determine acceptability thresholds. A score below the acceptability threshold indicates that the data does not meet business expectations, and highlights the boundary at which noncompliance may lead to material impact on downstream business functions.
Integrating these thresholds with the methods for measurement completes the construction of the data quality control framework.
Missing the desired threshold will trigger a data quality event, notifying the data steward and possibly even recommending specific actions for mitigating the discovered issue.
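The threshold-to-event mechanism can be sketched as below. The threshold values and the notification are illustrative placeholders; in practice the event would go to a ticketing or alerting system rather than a print statement.

```python
# Acceptability thresholds agreed with the business users (illustrative).
THRESHOLDS = {"completeness": 95.0, "validity": 97.0}

def check(dimension, score, notify=print):
    """Raise a data quality event when a score misses its threshold."""
    if score < THRESHOLDS[dimension]:
        notify(f"DQ event: {dimension} at {score}% "
               f"(threshold {THRESHOLDS[dimension]}%) - notify data steward")
        return False
    return True

print(check("validity", 92.5))   # below threshold, event raised
print(check("completeness", 98.0))  # within tolerance
```

Passing `notify` in as a parameter keeps the rule testable and lets you swap the print statement for email, chat or an incident tool without touching the check itself.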
Data Quality Processes
What processes do you have where data is created or amended? Review each process along with the prioritised data quality activity to identify where changes (big or small) can be made in the process to improve data and ultimately drive improved consistency from the point of data creation.
Data quality scorecards and dashboards can be defined for each business unit's data quality, derived from these metrics and their thresholds. The scores can be captured, presented visually and periodically updated to monitor the improvement.
As part of your framework, your aim is not only to mitigate issues but to remediate them and eliminate their root causes within the reasonable timescales established in a data quality Service Level Agreement. Define consistently how to measure, log, collect, communicate and present the results to those entrusted with data stewardship and data quality.
Check out earlier posts on IDENTIFY, CURE, CARE (ICC), HOW TO RECORD YOUR ISSUES
Good luck with your Data Quality endeavors and come back next month for more on data quality scorecards.