Identifying Data Quality Issues
21st November 2020
Common Causes of Data Quality Issues
Most often, your data came from somewhere else before it reached your database. During transfers, data can get lost or modified in ways that make it unusable. If databases are mapped using old data structures or conversions, the data in the new database will be incorrect. Users often do not see what is actually being stored, so the data structure and the mapping between the old and new databases are frequently the source of errors in the data.
Merging Databases and Other System Consolidations
When you phase out an old system or combine it with another, a poorly planned merge leaves little time to anticipate or prevent errors. Because the data is moving into a non-empty database, there is little flexibility to change the data structure. Data often will not fit the new structure, and duplicates and conflicts result.
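As a minimal sketch of how duplicates and conflicts might be surfaced before a merge, the Python snippet below compares incoming records against an existing table by key. The function name `find_merge_conflicts`, the `id` key, and the sample records are hypothetical and chosen only for illustration.

```python
def find_merge_conflicts(existing, incoming, key="id"):
    """Split incoming records into new rows, exact duplicates, and conflicts."""
    existing_by_key = {row[key]: row for row in existing}
    new_rows, duplicates, conflicts = [], [], []
    for row in incoming:
        current = existing_by_key.get(row[key])
        if current is None:
            new_rows.append(row)               # key not in the target yet
        elif current == row:
            duplicates.append(row)             # identical record already stored
        else:
            conflicts.append((current, row))   # same key, different values
    return new_rows, duplicates, conflicts


# Hypothetical sample data
existing = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example.com"},
]
incoming = [
    {"id": 2, "email": "bob@example.com"},
    {"id": 1, "email": "ann@example.org"},
    {"id": 3, "email": "cy@example.com"},
]

new_rows, duplicates, conflicts = find_merge_conflicts(existing, incoming)
print(len(new_rows), len(duplicates), len(conflicts))
```

Running a check like this before loading gives you a list of conflicts to resolve by hand instead of discovering them after the merge.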
Manual Data Entry
Much data is typed into databases by people, and mistakes are inevitable. People misspell entries, use the wrong format for a field, enter data into the wrong box, or input the wrong value. Because these errors are not systematic, they can be difficult to trace or correct.
Batch Feeds
A batch feed carries large volumes of data, and any problem in it can be magnified by later feeds. The problems may accumulate and become difficult to track down and fix. If an error finds its way into the source system, it can flow through the batch feeds and go unnoticed.
Real Time Interfaces
The problem with data exchanged through real-time interfaces is that there is little time to verify that the data is accurate. There are typically multiple points in the capture process where real-time data can be corrupted or lost. The data arrives in small packets, any of which can be incorrect, leaving you with unreliable data.
Disconnect Between Data Priorities and Business Priorities
Data priorities and business priorities should be in alignment. Avoid collecting and managing data that is not important to your business and that can corrupt other data elements. Prioritize data that is important to the business, and keep the quality of that data high.
Before You Start a Data Cleaning Process, Make a Plan
Audit and Organize the Data
Understanding your data before cleaning improves the efficiency of your project and reduces the time and cost of data cleaning. Understand the purpose, location, flow, and workflows of your data before you start.
Document Data Quality Requirements and Define Rules for Measuring Quality
Create a reference for success, and targets to keep the project in check along the way. Set statistical checks on the data, and set a standard of quality control and completeness.
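One way to make such rules measurable is to pair each field with a check and a target pass rate, then report where the data falls short. The sketch below assumes hypothetical fields (`email`, `country`), a deliberately simple email pattern, and made-up targets of 95% and 90%. Substitute your own documented requirements.

```python
import re

# Deliberately simple pattern for illustration, not a full email validator
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rules: field name -> (check function, minimum pass rate)
RULES = {
    "email": (lambda v: bool(v) and EMAIL_PATTERN.match(v) is not None, 0.95),
    "country": (lambda v: bool(v), 0.90),
}


def measure_quality(records, rules):
    """Return the pass rate per field and whether it meets its target."""
    report = {}
    total = len(records)
    for field, (check, target) in rules.items():
        passed = sum(1 for record in records if check(record.get(field)))
        rate = passed / total if total else 0.0
        report[field] = {"pass_rate": rate, "target": target, "ok": rate >= target}
    return report


# Hypothetical sample data
records = [
    {"email": "ann@example.com", "country": "CA"},
    {"email": "bob(at)example.com", "country": "US"},
    {"email": "cy@example.com", "country": ""},
    {"email": "di@example.com", "country": "UK"},
]

report = measure_quality(records, RULES)
for field, result in report.items():
    print(field, result)
```

Keeping the rules in one table makes it easy to rerun the same report during and after cleaning to see whether the targets are being met.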
Create a Strategy
Outline a plan for your data quality that supports ongoing operations and data management. Identify the data sets that meet your quality standard, and the data sets that need to be cleaned. Identify possible solutions with a plan for implementation. Your general plan should be to define, identify, correct, and document data errors, and modify procedures to avoid errors in the future.
Identify and Correct Errors in Data
The method of error detection will depend on the database and the dataset you are using. Depending on your team and the issues you have encountered, you can choose among free or open-source tools, services, and enterprise solutions for cleaning your data. pandas is an open-source Python library for data manipulation and analysis. You can use csvkit for converting and working with CSV files. R is a popular environment for data cleaning: packages such as plyr and reshape2 can restructure data, and ggplot2 can help you spot anomalies visually. Trifacta Data Wrangler, Informatica and Trillium Software are among the companies that offer data cleaning products and services.
Data is one of the most important parts of any marketing platform. Without accurate, complete, consistent and current data, marketers will have a hard time emailing different segments, improving marketing performance, optimizing campaigns and analysing past performance. But how do you identify data quality issues inside your own marketing automation tool?
The first step is to talk to your sales and marketing teams. Chances are they are already struggling with data problems. The second step is to analyse your data to determine whether you are due for a cleaning. And the third step is to stop bad data from getting in in the first place.
Here are some data quality issues that you should look out for and some ways of fixing them:
- Inability to segment. Think of all the different creative and messages that you want to tailor to your prospects and clients. For example, you may want to deploy an email to all VPs of Marketing, only to find out that titles are not standardized and include thousands of variations.
Solution: Standardize your data into different buckets. In the example of VP of Marketing, any titles that are VP of Digital Marketing, VP of Product Development or VP of Marketing would all be classified into VP of Marketing.
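A minimal sketch of this kind of bucketing, assuming a hand-maintained lookup table: the titles come from the example above, while the `TITLE_BUCKETS` table, the "Vice President" variant, and the `Unclassified` default are hypothetical additions for illustration.

```python
import re

# Hypothetical lookup table: normalized title -> standard bucket
TITLE_BUCKETS = {
    "vp of marketing": "VP of Marketing",
    "vp of digital marketing": "VP of Marketing",
    "vp of product development": "VP of Marketing",
    "vice president of marketing": "VP of Marketing",
}


def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def bucket_title(title, default="Unclassified"):
    """Map a raw job title to its bucket, or to a default for manual review."""
    return TITLE_BUCKETS.get(normalize(title), default)


for raw in ["  VP of  Digital Marketing", "Vice-President of Marketing", "Chief Data Officer"]:
    print(repr(raw), "->", bucket_title(raw))
```

Titles that land in the default bucket are the ones to review and add to the table, so the mapping improves over time.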
- Bounce rate is on the rise. You should monitor your email marketing bounce rate, as it is a key indicator of the quality of your data. Analysing the source of high bounce rates will help you eliminate bad data providers, while looking at the creation date will help you identify outdated leads that need refreshing.
Solution: Either perform email validation prior to deploying a new list or clean out historical emails by nurturing them.
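To see which lead sources drive bounces, you could group send results by source and compare rates. The sketch below assumes each send is exported as a record with hypothetical `source` and `bounced` fields. Your marketing automation tool will have its own export format.

```python
from collections import defaultdict


def bounce_rate_by_source(sends):
    """Return the fraction of bounced sends for each lead source."""
    totals = defaultdict(lambda: [0, 0])  # source -> [bounced, sent]
    for send in sends:
        counts = totals[send["source"]]
        counts[1] += 1
        if send["bounced"]:
            counts[0] += 1
    return {source: bounced / sent for source, (bounced, sent) in totals.items()}


# Hypothetical sample data
sends = [
    {"source": "list_vendor", "bounced": True},
    {"source": "list_vendor", "bounced": False},
    {"source": "list_vendor", "bounced": True},
    {"source": "list_vendor", "bounced": False},
    {"source": "webinar", "bounced": False},
    {"source": "webinar", "bounced": False},
]

rates = bounce_rate_by_source(sends)
for source, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{source}: {rate:.0%}")
```

The same grouping could be done on the lead creation month instead of the source to find outdated leads.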
- Duplicate records. While most marketing automation tools will not allow duplicate email addresses, you may uncover different emails for the same person. It is not easy to find out whether you have duplicate issues. One way is to perform a simple test and check what percentage of your leads have personal domains such as Gmail, Yahoo and so on. If the percentage is greater than 5%, you should look at de-duping, especially if you are sending different offers to different segments.
Solution: De-dupe your data using automatic de-duplication tools, or hire a data cleaning company to do it for you.
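The personal-domain test can be sketched in a few lines. The domain list below is an assumption (only Gmail and Yahoo are named above), the lead records are hypothetical, and grouping by lowercased name is a crude stand-in for what a real de-duplication tool does.

```python
from collections import defaultdict

# Assumed list of personal email domains; extend it for your own data
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def personal_domain_share(emails):
    """Return the fraction of emails that use a personal domain."""
    if not emails:
        return 0.0
    personal = sum(
        1 for email in emails
        if email.rsplit("@", 1)[-1].lower() in PERSONAL_DOMAINS
    )
    return personal / len(emails)


def possible_duplicates(leads):
    """Group emails by lowercased name and keep names with more than one email."""
    by_name = defaultdict(list)
    for lead in leads:
        by_name[lead["name"].strip().lower()].append(lead["email"])
    return {name: emails for name, emails in by_name.items() if len(emails) > 1}


# Hypothetical sample data
leads = [
    {"name": "Ann Lee", "email": "ann.lee@acme.example"},
    {"name": "ann lee", "email": "annlee@gmail.com"},
    {"name": "Bob Roy", "email": "bob@initech.example"},
    {"name": "Cy Tran", "email": "cy@globex.example"},
]

share = personal_domain_share([lead["email"] for lead in leads])
candidates = possible_duplicates(leads)
print(f"personal domains: {share:.0%}, above 5% threshold: {share > 0.05}")
print(candidates)
```

Name matching produces false positives for common names, so treat the output as candidates for review and not as confirmed duplicates.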
- Incomplete data. You may be sitting on marketing qualified leads (MQLs) that you cannot pass to sales because a few fields required to reach the desired score are missing.
Solution: Data appending using third-party data providers can help fill in missing data such as industry, address, phone number, revenue information and number of employees. Alternatively, you can use dynamic forms to fill in your missing data.
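Before buying appended data, it helps to know which fields are missing and how often. The following sketch assumes a hypothetical set of required fields and lead records. Your scoring model defines the real list.

```python
from collections import Counter

# Hypothetical required fields; use the ones your lead score depends on
REQUIRED_FIELDS = ("industry", "phone", "revenue", "employees")


def missing_fields(lead, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty on a lead."""
    return [field for field in required if lead.get(field) in (None, "")]


def missing_field_counts(leads, required=REQUIRED_FIELDS):
    """Count how many leads are missing each required field."""
    counts = Counter()
    for lead in leads:
        counts.update(missing_fields(lead, required))
    return counts


# Hypothetical sample data
leads = [
    {"email": "ann@acme.example", "industry": "Retail", "phone": "", "revenue": 5000000, "employees": 40},
    {"email": "bob@initech.example", "industry": "", "phone": "555-0100", "revenue": None, "employees": 12},
    {"email": "cy@globex.example", "industry": "Energy", "phone": "555-0101", "revenue": 900000, "employees": 7},
]

counts = missing_field_counts(leads)
blocked = [lead["email"] for lead in leads if missing_fields(lead)]
print(dict(counts))
print(blocked)
```

The per-field counts show which attributes are worth appending first, and the blocked list shows which leads a dynamic form should target.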
- Missing deadlines. The final indicator of data quality issues is missed deadlines. It could be that your team is so busy dealing with inconsistent campaign naming that they cannot find the right campaign to attach to the latest emails. There could also be too many custom lists to identify a key segment, or the team may be too busy trying to figure out why the latest email did not perform well. Their daily tasks take longer and longer to complete.
Solution: Audit your data and processes inside your marketing automation tool to identify ways to streamline activities and make work more efficient for your marketing team.