You know from the adjective that it’s probably not good, but what exactly is “dirty data”?
Dirty data comes at a cost to companies, but if you’re not a data scientist, how do you even diagnose that you have a problem?
Less risqué than Dirty Dancing and less provocative than Dirty Diana, dirty data is something you want to keep away from your company. You know from the name that it’s probably not good, but what is dirty data?
Simply put, it’s data that contains errors or is incomplete in some way. And it’s costing you and your company money. Chances are that you, or at least someone at your company, know that’s a problem, but more than 90 percent of companies still aren’t keeping their data clean.
For marketers and salespeople, that means 30 percent of data becomes totally unusable. It shows up as missing emails, duplicate records, and inaccurate (or fake) audience and fan records. That’s a tough pill to swallow for live events like sports and festivals, which already struggle to identify fans who aren’t in their database.
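For readers who do have a spreadsheet export and a little Python on hand, here is a minimal sketch of how two of those symptoms, missing emails and duplicate records, might be counted. The sample records, the `name` and `email` field names, and the `diagnose` helper are all assumptions made up for illustration; treating emails as duplicates when they match after trimming and lowercasing is one possible rule, not a definitive one.

```python
# Hypothetical fan records; the field names are assumptions for illustration.
records = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "Ana Ruiz", "email": "ANA@example.com "},
    {"name": "Ben Cole", "email": ""},
    {"name": "Cy Park", "email": None},
]


def normalize_email(value):
    """Trim whitespace and lowercase so near-identical emails compare equal."""
    return (value or "").strip().lower()


def diagnose(rows):
    """Count rows with no email and rows whose email repeats an earlier row."""
    seen = set()
    missing = 0
    duplicates = 0
    for row in rows:
        email = normalize_email(row.get("email"))
        if not email:
            missing += 1
        elif email in seen:
            duplicates += 1
        else:
            seen.add(email)
    return {
        "total": len(rows),
        "missing_email": missing,
        "duplicate_email": duplicates,
    }


report = diagnose(records)
print(report)
```

Even a rough count like this can show whether the problem is large enough to be worth a proper cleanup.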
Not keeping your data clean comes at a cost, but if you’re not a data scientist, how do you even diagnose that you have a problem? Luckily, Umbel’s Data Science team has seen (and cleaned) it all. Our data scientists recently came together to look at how you can diagnose dirty data, with tips on cleanup across six areas.
In their discussion, they covered why some things are possible to fix after the fact, but also how you can save yourself headaches down the line with a solid data collection strategy from the very beginning. Beyond the don’ts, they also touched on the basic principles of ‘tidy data’, common data ‘gotchas’, and best practices for getting the most out of data sets.
The infographic outlines how the less tech-savvy among us can diagnose dirty data (and avoid it) across six areas.
Get started with cleaning up your data with this infographic.