3 Strategies to Solve Data Integrity Issues & Gain Visibility

Poor data integrity is the silent killer of data strategies. Like rotted wood, it can easily escape notice unless you’re looking for it. If you only discover it after the wall starts to fall apart, you’re too late.

In this post, you’ll learn exactly what data integrity is and what it looks like. My clients tell me that working on integrity issues is frustrating and I know exactly what they mean.

If you have ever found yourself surrounded by wildfire smoke, you know how debilitating it can be for visibility. You can’t see more than a few meters in front of you. It looks like fog but it’s not. This is what it can feel like when you’re trying to solve data integrity problems.

How Data Trust Can Go Wrong

Let’s talk about concrete examples here. Anyone can say “data integrity is important because you need to trust your data” or “if you don’t have accurate data, you can’t make the right decisions”. These comments are obvious.

Let me give you a few analogies for what happens when you don’t have data integrity:

  • It’s like running the football all the way to the end zone but dropping it a few feet from a touchdown.
  • It’s like spending all day cooking delicious food but forgetting to eat it before it goes bad.
  • It’s like trying to construct a building on fragile materials that will collapse once you go high enough.

If you have spent significant time and money on getting your data set up, this is for you. Teams that lack data integrity are running on data fumes. Many of my clients seek me out at this point because they don’t know where the plan went wrong.

I had a client that had a sophisticated tracking system. Anything that I could think of, they were tracking it. Cross-domain, UTM tracking, referrers, customer behavioral traits, and more. It was all there.

The only issue was that they didn’t trust any of the data. They suspected that parts of the tracking weren’t accurate, so they dismissed the entire data set. It turned out that they were right: a small portion, 10% of users, were experiencing tracking issues. That 10% was big enough to derail an entire data strategy.

Another client was unable to reconcile the numbers they were seeing on their dashboards. The numbers seemed wrong, and they couldn’t show incorrect numbers to their clients. Managing expectations is paramount when it comes to sharing data with external stakeholders.

We worked together to debug all the formulas and data sources until the numbers made sense. Besides the technical changes, I helped them understand how the formulas actually worked and why metrics might look different from what they expected.

Once they trusted the integrity of the data, they could go out and evangelize the dashboard to their stakeholders.

I had another client that had so much data, even I was impressed. Every little thing the customer did, they were tracking. Their challenge was that their team was completely overwhelmed. They didn’t know where to start with the analysis.

We worked together to clean up the tracking in 3 ways:

  • We made the names of data points logical and obvious. Instead of “element_touch_home”, we called the event “clicked_signup_button”.
  • We tracked less. It’s not about how many data points you have but what you do with them.
  • We ran training sessions (group and individual) to get people comfortable with the data.

Is your data schema clear and concise?
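
The naming cleanup described above can be partly automated. Here is a minimal sketch of a name check, assuming a convention of an action verb followed by the object in snake_case (as in “clicked_signup_button”). The verb list and the function name `check_event_name` are hypothetical and only there for illustration; your own convention may differ.

```python
import re
from typing import List

# Assumed convention: action verb first, then the object, all in snake_case.
# This verb list is illustrative only; extend it to match your own schema.
ALLOWED_VERBS = {"clicked", "viewed", "created", "uploaded", "cancelled"}
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")


def check_event_name(name: str) -> List[str]:
    """Return a list of problems with an event name (empty if it passes)."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case with at least two words")
    verb = name.split("_", 1)[0]
    if verb not in ALLOWED_VERBS:
        problems.append("does not start with a known action verb")
    return problems


if __name__ == "__main__":
    for event in ["element_touch_home", "clicked_signup_button"]:
        print(event, check_event_name(event))
```

A check like this could run whenever someone proposes a new event, so unclear names get caught before they reach your reports.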

These are three concrete examples of how data integrity can go wrong. It could be something small or large. The lack of integrity may be real or made up. Either way, the outcome is the same. If people don’t trust the data you’re providing them, you might as well not give them anything.

Data Integrity Best Practices & Strategies

Now that we know what data integrity issues look like, let’s shift to fixing these issues. There are 3 strategies that you can employ:

Find Pillar Data Points

Running checks on hundreds of data points isn’t feasible. Anything more than 30–40 becomes hard to manage. Instead, you should find “pillar” data points that you can use to measure the accuracy of the entire data set.

These are data points that are important to your team and representative of your tracking. For example, your pillar points may be signups, photo uploads, subscriptions created, and account cancellations. 

Create a data debugging dashboard for pillar events.

Accuracy on these four points is critical, and together they represent the different ways in which you track data. You may be using a combination of client-side tracking (JavaScript) and backend tracking (HTTP, Ruby, etc.).

On these pillar points, you can set up notifications for anomalies. A sudden spike or drop in a pillar event is often the first sign of a tracking issue.
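
To make the spike-or-drop idea concrete, here is a minimal sketch in Python. It assumes you can export a list of daily counts for each pillar event, and it flags the latest day if it sits more than a few standard deviations away from the earlier days. The function name `check_latest_count`, the threshold of 3, and the sample numbers are all assumptions for illustration, not output from a real tracking tool.

```python
from statistics import mean, stdev


def check_latest_count(daily_counts, threshold=3.0):
    """Compare the most recent daily count with the days before it.

    Returns "spike", "drop", or None when nothing looks unusual.
    """
    if len(daily_counts) < 3:
        return None  # not enough history to judge
    history, latest = daily_counts[:-1], daily_counts[-1]
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is worth a look.
        if latest == average:
            return None
        return "spike" if latest > average else "drop"
    z_score = (latest - average) / spread
    if z_score > threshold:
        return "spike"
    if z_score < -threshold:
        return "drop"
    return None


# Made-up daily counts for two pillar events (last value is today).
pillar_counts = {
    "signups": [120, 118, 125, 122, 119, 121, 40],
    "photo_uploads": [300, 310, 295, 305, 298, 302, 304],
}

if __name__ == "__main__":
    for event, counts in pillar_counts.items():
        print(event, check_latest_count(counts))
```

This is deliberately simple. It ignores weekly seasonality, so a business with quiet weekends would want to compare like-for-like days instead.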

Schedule Data Reviews

Just because something is accurate once doesn’t mean it will stay that way. Schedule reviews to check your pillar data points. If accuracy has been a major issue for your company, start with monthly reviews. As the data becomes stable, move to quarterly. If someone notices a potential issue between reviews, tackle it ad hoc.
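
One possible check to run during a review is to compare what your tracking recorded against a source you already trust, such as the signup rows in your own database. The sketch below assumes you have both numbers for the same period. The function name `reconcile` and the 2% tolerance are hypothetical choices, so pick a tolerance that fits your own setup.

```python
def reconcile(tracked, source_of_truth, tolerance=0.02):
    """Compare a tracked count with a trusted count for the same period.

    Returns the relative gap and whether it falls within the tolerance.
    """
    if source_of_truth == 0:
        # Avoid dividing by zero: only a tracked count of zero matches.
        gap = 0.0 if tracked == 0 else float("inf")
    else:
        gap = abs(tracked - source_of_truth) / source_of_truth
    return {"gap": gap, "ok": gap <= tolerance}


if __name__ == "__main__":
    # Made-up numbers: tracking saw 900 signups, the database has 1,000.
    print(reconcile(tracked=900, source_of_truth=1000))
```

A gap like the one in this made-up example would be the kind of 10% problem that can undermine trust in the whole data set.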

Track Fewer Points

Tracking fewer data points is a great way to improve integrity. If you’re struggling to juggle 5 balls in the air, the fix isn’t to add the sixth ball. The solution is to go down to 3 balls and slowly build back up to 5.

Don’t let rotted wood go unnoticed. You need to proactively look for these issues and tackle them head-on. Fixing individual panels of wood (specific events) is easier than trying to fix an entire wall or house. The strategies in this post will help you do this.
