The need to move data from point A to point B is as old as computers themselves, which is what ETLs solve. Don’t worry if you’re not as familiar with this acronym or don’t know the difference between ETL vs ETL. This post is your crash course into this world.
What’s in a Name? ETL Deconstructed
While I don’t think Shakespeare ever wrote about ETLs, I was reminded about this classic line when writing this post. Names matter, and our acronym, ETL, has everything we need to understand it’s purpose.
ETL stands for Extract, Load, and Transform. These are three steps in the data preparation process. In most scenarios, we need to organize our data before we can analyze it.
Extract is where we take the data from point A, the source. The source could be a website, an app, a database, or really anything where data is stored.
Transform is when we clean up the data to make it useful for analysis. We could add new columns (or dimensions), rename existing ones, or any other changes.
Load is the final step, where we save the information to point B. By this point, the data has been clean and is ready to be visualized.
Companies have followed this ETL process for years. You always need to move to data, and in recent times, the challenge has been in how you transfer lots of it.
ETLs are naturally technical topics, but non-technical executives should also understand the basics of this process. Your CTO will present you with a bill for an ETL tool, and it’s helpful to know why your company is even paying for this.
In practice, this process isn’t as simple as I described. The sources for extraction may be complicated or unreliable. The transformation may be so complex that it would take too long to do it. The loading may be the easiest part, but there might be nothing to load without the first steps.
I had a client who wanted to move data out of Salesforce and Pardot and their data warehouse. However, their ETL supported neither of these options, and the process of integration was too time-consuming. They had to rely on manual exports, which weren’t ideal. In this example, the extraction portion wasn’t straightforward.
How Did Things Get Flipped into ELT?
You may have also heard about ELTs. Notice that L and T switched spots.
The steps are the same, but you will load the data before transforming it into this approach. You may not even need to convert the data and simply let users filter through it using something like SQL.
This approach is viable now because cloud computing costs have gone down over the past few years. You can now easily replicate and scale up your storage with companies like Amazon, Google, and Microsoft in a few clicks.
You no longer need to worry about storing too much data. Just store it all and then find a way to visualize it. The downside of this approach is that you could end up stuck with a useless pile of data. The pile could be huge but useless nonetheless.
My Recommendation for When to Use ELT vs ETL
In my experience, there are specific situations where each approach would work.
ETLs work best when dealing with large volumes of data that required cleaning to be useful. You can’t simply dump the data and expect users to find insights within it.
ELTs work best when the data structure is already defined, and you simply need to move it from point A to point B. The Salesforce example from before would be perfect for an ELT. The data will have have the right columns, and we just need to make it easily accessible by storing it in a data warehouse.
I also wanted to briefly talk about CDPs. These software tools have an element of ETL, but that’s not their only role. They also manage customer data schemas and protect data integrity. CDPs help marketing and product teams that need to move data between locations and almost always deal with well-defined data structures. ETLs and CDPs overlap in functionality, but you could easily use both.
Moving data is hard. Choosing the right approach (ELT vs ETL) can make it easier, but don’t forget that the goal here is insights. You may have the most sophisticated process for moving data around, but if people can’t use this data, it’s pointless. Choose an approach and move on to the more challenging things.