ETL vs ELT: Data Warehouses Evolved

ETL vs ELT: Data Warehouses Evolved

For years now, the process of migrating data into a data warehouse, whether it be an ongoing, repeated analytics pipeline, a one-time move into a new platform, or both, has consisted of a series of three steps, namely: Extraction, Transformation, & Loading, or just ETL for short. The whole point of this being to provide businesses with clean, relevant data, suitable for analytics, but ETL has become outdated with the latest in data warehousing technology.

Besides being quite complex (in particular the transformation part), it usually requires at least several months for a small sized organization to ramp up and get going. If the migration is to be an ongoing, repeated influx of cleansed and transformed data, expect an equally non-stop series of updates to the ETL job as well, since data is always evolving with any business over time and will usually require adjustments to the job, accordingly.

Also, your typical ETL data migration job in many ways reflects a particular view of what portion of said data is considered relevant at the time. This often changes, and data which had been excluded before will suddenly be found useful moving forward. This can often require even more weeks or months as an impact to the transformation, and then everything has to be loaded once again. With average database load times easily exceeding several hours or more, the freshness of the data is likely to be a week old at best.

A recap of some of the negatives with ETL:

• ETL takes a very long time and users have to wait weeks or months for the transformed data.
• ETL takes a very long time and users have to wait weeks or months for the transformed data.
• It’s costly to maintain, to say nothing for initial implementation for a small or mid sized business.
• ETL doesn’t provide access to raw data.

ETL vs. ELT: Executing Transformations within the Data Warehouse itself

The largest difference between these two approaches, and the reason why ETL has been around for so long to begin with, basically hinges on the processing capabilities data warehouse servers have historically been restricted to, as compared to the ground-up, built for the cloud powerhouse now available with the cloud data warehouse services of Snowflake. These new enhancements and capabilities include:

• ELT is Cloud Based. With infinite scalability, the on demand nature of cloud services, combined with Snowflake’s unique ability to separate compute from data, means you have all the performance you need, when you need it, and don’t have to pay for it when you don’t.

• Storage is Cheap! Super fast, S3 or Azure based storage comes for as little as $40/mo in the US, and $45/mo in the EU.

• Concurrency is not an issue anymore.Running a transformation into production data on a server with separate compute warehouses, such as Snowflake, means the data stays fresh and there’s no performance impact on any of the production users, even from their own activities parallel to other departments.

This last point is actually key because Snowflake was built to handle analytical operations on real time, one-version-of-the-truth, production data, without the typical slowdowns a given query can impose when people from different departments are working with the same data simultaneously, to say nothing for an incoming transformation job like with ELT. That separation of compute from storage really changes everything, and Snowflake takes the usual expenses for administration out of the picture as well, so the savings can be nothing short of spectacular.

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!