ETL vs ELT: Data Warehouses Evolved
For years, moving data into a data warehouse, whether as an ongoing analytics pipeline, a one-time migration to a new platform, or both, has followed three steps: Extract, Transform, and Load, or ETL for short. The point of the process is to give businesses clean, relevant data suitable for analytics, but ETL has not kept pace with the latest data warehousing technology.
Besides being quite complex (the transformation step in particular), ETL usually takes even a small organization several months to ramp up and get going. And if the migration is an ongoing feed of cleansed, transformed data, expect an equally never-ending series of updates to the ETL job itself: a business's data evolves over time, and the job has to be adjusted to match.
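To make the three steps concrete, here is a minimal sketch of the classic ETL flow in Python. The function and field names are purely illustrative; the key point is the ordering, since the data is transformed *before* it ever reaches the warehouse.

```python
def extract(rows):
    """Pull raw records from a source system (simulated here as a list)."""
    return list(rows)

def transform(rows):
    """Clean and reshape outside the warehouse: drop bad rows, normalize."""
    return [
        {"customer": r["customer"].strip().title(),
         "amount": round(float(r["amount"]), 2)}
        for r in rows
        if r.get("customer") and r.get("amount") is not None
    ]

def load(warehouse, rows):
    """Only the cleaned result ever lands in the warehouse."""
    warehouse.extend(rows)
    return warehouse

warehouse = []
source = [
    {"customer": "  ada lovelace ", "amount": "19.999"},
    {"customer": None, "amount": "5"},   # rejected before loading
]
load(warehouse, transform(extract(source)))
print(warehouse)  # only the cleaned first row survives
```

Every change to the source schema or to the cleaning rules means editing and redeploying this job, which is exactly the maintenance burden described above.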
The alternative, ELT, simply reorders the last two steps: Extract, Load, and then Transform inside the warehouse itself. The biggest difference between the two approaches, and the reason ETL has been around so long in the first place, comes down to the limited processing power traditional data warehouse servers have historically offered, compared with the ground-up, built-for-the-cloud powerhouse now available in Snowflake's cloud data warehouse services. The new capabilities include:
• ELT is cloud based. The on-demand nature of cloud services, with effectively unlimited scalability, combined with Snowflake's unique ability to separate compute from storage, means you have all the performance you need when you need it, and you don't pay for it when you don't.
• Storage is cheap! Fast, S3- or Azure-based storage costs as little as $40/mo in the US and $45/mo in the EU.
• Concurrency is no longer an issue. Running a transformation against production data on a platform with separate compute warehouses, such as Snowflake, means the data stays fresh and there is no performance impact on production users, even when departments run workloads in parallel with one another.
This last point is key: Snowflake was built to handle analytical operations on real-time, one-version-of-the-truth production data without the typical slowdowns a single query can impose when people in different departments work with the same data simultaneously, to say nothing of an incoming transformation job like ELT's. That separation of compute from storage really changes everything, and Snowflake takes the usual administration expenses out of the picture as well, so the savings can be nothing short of spectacular.
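The ELT pattern itself can be sketched in a few lines. In this hedged example, SQLite stands in for a cloud warehouse such as Snowflake (table and column names are illustrative): raw data is loaded as-is, and the transformation then runs as SQL on warehouse compute, against data that is already inside the warehouse.

```python
import sqlite3

# Extract + Load: raw data lands in the warehouse untouched.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
db.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("  ada lovelace ", "19.999"), (None, "5"), ("grace hopper", "7.5")],
)

# Transform: cleaning happens inside the warehouse, in SQL, after loading.
db.execute("""
    CREATE TABLE clean_orders AS
    SELECT TRIM(customer) AS customer,
           ROUND(CAST(amount AS REAL), 2) AS amount
    FROM raw_orders
    WHERE customer IS NOT NULL AND amount IS NOT NULL
""")
print(db.execute("SELECT * FROM clean_orders").fetchall())
```

Because the transformation is just a query, it can run on its own compute warehouse in a platform like Snowflake, leaving production users untouched, and changing the cleaning rules means rewriting SQL rather than redeploying a pipeline.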