Snowflake vs Redshift
We have been building data systems for years, and the new cloud capabilities in Redshift, Google BigQuery, and Snowflake have us more excited than we have been in a long time. Today we want to share the results of estimating a relatively small 2 TB cloud data warehouse on Snowflake and on Redshift for a client, and then walk through the differences we see between the two.
Since we’re starting with a minimum storage requirement of 2 TB on SSD, there’s really only one server option available with AWS: the dc2.8xlarge node type. It provides a whopping 32 vCPUs, rated at 99 ECUs, Amazon’s own measure of compute capacity (see https://aws.amazon.com/ec2/faqs/#hardware-information), along with 244 GB of RAM, 2.56 TB of SSD storage, and 7.5 GB/sec of I/O. Provisioned at Amazon’s On-Demand pricing, this node will set you back a cool $4.80 per hour.
Other payment options include the Standard 1-Year Term:
Payment Option | Upfront | Monthly | Effective Hourly | Savings Over On-Demand | On-Demand Hourly |
---|---|---|---|---|---|
No Upfront | $0.00 | $2,774.00 | $3.80 | 21% | $4.80 |
Partial Upfront | $12,000.00 | $1,350.50 | $3.22 | 33% | $4.80 |
All Upfront | $27,640.00 | $0.00 | $3.16 | 34% | $4.80 |
… as well as their Standard 3 Year Term:
Payment Option | Upfront | Monthly | Effective Hourly | Savings Over On-Demand | On-Demand Hourly |
---|---|---|---|---|---|
Partial Upfront | $21,200.00 | $584.00 | $1.61 | 67% | $4.80 |
All Upfront | $39,470.00 | $0.00 | $1.50 | 69% | $4.80 |
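
The Effective Hourly and Savings columns in both tables can be reproduced with a little arithmetic. The following is a minimal sketch, assuming a year of 8,760 hours (365 days) and that the effective rate is simply the total paid over the term divided by the hours in the term; the function and variable names are ours, not Amazon's.

```python
# Sketch of the reserved-pricing arithmetic behind the tables above.
# Assumption: a year has 365 * 24 = 8,760 billable hours.
HOURS_PER_YEAR = 365 * 24
ON_DEMAND_HOURLY = 4.80  # dc2.8xlarge On-Demand rate quoted in the text


def effective_hourly(upfront, monthly, years):
    """Total paid over the term divided by the hours in the term."""
    total_paid = upfront + monthly * 12 * years
    return total_paid / (HOURS_PER_YEAR * years)


def savings_pct(effective, on_demand=ON_DEMAND_HOURLY):
    """Percentage saved relative to the On-Demand hourly rate."""
    return (1 - effective / on_demand) * 100


# (label, upfront, monthly, term in years), copied from the tables above
offers = [
    ("1-year No Upfront", 0.00, 2774.00, 1),
    ("1-year Partial Upfront", 12000.00, 1350.50, 1),
    ("1-year All Upfront", 27640.00, 0.00, 1),
    ("3-year Partial Upfront", 21200.00, 584.00, 3),
    ("3-year All Upfront", 39470.00, 0.00, 3),
]

for label, upfront, monthly, years in offers:
    rate = effective_hourly(upfront, monthly, years)
    print(f"{label}: ${rate:.2f}/hour, {savings_pct(rate):.0f}% below On-Demand")
```

Rounded to the cent and to the whole percent, the printed values should line up with the Effective Hourly and Savings Over On-Demand columns.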
So clearly the most cost-effective option for Redshift is the 3-year All Upfront term, but let’s consider another excellent option here: Snowflake. How does Snowflake differ?
Feature | Snowflake | Redshift |
---|---|---|
Compute and storage | Decoupled for scaling and configurable | Both pre-configured on each node |
Distribution of components | Components can be distributed across zones for scale and disaster recovery | Compute and storage must always be in the same availability zone |
On demand usage | On-demand use of resources | Has to be running at all times |
Scale limits | Ranges from 1KB of data to multi-PBs for data warehouse and relational use | Handles large scale data warehouses, but overkill for anything else |
Storage cost | Low cost storage decoupled from on-demand compute together reducing costs | Inseparable from Compute |
Compute cost | Scales independently using on-demand VMs | Inseparable from Storage |
Cluster migrations during scale of data | Not required. Automatically scales | Requires full cluster migration because of tightly coupled storage and compute |
Backups | Uses Time Travel to provide snapshots for up to 90 days. No extra monitoring. | Full data copy and restore to S3. Creates additional costs and requires monitoring. |
Disaster recovery | Inherently distributed across data centers and zones already | Explicit actions required for distribution and disaster recovery |
Structured storage (relational data) | Native | No SQL columnar |
Semi-structured storage (JSON, XML, AVRO) | Native | Use it sparingly; requires additional services to do transforms first (ex: use AWS EMR) |
JSON storage | JSON data is stored as columns for fast processing (no extra preparation required) | JSON breaks performance benefits. AWS recommends using it sparingly. |
Key management | Automatic | Explicit action required to distribute keys for effective performance |
Data vacuuming | Automatic | Explicit action required after loading data |
Column compression | Automatic | Requires testing and explicit management |
Scaling with data | Automatic | Must provision additional nodes and move data (can take hours or days) |
Administrative overhead | Zero administration (DBA not required) | Requires DBA activity similar to on-premises cluster administration |
Code upgrades | Code upgrades are invisible to users | Code upgrades are weekly and require downtime |
Cost model | Consumption model: pay for compute used; storage paid at average for capacity per month | Pay per hour for the entire node. If the node is switched off, it has to be turned on explicitly and data loaded |
Ad hoc queries | Very fast and adaptable | Optimized for data warehouse queries |
Concurrent usage | Multiple concurrent users can query the system without issue. Separate compute clusters can be used. | Concurrency problems can arise if queries conflict with each other. Managing query schedule is important to reduce overhead. |
Test development copy | Requires zero additional space and runs on separate compute | Requires a full data copy which is costly especially as the size gets larger. Also, requires DB to be unavailable during copy time. |
As you can see, many of these features represent massive cost savings with Snowflake over Redshift, but probably none stands out as much as the first one: storage and compute are decoupled. With an on-demand storage price of just $40/TB per month, or a pre-paid price of only $23/TB per month, the savings over Redshift stack into the thousands per month on storage costs alone. Compute scales automatically as well, so the power is there when it’s needed without being wasted when it’s not. The difference in expenditure required is simply astounding!
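
To put rough numbers on that for the 2 TB case, here is a small sketch using only the prices quoted in this post. It assumes a 730-hour month (8,760 hours / 12) and an always-on single Redshift node; the function names are hypothetical. Note that the Redshift figure bundles compute with storage, while the Snowflake figure is storage only, so Snowflake compute usage would be billed on top of it.

```python
# Rough monthly comparison for a 2 TB warehouse, using prices quoted in this post.
# Assumption: an average month has 8,760 / 12 = 730 hours.
HOURS_PER_MONTH = 730


def snowflake_storage_monthly(terabytes, rate_per_tb):
    """Snowflake storage only; compute is billed separately, per use."""
    return terabytes * rate_per_tb


def redshift_node_monthly(hourly_rate, nodes=1):
    """Always-on Redshift node(s); storage and compute are bundled."""
    return hourly_rate * HOURS_PER_MONTH * nodes


storage_tb = 2

print(f"Snowflake storage, on-demand: ${snowflake_storage_monthly(storage_tb, 40):,.2f}")
print(f"Snowflake storage, pre-paid:  ${snowflake_storage_monthly(storage_tb, 23):,.2f}")
print(f"Redshift node, On-Demand:     ${redshift_node_monthly(4.80):,.2f}")
print(f"Redshift node, 3-yr upfront:  ${redshift_node_monthly(1.50):,.2f}")
```

How much of the gap survives once Snowflake compute is added depends on how many hours the warehouse actually runs, which is exactly the lever the consumption model gives you.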
Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!