Snowflake vs Redshift

We have been building data systems for years, and the new cloud capabilities in Redshift, Google BigQuery, and Snowflake have us more excited than we have been in a long time. Today we want to share some results from estimating a relatively small 2 TB cloud data warehouse on Snowflake and on Redshift for a client. After that, we go through the differences we see between the two.

Since we’re starting with a minimum storage requirement of 2 TB on SSD, there’s really only one node option available with AWS: the dc2.8xlarge. It provides a whopping 32 vCPUs, which is the equivalent of an Amazon-adjusted computing quotient of 99 ECUs (see https://aws.amazon.com/ec2/faqs/#hardware-information), an astounding 244 GB of RAM, 2.56 TB of SSD storage, and a blistering 7.5 GB/sec of I/O. Provisioned with Amazon’s On-Demand pricing, this node will set you back a cool $4.80/hour.

Other payment options include those of their Standard 1 Year Term:

| Payment Option | Upfront | Monthly | Effective Hourly | Savings Over On-Demand | On-Demand Hourly |
|---|---|---|---|---|---|
| No Upfront | $0.00 | $2,774.00 | $3.80 | 21% | $4.80 |
| Partial Upfront | $12,000.00 | $1,350.50 | $3.22 | 33% | $4.80 |
| All Upfront | $27,640.00 | $0.00 | $3.16 | 34% | $4.80 |

… as well as their Standard 3 Year Term:

| Payment Option | Upfront | Monthly | Effective Hourly | Savings Over On-Demand | On-Demand Hourly |
|---|---|---|---|---|---|
| Partial Upfront | $21,200.00 | $584.00 | $1.61 | 67% | $4.80 |
| All Upfront | $39,470.00 | $0.00 | $1.50 | 69% | $4.80 |
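The "Effective Hourly" and "Savings Over On-Demand" columns can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the effective rate is simply the total paid over the term divided by the hours in the term, with a year taken as 8,760 hours. The function names are ours for illustration and are not part of any AWS tooling.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours; leap days ignored (assumption)


def effective_hourly(upfront, monthly, term_years):
    """Spread the upfront and monthly payments over every hour of the term."""
    total_cost = upfront + monthly * 12 * term_years
    return total_cost / (HOURS_PER_YEAR * term_years)


def savings_vs_on_demand(effective, on_demand=4.80):
    """Fractional savings relative to the on-demand hourly rate."""
    return 1 - effective / on_demand


# (upfront, monthly, term in years), copied from the two pricing tables
options = {
    "1-year No Upfront": (0.00, 2774.00, 1),
    "1-year Partial Upfront": (12000.00, 1350.50, 1),
    "1-year All Upfront": (27640.00, 0.00, 1),
    "3-year Partial Upfront": (21200.00, 584.00, 3),
    "3-year All Upfront": (39470.00, 0.00, 3),
}

for name, (upfront, monthly, years) in options.items():
    rate = effective_hourly(upfront, monthly, years)
    saved = savings_vs_on_demand(rate)
    print(f"{name}: ${rate:.2f}/hour, {saved:.0%} savings over on-demand")
```

Running this yields the same effective hourly rates and savings percentages as the tables, which suggests the published figures follow this straightforward amortization.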

So clearly the most cost-effective option for Redshift is the last one, the 3-year All Upfront term. But let’s consider another excellent option here: Snowflake. How does Snowflake differ?

| Feature | Snowflake | Redshift |
|---|---|---|
| Compute and storage | Decoupled for scaling and configurable | Both pre-configured on each node |
| Distribution of components | Components can be distributed across zones for scale and disaster recovery | Compute and storage must always be in the same availability zone |
| On demand usage | On-demand use of resources | Has to be running at all times |
| Scale limits | Ranges from 1KB of data to multi-PBs for data warehouse and relational use | Handles large scale data warehouses, but overkill for anything else |
| Storage cost | Low cost storage decoupled from on-demand compute together reducing costs | Inseparable from Compute |
| Compute cost | Scales independently using on-demand VMs | Inseparable from Storage |
| Cluster migrations during scale of data | Not required. Automatically scales | Requires full cluster migration because of tightly coupled storage and compute |
| Backups | Uses Time Travel to provide snapshot for up to 90-days. No extra monitoring. | Full data copy and restore to S3. Creates additional costs and requires monitoring. |
| Disaster recovery | Inherently distributed across data centers and zones already | Explicit actions required for distribution and disaster recovery |
| Structured storage (relational data) | Native | No SQL columnar |
| Semi-structured storage (JSON, XML, AVRO) | Native | Use it sparingly; requires additional services to do transforms first (ex: use AWS EMR) |
| JSON storage | JSON data is stored as columns for fast processing (no extra preparation required) | JSON breaks performance benefits. AWS recommends using it sparingly. |
| Key management | Automatic | Explicit action required to distribute keys for effective performance |
| Data vacuuming | Automatic | Explicit action required after loading data |
| Column compression | Automatic | Requires testing and explicit management |
| Scaling with data | Automatic | Must provision additional nodes and move data (can take hours or days) |
| Administrative overhead | Zero administration (DBA not required) | Requires DBA activity similar to on-premises cluster administration |
| Code upgrades | Code upgrades are invisible to users | Code upgrades are weekly and require downtime |
| Cost model | Consumption model: pay for compute used; storage paid at average for capacity per month | Pay per-hour for entire node. If node is switched off, it has to be turned on explicitly and data loaded |
| Ad hoc queries | Very fast and adaptable | Optimized for data warehouse queries |
| Concurrent usage | Multiple concurrent users can query the system without issue. Separate compute clusters can be used. | Concurrency problems can arise if queries conflict with each other. Managing query schedule is important to reduce overhead. |
| Test development copy | Requires zero additional space and runs on separate compute | Requires a full data copy which is costly especially as the size gets larger. Also, requires DB to be unavailable during copy time. |

As you can see, while many of these features represent massive cost savings with Snowflake over Redshift, probably none of them stands out as much as the first one: having storage and compute decoupled. With an on-demand storage price of just $40/TB per month, or a pre-paid price of only $23/TB per month, the savings over Redshift stack into the thousands per month on storage costs alone. Compute is scaled completely automatically as well, so the power is there when you need it, without paying for it when you don’t. The difference in expenditure required is simply astounding!
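To put rough numbers on the storage side for our 2 TB case, here is a minimal sketch using only the prices quoted in this post. It assumes an average month of 730 hours and a single Redshift node that runs around the clock. It deliberately leaves out Snowflake compute charges, which are billed separately and are not quoted here, so it is not a full total-cost comparison. The function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average month, 8760 / 12 (assumption)


def snowflake_storage_monthly(data_tb, price_per_tb):
    """Monthly Snowflake storage bill; compute is billed separately."""
    return data_tb * price_per_tb


def redshift_node_monthly(hourly_rate, nodes=1):
    """Monthly cost of Redshift nodes that must run at all times."""
    return hourly_rate * HOURS_PER_MONTH * nodes


data_tb = 2  # the warehouse size estimated in this post

# Snowflake storage at the on-demand and pre-paid prices quoted above
storage_on_demand = snowflake_storage_monthly(data_tb, 40.0)
storage_prepaid = snowflake_storage_monthly(data_tb, 23.0)

# One dc2.8xlarge node at the on-demand and 3-year All Upfront rates
node_on_demand = redshift_node_monthly(4.80)
node_three_year = redshift_node_monthly(1.50)

print(f"Snowflake storage, on-demand: ${storage_on_demand:,.2f}/month")
print(f"Snowflake storage, pre-paid:  ${storage_prepaid:,.2f}/month")
print(f"Redshift node, on-demand:     ${node_on_demand:,.2f}/month")
print(f"Redshift node, 3-year upfront: ${node_three_year:,.2f}/month")
```

For 2 TB, Snowflake storage comes to $80 or $46 per month at the quoted prices, while the Redshift node is paid for every hour whether or not anyone is querying it. What you actually save depends on how much Snowflake compute your workload consumes.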

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!
