Snowflake Create Warehouse Defaults

I have been working with the Snowflake Data Cloud since it was just an analytical RDBMS. Since the beginning of 2018, Snowflake has been a lot of fun to work with as a data professional and data entrepreneur. It gives data professionals amazingly flexible data processing power in the cloud. The key to a successful Snowflake deployment is setting up security and account optimizations correctly from the beginning. In this article, we will discuss the CREATE WAREHOUSE default settings.

Snowflake Cost and Workload Optimization is the Key

After analyzing and working on hundreds of Snowflake customer accounts, we have found key processes for optimizing Snowflake compute and storage costs. The best way to have a successful Snowflake deployment is to make sure you set up your compute for cost and workload optimization.

The Snowflake default “create warehouse” settings are NOT optimized to limit costs. That is part of the reason we built our Snoptimizer service (Snowflake Cost Optimization Service): to help you automatically and easily optimize your Snowflake account(s). There is no other way to make continuous query and cost optimizations so that your Snowflake Data Cloud solution can run as efficiently as possible. Let’s take a quick look at how the default settings are configured on a brand-spanking-new Snowflake account.

Here is the default screen that comes up when I click +Warehouse in the Classic Console.

[Screenshot] Create Warehouse – default options in the Classic Console

Okay, for those of you already in Snowsight (aka the Preview App), here is the default screen – it is almost the same.

[Screenshot] Create Warehouse – default options in Snowsight

So let’s dig into the default settings these web UIs apply if you just choose a name and click “Create Warehouse,” and evaluate what happens with your Snowflake compute if you leave them in place.

Create Warehouse – Default Setting #1

Size (really the warehouse compute size): X-Large. I’m going to assume you know how Snowflake compute works and understand the Snowflake warehouse T-shirt sizes. Notice that the default is an X-Large warehouse rather than one of the smaller T-shirt sizes (XS, S, M, L). This default is the same in both the Classic Console and Snowsight (the Preview App).
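
If you create warehouses in SQL instead of the UI, the size is explicit and easy to keep small. A minimal sketch, where the warehouse name REPORTING_WH is just a hypothetical example:

CREATE WAREHOUSE REPORTING_WH WITH
  WAREHOUSE_SIZE = 'XSMALL';  -- explicitly small instead of the X-Large UI default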

Create Warehouse – Default Setting #2 [assuming an Enterprise or higher Edition]

Maximum Clusters: 2

While this default makes sense if you want multi-cluster scaling enabled, it still has serious cost implications. It assumes the data cloud customer wants to launch a second cluster on this warehouse, and pay more for it, whenever a certain level of statements is queued on the warehouse. If you also stick with the X-Large setting, duplicating a cluster has serious cost consequences of $X/hr.

This is only the default setting for the Classic Console. It is also ONLY set if you have Enterprise Edition or higher, because Standard Edition does not offer multi-cluster warehouses.
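
If you want to rule out a surprise second cluster entirely, you can cap the cluster count in SQL. A minimal sketch, reusing the hypothetical warehouse name from above:

ALTER WAREHOUSE REPORTING_WH SET MAX_CLUSTER_COUNT = 1;  -- stay single-cluster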

Create Warehouse – Default Setting #3 [assuming an Enterprise or higher Edition]

Minimum Clusters: 1

This is only the default setting for the Classic Console.

Create Warehouse – Default Setting #4 [assuming an Enterprise or higher Edition]

Scaling Policy: Standard
This setting is hard to rate, but the truth is that if you are a cost-conscious customer you will want to change this to “Economy” rather than leave it set to “Standard”. With “Standard”, the second cluster (enabled by default) kicks in as soon as queuing happens on your Snowflake warehouse; with “Economy”, Snowflake does not launch a second cluster until it estimates at least 6 minutes of work for that second cluster to perform.

This is only a default setting in the Classic Console, but when you toggle on the “Multi-cluster Warehouse” setting in Snowsight, it also defaults to “Standard” rather than “Economy”.
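
The cost-conscious alternative is a one-line change in SQL. A minimal sketch, assuming Enterprise Edition or higher and the hypothetical warehouse name used earlier:

ALTER WAREHOUSE REPORTING_WH SET SCALING_POLICY = 'ECONOMY';  -- wait for ~6 minutes of queued work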

Create Warehouse – Default Setting #5

Auto Suspend: 10 minutes
For me, this default is typically too high. Many warehouses, especially ELT/ETL warehouses, do not need it. For example, a loading warehouse that runs at regular intervals never needs the warehouse cache, so keeping it running for 10 idle minutes after every load is pure waste. Our Snoptimizer service finds inefficient and potentially costly settings like this. For a load warehouse use case, Snoptimizer immediately saves you up to 599 seconds of compute for every interval the load runs. We talk more about it in this Snowflake Warehouse Best Practice Auto Suspend article, but this can add up, especially if your load warehouse is larger in T-shirt size.

NOTE: This defaults to the same setting for both the Classic Console and Snowsight (the Preview App)
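
In SQL, AUTO_SUSPEND is expressed in seconds, so a load warehouse can be tightened with one statement. A minimal sketch – the name LOAD_WH and the 60-second value are illustrative assumptions, not a universal recommendation:

ALTER WAREHOUSE LOAD_WH SET AUTO_SUSPEND = 60;  -- suspend after 60 idle seconds instead of 600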

Create Warehouse – Default Setting #6

Auto Resume Checkbox: Checked by default.
This setting is totally fine. I don’t even remember the last time I created a warehouse without “Auto Resume” checked. This is one of the very, very awesome things about Snowflake: within milliseconds or seconds of a query being executed, it brings that warehouse compute up automatically to meet your needs. This is revolutionary and awesome!

NOTE: This defaults to the same setting for both the Classic Console and Snowsight (the Preview App)

Create Warehouse – Default Setting #7

Click “Create Warehouse”: The Snowflake warehouse is immediately started.
I do not like this behavior. A new warehouse should not immediately go into the Running state and start consuming credits. It is too easy for a new SYSADMIN to start a warehouse they do not need. The previous setting already defaults to “Auto Resume”, so the warehouse will resume on its own when a job is sent to it; there is no need to start it automatically.

NOTE: This behavior is the same in both the Classic Console and Snowsight (the Preview App)
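
In SQL you can avoid this entirely by creating the warehouse in a suspended state. A minimal sketch of the idea, combining it with auto-resume so the warehouse only wakes when real work arrives (warehouse name hypothetical):

CREATE WAREHOUSE REPORTING_WH WITH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;  -- created suspended; consumes no credits until a query arrives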

What do you think?

We hope this was useful for you. ITS Snowflake Solutions is a community of data professionals dedicated to Snowflake education and solutions. We are here to help you run your Snowflake account as efficiently as possible. We work to solve your data-driven and data-automation challenges!

 

One last thing…

As an extra bonus, for those of you who just do not do “GUI”, check out the SQL code below.

Let’s go to the Snowflake CREATE WAREHOUSE code to see what is happening…

DEFAULT SETTINGS: 

CREATE WAREHOUSE XLARGE_BY_DEFAULT WITH
  WAREHOUSE_SIZE = 'XLARGE'
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 2
  SCALING_POLICY = 'STANDARD'
  COMMENT = 'This sucker will consume a lot of credits fast';
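
For contrast, here is a sketch that flips each default discussed above. The warehouse name, the 60-second suspend, and the single-cluster cap are illustrative assumptions (Enterprise Edition or higher assumed for the cluster parameters), not one-size-fits-all recommendations:

COST-CONSCIOUS SETTINGS:

CREATE WAREHOUSE COST_CONSCIOUS_WH WITH
  WAREHOUSE_SIZE = 'XSMALL'    -- smallest T-shirt size instead of X-Large
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 60            -- 60 idle seconds instead of 600
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 1        -- no surprise second cluster
  SCALING_POLICY = 'ECONOMY'   -- only matters if you later raise MAX_CLUSTER_COUNT
  INITIALLY_SUSPENDED = TRUE   -- starts suspended; no credits until queried
  COMMENT = 'Cost-conscious defaults sketch';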

A New Era of Cloud Analytics with Snowflake as the Hadoop Era Ends

Hadoop was regarded as a revolutionary technology that would change data management and completely replace data warehousing. That statement is partly accurate but not entirely true, since it has not held up ever since cloud solutions came into the picture. Hadoop mostly flourishes in projects that involve substantial data infrastructure, meaning it was more relevant around a decade ago when most data analysts… Continue reading

Snowflake vs Netezza

Fifteen years ago, IBM introduced an appliance-based, on-prem analytics solution known as Netezza. It was purpose-built, load-ready, and met a lot of the needs of the day (back when on-prem was still largely the preferred choice for data warehousing solutions). One could say IBM really hit the ball out of the park, and Netezza has definitely enjoyed a good, solid run since then, but that was fifteen years ago, and times have… Continue reading

ETL vs ELT: Data Warehouses Evolved

For years now, the process of migrating data into a data warehouse, whether it be an ongoing, repeated analytics pipeline, a one-time move into a new platform, or both, has consisted of a series of three steps, namely: Continue reading

Snowflake vs Redshift

We have been building data systems for years, and this is the most excited we’ve been in all that time, given the new capabilities in the cloud with Redshift, Google BigQuery, and Snowflake. Today we wanted to share with you some results based on our estimates for a relatively small 2 TB cloud data warehouse on Snowflake and on Redshift for a client. Then we also wanted to go through all the differences we see. Continue reading

Snowflake vs Teradata

To anyone with even a passing familiarity with this space, Teradata is quite rightly known as a powerhouse in the data warehousing and analytics arena. It’s been the go-to technology for sectors ranging from various three-letter intelligence agencies to the most recognizable medicine, science, auto, and telecom industry players combined.

We have been building data systems for years, including those incorporating Teradata (such as a data warehousing project we executed for TicketMaster back in 2007), so this really is the most excited we’ve been in all that time, given all the new capabilities available now compared to then. Today we wanted to share with you some findings based on several industry-leading reports concerning cloud data warehousing and analytics in 2017/18, especially as it concerns Snowflake compared to an industry giant such as Teradata. Continue reading

Query Caching in Snowflake

Have you ever experienced slow query response times while waiting for a report that’s being viewed by multiple users and/or teams within your organization simultaneously?

This is a common issue in today’s data-driven world; it’s called concurrency, and it’s frustrating, usually delaying productivity just when the data being requested is needed most. Well, here’s an incredible time saver you may not have heard about yet: Continue reading

The Power of Instantaneous Data Sharing – Updated

How awesome would it be to share data more quickly instead of exporting it to some format like Excel and then emailing it out? I’m always looking for new ways to make sharing data faster and easier. When I think back over the past 20-30 years in tech, I think of all sorts of data sharing tools and the evolution of data. Do you remember VSAM files? Lotus Notes? SharePoint? Dropbox?

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!

Frank Bell
July 27, 2018

(Continued… from https://www.linkedin.com/pulse/power-instantaneous-data-sharing-frank-bell/)

Over the past 20-30 years there have been tons and tons of investments made in BOTH people and technology in order to share data more effectively and quickly. We currently have millions of data analysts, data scientists, data engineers, data this and data that all over the world. Data is growing and growing, and it is a huge part of our economy and our growth as a society. At the same time, though, the tools to share it never really matured that much until recently, with Snowflake’s Data Sharing functionality.

Before I explain how transformative this new “data sharing” or “logical data access” functionality is, let’s take a step back and look at how “data sharing” worked before it.

Brief Tech History of Data Sharing. Here are some of the old and semi-new tools:

Good old-fashioned physical media (floppy disks, 3.5-inch disks, hard drives, USB drives, etc.).

Email. Probably still the best for smaller amounts of data and files. I’ve done it too: I need some super-fast way to move an Excel file with data from one computer to another. Email to the rescue.

SFTP/FTP. Secure File Transfer Protocol. File Transfer Protocol.

EDI (yuck) – Electronic Data Interchange – The business side of me gets hives just thinking about how expensive and crappy a business solution this is. Companies spent millions creating EDI exchanges. It is a cumbersome and expensive process, but at the time it was the accepted way to exchange data.

SCP. Secure Copy Protocol. Great command line tool for technical users.

APIs (Application Programming Interfaces). While APIs are amazing and have come a long way, there is still technical friction in sharing data through them.

Dropbox, etc. Dropbox mainly revolutionized the ease of sharing files. It’s still not really great for true data sharing.

AirDrop-type functionality.

Let’s face it, though: most of these are primitive and involve a lot of friction, especially for non-technical users. Even the slick ones like AirDrop typically don’t work across platforms and are often limited in data size and to discrete files. When you think about the friction involved in getting quality “data” and “information” for analysis and use from one place to another, all of the solutions above are still relatively painful.

Enter Snowflake’s data sharing. Snowflake has created a concept of “data sharing” through a “data share”, which makes sharing larger structured and unstructured data a lot easier. One of the biggest improvements is that there is only ONE SOURCE OF DATA. Let me say it again: yes, that’s ONE SOURCE OF DATA. This isn’t your typical copying of data, which creates all sorts of problems with data integrity and data governance. It’s the same consistent data shared throughout your departments and organizations, or with customers.

The main point here is that there is true power in effective and fast data sharing. If you can make decisions faster than your competitors, or help out your constituents with faster service, then it makes your organization much better overall.


Also, it’s just easy to do. With a very simple command, you can share data with any other Snowflake account. The only real catch is that you do need a Snowflake account, but that account is only charged for what you use. For example, if you have a personal account that you don’t use very often, you are not charged anything per month except $40/TB of storage; if you don’t store anything, you are not charged for that either, and then the only charge would be compute (queries against someone else’s data share), which would be pretty inexpensive. For organizations with big data, this cost is very reasonable compared to all the legacy solutions required in the past, which are slower, more cumbersome, and more expensive.
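
Here is a minimal sketch of what those commands look like in SQL. The database, schema, table, share, and account names are all hypothetical: the provider creates and grants a share, and the consumer mounts it as a read-only database.

-- Provider side: create a share and grant read access to one table
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = xy12345;

-- Consumer side (account xy12345): mount the share and query it in place
CREATE DATABASE sales_from_partner FROM SHARE provider_account.sales_share;
SELECT COUNT(*) FROM sales_from_partner.public.orders;

No data is copied anywhere in this flow; the consumer reads the provider’s one source of data directly.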

What challenges does this solve today?

Cross-Enterprise Sharing. Let’s say you need to compare how different brands are performing across websites, or you need to compare financials. You can now easily share this data with integrity across the enterprise and roll up and integrate different business units’ data as necessary.

Partner/Extranet-Type Data Sharing. You can share data with your partners with much more speed and integrity, and with much less complexity than APIs require.

Data Provider Sharing. Data providers that need to share data can reduce costs and friction by more easily sharing their data at the row level with different customers (see the sketch just below this list).
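
A common pattern for that row-level case is a secure view filtered on the consumer’s account. A minimal sketch, assuming a hypothetical mapping table that ties each customer’s rows to their Snowflake account (all object names here are illustrative):

-- Each consumer account sees only its own rows through the secure view
CREATE SECURE VIEW provider_db.public.customer_data AS
SELECT d.*
FROM provider_db.private.all_customer_data d
JOIN provider_db.private.account_map m
  ON d.customer_id = m.customer_id
WHERE m.snowflake_account = CURRENT_ACCOUNT();

-- Share the secure view, not the underlying tables
GRANT SELECT ON VIEW provider_db.public.customer_data TO SHARE provider_share;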

As things get more and more complex (I mean, is there really any corporation saving less data this year than last?), we need to challenge ourselves to make things simpler. That is what Snowflake has done. I encourage you to take a look for yourself and try it out for free. We will be sending out some data sharing examples in the next few weeks as well, so stay tuned.

Also, if you don’t believe me, then look at all the reference case studies that have come out in the last few months. Data sharing has the power to transform companies, partners, and industries. Make sure you at least investigate it so you are not left behind.

Here is a Data Sharing for Dummies video for more information on the technology.


Reference Case Studies:

Playfab

Localytics

SpringServe

Strava