Snowflake Create Warehouse Defaults

Overview:


I have been working with the Snowflake Data Cloud since it was just an Analytical RDBMS. Since the beginning of 2018, Snowflake has been pretty fun to work with as a data professional and data entrepreneur. It gives data professionals amazingly flexible data processing power in the cloud. The key to a successful Snowflake deployment is setting up security and account optimizations correctly from the beginning. In this article, we will discuss the ‘CREATE WAREHOUSE’ default settings.

 

Snowflake Cost and Workload Optimization is the Key


After analyzing hundreds of Snowflake customer accounts, we found key processes to optimize Snowflake for computing and storage costs. The best way to successfully deploy Snowflake is to ensure you set it up for cost and workload optimization.

The Snowflake default “create warehouse” settings are not optimized to limit costs. That is why we built our Snoptimizer service (our Snowflake Cost Optimization Service) to automatically and easily optimize your Snowflake account(s). There is no easier way to continuously optimize queries and costs so that your Snowflake Cloud Data solution runs as efficiently as possible.

Let’s quickly review how Snowflake Accounts’ default settings are currently set.

Here is the default screen that comes up when I click +Warehouse in the Classic Console.


https://snowflakesolutions.net/wp-content/uploads/Snowflake-Create-Warehouse-Default-Options-Classic-Console-1024x640.jpg

Create Warehouse Default Options for the Classic Console

Okay, for those already in Snowsight (aka the Preview App), here is the default screen – it is nearly identical.


https://snowflakesolutions.net/wp-content/uploads/Snowflake-Create-Warehouse-Default-Options-Snowsight-1-1024x640.jpg

Create Warehouse Default Options for Snowsight

So let’s dig into the default settings these Web UIs apply if you just choose a name and click “Create Warehouse”, and evaluate what happens to our Snowflake Compute if we leave them in place.

These default settings will establish the initial configuration for our Snowflake Compute. By understanding the defaults, we can determine if any changes are needed to optimize performance, security, cost, or other factors that are important for our specific use case. The defaults are designed to work out of the box for most general-purpose workloads but rarely meet every need.


Create Warehouse – Default Setting #1

Size (really the Warehouse or Compute size): X-Large. I assume you understand how Snowflake Compute works and know the Snowflake Warehouse T-shirt sizes. Notice that the default is an X-Large Warehouse rather than one of the smaller T-shirt sizes (XS, S, M, or L). This default is the same in both the Classic Console and Snowsight (the Preview App).
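If you create warehouses in SQL rather than the Web UI, you can sidestep the X-Large default entirely by naming a size explicitly. A minimal sketch (the warehouse name here is just an example):

-- Explicitly request the smallest T-shirt size instead of the X-Large default
CREATE WAREHOUSE REPORTING_XS_WH WITH
  WAREHOUSE_SIZE = 'XSMALL';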


Create Warehouse – Default Setting #2

Maximum Clusters: 2

While enabling multi-cluster scaling by default makes sense if that is what you want, it has significant cost implications. It assumes the data cloud customer wants Snowflake to launch a second cluster of this warehouse – and pay more for it – whenever statements queue past a certain level. Sticking with the X-Large default, duplicating the cluster has serious cost consequences, since each running cluster consumes credits at the full X-Large rate.

This setting only applies to the Classic Console. It also only takes effect on Enterprise Edition or higher, since Standard Edition does not offer multi-cluster warehouses.
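If you would rather not have a second cluster spin up (and bill you) automatically, you can pin the warehouse to a single cluster. A sketch, assuming a warehouse named LOAD_WH already exists (multi-cluster parameters require Enterprise Edition or higher):

-- Cap the warehouse at one cluster so queued statements never trigger a second one
ALTER WAREHOUSE LOAD_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 1;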


Create Warehouse – Default Setting #3


Minimum Clusters: 1

This is only the default setting for the Classic Console.


Create Warehouse – Default Setting #4

Scaling Policy: Standard. This setting is hard to rate, but the truth is that a cost-conscious customer would want to change this default from “Standard” to “Economy”. With “Standard”, the second cluster (enabled by default) kicks in as soon as queuing happens on your Snowflake warehouse; with “Economy”, Snowflake will not launch a second cluster until it estimates that cluster would have at least 6 minutes of work to perform.

This default only appears in the Classic Console, but when you toggle the “Multi-cluster Warehouse” setting on in Snowsight, it likewise defaults to “Standard” rather than “Economy”.
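For cost-conscious accounts, the change described above is a one-line statement. A sketch, assuming an existing multi-cluster warehouse named LOAD_WH:

-- Wait for roughly 6 minutes of estimated work before launching another cluster
ALTER WAREHOUSE LOAD_WH SET SCALING_POLICY = 'ECONOMY';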


Create Warehouse – Default Setting #5

Auto Suspend: 10 minutes. For many warehouses, especially ELT/ETL warehouses, this default setting is typically too high. A loading warehouse that runs on a regular schedule rarely benefits from staying resumed (and keeping its warehouse cache warm) for a full 10 minutes after each run. Our Snoptimizer service finds inefficient and potentially costly settings like this.

For a loading warehouse, Snoptimizer immediately saves up to 599 seconds of computing time for every interval. As discussed in the Snowflake Warehouse Best Practice Auto Suspend article, this can significantly reduce costs, especially for larger load warehouses.

We talk more about optimizing warehouse settings in this article, but reducing this one setting can substantially lower expenses with no impact on performance.

NOTE: This defaults to the same setting for both the Classic Console and Snowsight (the Preview App).
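For a scheduled loading warehouse, the fix is to drop the suspend timeout from the 600-second default toward the minimum. A sketch (LOAD_WH and the 60-second value are illustrative, not a universal recommendation):

-- Suspend after 60 seconds of inactivity instead of the default 600
ALTER WAREHOUSE LOAD_WH SET AUTO_SUSPEND = 60;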


Snowflake Create Warehouse – Default Setting #6

Auto Resume Checkbox: Checked by default. This setting is fine as is. I do not recall the last time I created a warehouse without “Auto Resume” checked. Snowflake’s ability to resume a warehouse within milliseconds or seconds of a query being executed brings compute online exactly when users need it. This is revolutionary and useful!

NOTE: This defaults to the same for both the Classic Console and Snowsight (the Preview App).


Snowflake Create Warehouse – Default Setting #7

Click “Create Warehouse”: The Snowflake warehouse is immediately started. I do not prefer this behavior. I do not think the warehouse should immediately go into the Running state and start consuming credits. It is too easy for a new SYSADMIN to start a warehouse they do not need. The previous setting already defaults to “Auto Resume”, so the warehouse will resume on its own when a job is sent to it; there is no need to start it automatically.

NOTE: This defaults to the same execution for both the Classic Console and Snowsight (the Preview App).
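If, like me, you would rather a new warehouse not start consuming credits the moment it is created, SQL offers the INITIALLY_SUSPENDED parameter, which the Create Warehouse screens above do not surface. A sketch with an example warehouse name:

-- Create the warehouse suspended; AUTO_RESUME (TRUE by default) wakes it on the first query
CREATE WAREHOUSE ADHOC_WH WITH
  WAREHOUSE_SIZE = 'XSMALL'
  INITIALLY_SUSPENDED = TRUE;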


One last thing…

As an extra bonus, here is the equivalent SQL for those of you who just do not do “GUI”.

Let’s go to the Snowflake CREATE WAREHOUSE code to see what is happening…

DEFAULT SETTINGS:

CREATE WAREHOUSE XLARGE_BY_DEFAULT WITH
  WAREHOUSE_SIZE = 'XLARGE'
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 2
  SCALING_POLICY = 'STANDARD'
  COMMENT = 'This sucker will consume a lot of credits fast';
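For contrast, here is what a cost-leaning version of the same statement might look like with the settings discussed above flipped the other way – a sketch, not a one-size-fits-all recommendation:

CREATE WAREHOUSE XSMALL_BY_CHOICE WITH
  WAREHOUSE_SIZE = 'XSMALL'    -- smallest T-shirt size instead of X-Large
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 60            -- suspend after 1 minute idle, not 10
  AUTO_RESUME = TRUE           -- keep this; it is the one good default
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 1        -- no surprise second cluster
  INITIALLY_SUSPENDED = TRUE   -- do not start burning credits at creation
  COMMENT = 'Cost-conscious starting point';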


Conclusion:

Snowflake’s default warehouse settings are not optimized for cost and workload. They establish an X-Large warehouse, allow up to 2 clusters (which increases costs), use a “Standard” scaling policy and a 10-minute auto-suspend, and immediately start the warehouse upon creation. These defaults work for general use but rarely meet specific needs. Optimizing them can significantly reduce costs with no impact on performance.

A New Era of Cloud Analytics with Snowflake as the Hadoop Era Ends

Hadoop was regarded as a revolutionary technology that would change data management and completely replace data warehousing. That prediction proved only partly accurate, and it has not held up since cloud solutions came into the picture. Hadoop mostly flourished in projects involving substantial data infrastructure, meaning it was more relevant around a decade ago, when most data analysts …

Snowflake vs Netezza

Fifteen years ago, IBM introduced an appliance-based, on-prem analytics solution known as Netezza. It was purpose-built, load-ready, and met a lot of the needs of the day (back when on-prem was still largely the preferred choice for data warehousing solutions). One could say IBM really hit the ball out of the park, and Netezza has definitely enjoyed a good, solid run since then, but that was fifteen years ago, and times have …

ETL vs ELT: Data Warehouses Evolved

For years now, the process of migrating data into a data warehouse – whether as an ongoing, repeated analytics pipeline, a one-time move onto a new platform, or both – has consisted of a series of three steps, namely: …

Snowflake vs Redshift

Intro – Snowflake vs. Redshift


When it comes to cloud data warehousing solutions, Snowflake and Redshift are two of the most popular options. Both solutions offer fast and efficient data processing, but there are some key differences to consider that can help determine which one is the better fit for your organization.

In this article, we’ll cover each one in depth so you can decide which option is best for you and your organization.


Snowflake:


Snowflake is known for its ease of use and scalability, making it an attractive option for organizations that need a solution that is simple to set up and manage. Its cloud-native architecture allows it to automatically scale up or down as needed, which helps ensure that performance remains consistent, regardless of data size. Additionally, Snowflake’s user-friendly interface and support for ANSI SQL make it easy for teams to get started with the platform quickly.


Snowflake Benefits:


There are several benefits to using Snowflake Data Cloud, but the three most important ones are:

1. Ease of use and scalability

2. Cloud-native architecture that automatically scales up or down as needed

3. User-friendly interface and support for ANSI SQL

These features make Snowflake an attractive option for organizations that need a solution that is simple to set up and manage, with consistent performance regardless of data size.


Redshift:


On the other hand, Redshift is a popular choice for organizations that are already using Amazon Web Services (AWS). Redshift integrates seamlessly with other AWS services, which can make it easier to manage your entire AWS infrastructure from a single console. Additionally, Redshift’s pricing model can be more cost-effective than Snowflake’s, depending on the size of your data warehousing needs.


Redshift Benefits:


There are several benefits to using Amazon Redshift, but the three most important ones are:

1. Seamless integration with other AWS services.

2. Potentially more cost-effective pricing, depending on data warehousing needs.

3. High performance for complex queries and large datasets.


Conclusion:


It’s important to carefully evaluate the needs and priorities of your organization when choosing between Snowflake and Redshift. Consider factors such as ease of use, scalability, compatibility with other services, pricing, and any specific requirements your organization may have. By testing both solutions and comparing their features, you can choose the data warehousing solution that best meets your needs.

Ultimately, the choice between Snowflake and Redshift will depend on the specific needs of your organization, but with careful consideration and evaluation of each solution, you can make an informed decision that will benefit your organization in the long run.

Snowflake vs Teradata

To anyone with even a passing familiarity with this space, Teradata is quite rightly known as a powerhouse in the data warehousing and analytics arena. It has been the go-to technology for sectors ranging from various three-letter intelligence agencies to the most recognizable players in medicine, science, auto, and telecom combined.

We have been building data systems for years, including those incorporating Teradata (such as a Data Warehousing project we executed for TicketMaster back in 2007), so this really is the most excited we’ve been in all that time, given all the new capabilities available now compared to then. Today we wanted to share with you some findings based on several industry-leading reports concerning cloud data warehousing and analytics in 2017/18, especially as they concern Snowflake compared to an industry giant such as Teradata. …

Query Caching in Snowflake

Have you ever experienced slow query response times while waiting for a report that’s being viewed by multiple users and/or teams within your organization simultaneously?

This is a common issue in today’s data-driven world; it’s called concurrency and it’s frustrating, usually delaying productivity just when the requested data is needed the most. Well, here’s an incredible time saver you may not have heard about yet: …

The Power of Instantaneous Data Sharing – Updated

How awesome would it be to share data more quickly instead of exporting it to some format like Excel and then emailing it out? I’m always looking for new ways to make sharing data faster and easier. When I think back over the past 20-30 years in tech, I think of all sorts of data sharing tools and the evolution of data. Do you remember VSAM files? Lotus Notes? SharePoint? Dropbox?

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!

Frank Bell
July 27, 2018

(Continued… from https://www.linkedin.com/pulse/power-instantaneous-data-sharing-frank-bell/)

Over the past 20-30 years, there have been tons and tons of investments made in BOTH people and technology in order to share data more effectively and quickly. We currently have millions of data analysts, data scientists, data engineers, data this and data that all over the world. Data keeps growing and is a huge part of our economy and our growth as a society. At the same time, though, the tools to share it never really matured that much until recently, with Snowflake’s Data Sharing functionality.

Before I explain how transformative this new “data sharing” or “logical data access” functionality is, let’s take a step back and look at how “data sharing” worked before this.

Brief Tech History of Data Sharing. Here are some of the old and semi-new tools:

Good old-fashioned physical media (floppy disks, 3.5-inch disks, hard drives, USB drives, etc.).

Email. Probably still the best for smaller amounts of data and files. I’ve done it too: I need a super-fast way to move an Excel file from one computer to another. Email to the rescue.

SFTP/FTP. Secure File Transfer Protocol. File Transfer Protocol.

EDI (yuck) – Electronic Data Interchange. The business side of me gets hives just thinking about how expensive and crappy a business solution this is. Companies spent millions creating EDI exchanges. It is a cumbersome and expensive process, but at the time it was the accepted way to exchange data.

SCP. Secure Copy Protocol. Great command line tool for technical users.

APIs (Application Programming Interfaces). While APIs have been amazing and have come a long way, there is still technical friction in sharing data through them.

Dropbox, etc. Dropbox mainly revolutionized the ease of sharing files. It’s still not really great for true data sharing.

AirDrop-type functionality.

Let’s face it though, most of these are primitive and have a lot of friction, especially for non-technical users. Even when they are slick, like AirDrop, they typically don’t work across platforms and are often limited in data size and to discrete files. All of the solutions above have serious limitations; when you think of the friction involved in getting quality “data” and “information” from one place to another for analysis and use, it is still relatively painful.

Enter Snowflake’s data sharing. Snowflake has created a concept of “data sharing” through a “data share”, which makes sharing larger structured and unstructured data a lot easier. One of the biggest improvements is that there is only ONE SOURCE OF DATA. Let me say it again: yes, that’s ONE SOURCE OF DATA. This isn’t your typical copying of data, which creates all sorts of problems with data integrity and data governance. It’s the same consistent data shared across your departments and organizations, or with customers.

The main point here is that there is true power in effective and fast data sharing. If you can make decisions faster than your competitors, or help your constituents with faster service, then your organization is much better off overall.


Also, it’s just easy to do. With a very simple command you can share data with any other Snowflake account. The only real catch is that the consumer does need a Snowflake account, but that account is only charged for what it uses. For example, if you have a personal account that you rarely use, you are not charged anything per month except $40/TB of storage; if you don’t store anything, you are not charged for that either, and the only charge would be compute (queries against someone else’s data share), which would be pretty inexpensive. For organizations with Big Data, this cost is very reasonable compared to the legacy solutions of the past, which were slower, more cumbersome, and more expensive.
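For the SQL-inclined, the “very simple command” really is about this simple. A minimal sketch with hypothetical database, schema, table, share, and account names:

-- Provider side: create a share and grant read access to one table
CREATE SHARE SALES_SHARE;
GRANT USAGE ON DATABASE SALES_DB TO SHARE SALES_SHARE;
GRANT USAGE ON SCHEMA SALES_DB.PUBLIC TO SHARE SALES_SHARE;
GRANT SELECT ON TABLE SALES_DB.PUBLIC.ORDERS TO SHARE SALES_SHARE;
ALTER SHARE SALES_SHARE ADD ACCOUNTS = XY12345;  -- consumer account locator

-- Consumer side: mount the share as a read-only database
CREATE DATABASE SALES_FROM_PARTNER FROM SHARE PROVIDER_ACCT.SALES_SHARE;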

What challenges does this solve today?

Cross-Enterprise Sharing. Let’s say you need to compare how different brands are performing across websites, or you need to compare financials. You can now easily share this data with integrity across the enterprise and roll up and integrate different business units’ data as necessary.

Partner/Extranet-Type Data Sharing. You can share data with partners with much more speed and integrity, and with much less complexity, than APIs require.

Data Provider Sharing. Data providers that need to share data can reduce costs and friction by more easily sharing their data, down to the row level, with different customers.

As things get more and more complex (I mean, is there really any corporation storing less data this year than last?), we need to challenge ourselves to make things simpler. That is what Snowflake has done. I encourage you to take a look for yourself and try it out for free. We will be sending out some Data Sharing examples in the next few weeks, so stay tuned.

Also, if you don’t believe me, then look at all the reference case studies that have come out in the last few months. Data Sharing has the power to transform companies, partners, and industries. At least investigate it so you are not left behind.

Here is a Data Sharing for Dummies Video for more information on the technology.


Reference Case Studies:

Playfab

Localytics

SpringServe

Strava