Snowflake Snowday – Data to Value Superhero Summary


Snowflake Snowday is Snowflake’s semi-annual product announcement event. This year it was held on 2022-11-07, the same day as the end of the Snowflake Data Cloud World Tour (DCWT), which wrapped up with a live event in San Francisco.

I was able to attend 5 of the DCWT events around the world this year. It was very interesting to see how much the tour has grown compared to the one back in 2019.  There are a ton of improvements and new features happening within the Snowflake Data Cloud, and it is hard to keep up!  Many of these announcements really do improve Data to Value for businesses.

Let’s get to the Snowday summary and the plethora of Snowflake feature announcements.  The key Data to Value improvements that I’m most excited about are:

  • Snowpark for Python in GA
  • Private Data Listings – Massive improvement in the speed of data collaboration.
  • Snowflake Kafka Connector and Dynamic Tables.  Snowpipe Streaming.
  • Streamlit integration.

*All of these features add significant Data to Value improvements for organizations.

Snowflake Snowday Summary

*TOP announcement – whoop whoop – SNOWPARK FOR PYTHON! (General Availability – GA)
I think this was the announcement all the Python data people were looking forward to (including me). Snowpark for Python now enables each and every Snowflake customer to build and deploy Python-based applications, pipelines, and machine learning models directly in Snowflake.  In addition to Snowpark for Python being Generally Available on all Snowflake editions, these other Python-related announcements were made:
  • Snowpark Python UDFs for unstructured data (PRIVATE PREVIEW)
  • Python Worksheets – The improved Snowsight worksheet now has support for Python, so you do not need an additional development environment. This does make it easier to get started with Snowpark for Python development. (PRIVATE PREVIEW) A short Snowpark sketch follows this list.
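
To make this concrete, here is a minimal Snowpark for Python sketch. Treat it as illustrative only: the connection values, the ORDERS table, and the add_one UDF are placeholders I am assuming for the example, not anything from the announcement itself.

    # Minimal Snowpark for Python sketch; connection values and table names are placeholders.
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col, udf

    conn = {
        "account": "<account_identifier>",
        "user": "<user>",
        "password": "<password>",
        "warehouse": "<warehouse>",
        "database": "<database>",
        "schema": "<schema>",
    }
    session = Session.builder.configs(conn).create()

    # DataFrame operations are translated to SQL and executed inside Snowflake.
    orders = session.table("ORDERS")  # hypothetical table
    orders.filter(col("AMOUNT") > 100).group_by("REGION").count().show()

    # Register a simple Python UDF that runs inside Snowflake, then call it.
    @udf(name="add_one", replace=True)
    def add_one(x: int) -> int:
        return x + 1

    orders.select(add_one(col("QUANTITY"))).show()

Per the GA announcement, this kind of code now works across all Snowflake editions.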

ONE PRODUCT.  ONE PLATFORM.

This is Snowflake’s major push to make it easier and easier for customers to use Snowflake’s platform for all or most of their Data Cloud needs.  This is why they have now taken on Hybrid Tables – Unistore (OLTP workloads) as well as Snowpark.  They are growing the core Snowflake platform to handle AI/ML workloads as well as Online Transaction Processing (OLTP) workloads.  This massively increases Snowflake’s Total Addressable Market (TAM).

***This is also the main reason they purchased Streamlit earlier this year.  They are moving to integrate Streamlit as the data application frontend on top of the Snowflake backend and to take on data application use cases.  So Snowflake is investing a ton to go from primarily a data store to a data platform where you can create frontend and backend data applications (as well as web/data applications that need OLTP millisecond inserts or AI/ML workloads).

Also, Snowflake just keeps improving the core Snowgrid Platform as follows:

Cross-Cloud Snowgrid

Replication Improvements and Snowgrid Updates:

These are overall amazing Cross-Cloud Snowgrid improvements and features around the platform, performance, and replication.  If you are new to Snowflake, we answer What is Snowgrid here.

  • Cross-Cloud Business Continuity – Streams & Tasks Replication (PUBLIC PREVIEW) – This is very cool as well.  I need to test it, but in theory this will provide seamless pipeline failover, which is really awesome.  This takes replication beyond just accounts, databases, policies, and metadata.
  • Cross-Cloud Business Continuity – Replication GUI (PRIVATE PREVIEW).  Now you will be able to more easily manage replication and failover from a single user interface for global replication.  It looks very cool.  You can easily set up, manage, and fail over an account.
  • Cross-Cloud Collaboration – Listing Discovery Controls (PUBLIC PREVIEW).
  • Cross-Cloud Collaboration – Cross-Cloud Auto-Fulfillment (PUBLIC PREVIEW).
  • Cross-Cloud Collaboration – Provider Analytics (PUBLIC PREVIEW)
  • Cross-Cloud Governance – Tag-Based Masking (GA)
  • Cross-Cloud Governance – Masking and Row-Access Policies in Search Optimization (PRIVATE PREVIEW).
  • Replication Groups – Looking forward to the latest on this as well.  These can be used for sharing and simple database replication in all editions. (See the sketch after this list.)

***All the above is available on all editions EXCEPT: 

  • YOU NEED ENTERPRISE OR HIGHER for Failover/Failback (including Failover Groups)
  • YOU NEED BUSINESS CRITICAL OR HIGHER for Client Redirect functionality
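
Since replication groups are regular SQL objects, here is a hedged sketch (using Snowpark’s session.sql) of what defining one might look like; all of the names are placeholders, and the exact clause list should be verified against the current Snowflake documentation.

    # Hedged sketch: a replication group that replicates a database and a share
    # to a second account in the same organization. All names are placeholders.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>", "role": "ACCOUNTADMIN"}
    session = Session.builder.configs(conn).create()

    session.sql("""
        CREATE REPLICATION GROUP my_rg
          OBJECT_TYPES = DATABASES, SHARES
          ALLOWED_DATABASES = sales_db
          ALLOWED_SHARES = sales_share
          ALLOWED_ACCOUNTS = myorg.secondary_account
          REPLICATION_SCHEDULE = '10 MINUTE'
    """).collect()

    # Failover groups (Enterprise or higher) follow almost the same shape,
    # using CREATE FAILOVER GROUP instead.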

Performance Improvements on Snowflake Updates:

New performance improvements and performance transparency features were announced, related to the following (a short sketch of enabling two of them follows the list):

  • Query Acceleration (public preview).
  • Search Optimization Enhancements (public preview).
  • Join eliminations (GA).
  • Top results queries (GA).
  • Cost Optimizations: Account usage details (private preview).
  • History views (in development).
  • Programmatic query metrics (public preview).

***Available on all editions EXCEPT:  YOU NEED ENTERPRISE OR HIGHER for both Search Optimization and Query Acceleration
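
As a quick illustration, here is a hedged sketch of turning on two of the features above for existing objects; the warehouse and table names are placeholders, and as noted both features require Enterprise edition or higher.

    # Hedged sketch of enabling Query Acceleration and Search Optimization.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>"}
    session = Session.builder.configs(conn).create()

    # Query Acceleration: offload eligible portions of large scans to serverless compute.
    session.sql("""
        ALTER WAREHOUSE analytics_wh SET
          ENABLE_QUERY_ACCELERATION = TRUE
          QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8
    """).collect()

    # Search Optimization: speed up highly selective point lookups on a large table.
    session.sql("ALTER TABLE events ADD SEARCH OPTIMIZATION").collect()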

Data Listings and Cross-Cloud Updates

I’m super excited about this announcement around Private Listings.  Many of you know that one of my favorite features of Snowflake is Data Sharing, which I have been writing about for over 4 years.  [My latest take is the Future of Data Collaboration.]  This is such a huge game-changer for data professionals.  The announcement is that customers can now more easily use listings for PRIVATE DATA SHARING scenarios.  It also makes fulfillment much easier across different regions (even 1-2 years ago we had to write replication commands).  I’ll write up more details about how this makes Data Sharing and Collaboration even easier.  I was delighted to see the presenters using the Data to Value concepts when presenting this.

I loved the way Snowflake used some of my Data to Value concepts around this announcement, including the benefit that “Time to value is significantly reduced for the consuming party.”  Even better, this functionality is available now for ALL SNOWFLAKE EDITIONS. (A sketch of the underlying share commands follows below.)

Private Listings
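
For context, private listings build on the same secure data sharing machinery Snowflake has had for years. Here is a hedged sketch of that underlying share workflow; the object and account names are placeholders, and the listing itself is created through the provider tooling rather than these commands.

    # Hedged sketch of the classic secure data sharing workflow that listings build on.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>", "role": "ACCOUNTADMIN"}
    session = Session.builder.configs(conn).create()

    session.sql("CREATE SHARE sales_share").collect()
    session.sql("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share").collect()
    session.sql("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share").collect()
    session.sql("GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share").collect()

    # The consumer account gets live, no-copy access to the shared objects.
    session.sql("ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account").collect()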

More and More Announcements on Snowday.

Snowflake has tons AND tons of improvements happening.  Other significant announcements on Snowday were:

Snowflake Data Governance IMPROVEMENTS

All of these features allow you to better protect and govern your data natively within Snowflake.
  • Tag-Based Masking (GA) – This allows you to automatically assign a designated masking policy to sensitive columns using tags. Pretty nice. (See the sketch after this list.)
  • Search Optimization will now have support for Tables with Masking and Row Access Policies (PRIVATE PREVIEW)
  • FedRAMP High for AWS Government (authorization in process)
***Available ONLY on ENTERPRISE EDITION OR HIGHER
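
Here is a hedged sketch of the tag-based masking flow; the policy, tag, table, and role names are all placeholders for illustration.

    # Hedged sketch of tag-based masking; all object names are placeholders.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>"}
    session = Session.builder.configs(conn).create()

    # A masking policy that only reveals values to a privileged role.
    session.sql("""
        CREATE OR REPLACE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
          CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END
    """).collect()

    # Attach the policy to a tag, then tag a sensitive column; any column carrying
    # the tag inherits the masking policy automatically.
    session.sql("CREATE TAG IF NOT EXISTS pii").collect()
    session.sql("ALTER TAG pii SET MASKING POLICY mask_pii").collect()
    session.sql("ALTER TABLE customers MODIFY COLUMN email SET TAG pii = 'email'").collect()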

Building ON Snowflake

New announcements related to:
  • Streamlit integration (PRIVATE PREVIEW in January 2023 – supposedly already oversubscribed?) – This is exciting to see. I cannot wait until the Private Preview.
  • Snowpark-optimized Warehouses (PUBLIC PREVIEW). This was a great move on Snowflake’s part to support what AI/ML Snowpark customers really needed. Great to see it get rolled out. This gives customers access to HIGHER MEMORY warehouses that can better handle ML/AI training at scale. Snowpark code can be executed on both warehouse types. (See the sketch below.)

***Available for all Snowflake Editions
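
To show where this is heading, here is a minimal sketch of a Streamlit data app on top of Snowpark, plus creating a Snowpark-optimized warehouse. The in-Snowflake Streamlit integration was still in private preview when this was written, so the sketch uses open-source Streamlit, and the secrets section, table, and warehouse names are assumptions for the example.

    # streamlit_app.py - minimal sketch of a Streamlit data app backed by Snowpark.
    import streamlit as st
    from snowflake.snowpark import Session

    # Placeholder connection details stored in Streamlit's secrets file.
    session = Session.builder.configs(dict(st.secrets["snowflake"])).create()

    st.title("Daily Orders")
    daily = session.table("ORDERS").group_by("ORDER_DATE").count().to_pandas()
    st.dataframe(daily)  # render the aggregated result as an interactive table

    # A Snowpark-optimized warehouse is just a warehouse created with a different type.
    session.sql(
        "CREATE WAREHOUSE IF NOT EXISTS snowpark_wh "
        "WAREHOUSE_SIZE = 'MEDIUM' WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'"
    ).collect()

You would run this locally with streamlit run streamlit_app.py.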

Finally – Streaming and Dynamic Tables ANNOUNCEMENTS:

  • Snowpipe Streaming (PUBLIC PREVIEW SOON)
  • Snowflake Kafka Connector (PUBLIC PREVIEW SOON)
  • Snowflake Dynamic Tables – formerly Materialized Tables (PRIVATE PREVIEW) – Check out my fellow data superhero – Dan Galvin’s coverage here:  https://medium.com/snowflake/%EF%B8%8F-snowflake-in-a-nutshell-the-snowpipe-streaming-api-dynamic-tables-ae33567b42e8
***Available for all Snowflake Editions
Overall, I’m pretty excited about where this is going.  These enhancements improve streaming data integration so much, especially with Kafka.  Now, as a Snowflake customer, you can ingest real-time data streams and transform data with low latency.  When fully implemented, this will enable more cost-effective and better-performing solutions around data lakes. (See the sketch below.)
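
Here is a hedged sketch of what a dynamic table definition looks like; the feature was still in preview at the time, so the exact syntax may shift, and the table, warehouse, and lag values are placeholders.

    # Hedged sketch of a dynamic table that continuously maintains an aggregate.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>"}
    session = Session.builder.configs(conn).create()

    session.sql("""
        CREATE OR REPLACE DYNAMIC TABLE orders_by_region
          TARGET_LAG = '1 minute'
          WAREHOUSE = transform_wh
          AS
            SELECT region, COUNT(*) AS order_count
            FROM raw_orders
            GROUP BY region
    """).collect()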

If you didn’t get enough Snowday and want to watch the recording then here is the link below:
https://www.snowflake.com/snowday/agenda/

We will be covering more of these updates from Snowday and the Snowflake BUILD event this week in more depth with the Snowflake Solutions Community.  Let us know in the comments if we missed anything or what you are excited about from Snowday!

Data to Value – Part 2

Data to Value Trends.  PART 2.  TRENDS #2-4.  (NEXT WEEK WE WILL RELEASE THE FINAL 3 trends we are highlighting)

Welcome to our Snowflake Solutions Community readers who have read Part 1 of this three-part Data to Value series.  For those of you who have not read Part 1 and want to fast forward: we are making a fundamental point that data professionals and data users of all types need to focus NOT just on the creation, collection, and transformation of data.  We need to make a conscious effort to focus on and measure WHAT the TRUE VALUE is that each set of data creates.  We also need to measure how fast we can get to that value, if it provides any real business advantage.  There is also an argument for discounting the value of time-dependent data, since it often loses value as it gets older.

Here are the trends we are seeing related to improving Data to Value.  These are some of my favorites that are revolutionizing how rapidly, and with how much QUALITY, data moves to value for HUMANS and their ORGANIZATIONS:

Trend #1 – covered last week –>  Data to Value – Non-stop push for faster speed.  

Trend #2 – Data Sharing.  More and more Snowflake customers are realizing the massive advantage of data sharing, which allows them to share “no-copy,” in-place data in near real time.  Data Sharing is a massive competitive advantage if set up and used appropriately.  You can securely provide or receive access to data sets and streams from your entire business or organizational value chain that is also on Snowflake.  This allows access to data sets at reduced cost and risk thanks to micro-partitioned, zero-copy, securely governed data access.

Trend #3 – Creating Data with the End in Mind.  When you think about using data for value and logically think through the creation and consumption life cycle, data professionals and organizations realize there are advantages to capturing data in formats that are ready for immediate processing.  If you design your data creation and capture as logs or other outputs that can be easily and immediately consumed, you get faster Data to Value cycles, creating competitive advantages with certain data streams and sets.

Trend #4 – Automated Data Applications.  I see some really big opportunities with Snowflake’s Native Applications and Streamlit integrated together.  Bottom line, there is a need for consolidated “best-of-breed” data applications that can hit a low price point due to massive volumes of customers.

 Details for these next 3 are coming next week 🙂

Trend #5 – Fully Automated Data Copying Tools.  I have watched the growth of Fivetran and Stitch since 2018, and it has been amazing.  Now I am seeing the growth of Hightouch and Census as well, which is also incredibly amazing.

Trend #6 – Coming next week

Trend #7 – Coming next week

*What data to value trends am I missing?  I put the top ones I see but hit me up in the comments or directly if you have additional trends.

Snowflake’s Announcements related to Data to Value

IF YOU READ MY ARTICLE LAST WEEK: these are currently exactly the same as last week.  I’m waiting to see if any of my readers know of other Snowflake Summit announcements that I missed that are real Data to Value features as well!

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit have Data to Value feature announcements such as:

*Snowflake’s support of Hybrid Tables and announcement of the Unistore concept – the move into some type of OLTP (Online Transaction Processing).  There is huge interest from customers in a concept like this, where a single source of truth becomes possible by having web-based OLTP-type apps operating on Snowflake with Hybrid Tables.
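
For illustration, here is a hedged sketch of what a Unistore-style hybrid table could look like; the feature was still in preview, so treat the syntax as indicative only, and the table and columns are placeholders.

    # Hedged sketch of a hybrid table for OLTP-style point reads and writes.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>"}
    session = Session.builder.configs(conn).create()

    session.sql("""
        CREATE HYBRID TABLE orders (
          order_id    INT PRIMARY KEY,   -- hybrid tables require a primary key
          customer_id INT,
          status      STRING,
          created_at  TIMESTAMP_NTZ
        )
    """).collect()

    # Single-row lookups and updates like this are the OLTP pattern Unistore targets.
    session.sql("UPDATE orders SET status = 'SHIPPED' WHERE order_id = 42").collect()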

*Snowflake’s Native Apps announcements.  If Snowflake can get this right, it’s a game-changer for Data to Value and for decreasing the cost of deploying Data Applications.

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note, these two items above do not only mean that data “can” go to value faster; they also make the development of data apps and combined OLTP/OLAP applications much less costly and more achievable for “all” types of companies.  They could remove the massive friction that comes with needing high-end, full-stack development.  Streamlit really is attempting to remove the front-end and middle-tier complexity from developing data applications.  (Aren’t most applications data applications, though?)  It’s really another low-code data development environment.

*Snowpipe Streaming announcement.  This was super interesting to me since I had worked with Issaic from Snowflake back before the 2019 Summit using the original Kafka to Snowflake connector, and I also did a presentation on it at Snowflake Summit 2019.  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value, with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark announcements in general.  This is really new tech and the verdict is still out, but it is a major attempt by Snowflake to provide ML pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and machine learning process run entirely within Snowflake.

Summary

This article is part of my Frank’s Future of Data series, which I put together to prepare myself for taking advantage of the new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  If you read my initial Data to Value article, then these Snowflake items around Data to Value are the same as in the first article.  Do you have any others that were announced at Snowflake Summit 2022?  I hope you found this second article around Data to Value useful for thinking about your data initiatives.  Again, focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good luck!

Continue onto Data to Value Part 3

or go back to Part 1 – Data to Value

Data to Value

Data to Value – Part 1.  I spend a ton of time reviewing and evaluating all the ideas, concepts, and tools around data, data, and data.  The “data concept” space has been exploding with many different concepts and ideas.  There are so many new data “this” and data “that” tools as well, so I wanted to bring data professionals and business leaders back to the core concept that matters around the creation, collection, and usage of data: Data to Value.

The main concept is that we need to remember that the entire point of collecting and using data is to create business, organizational, or individual value.  All the other technical details and jargon between the creation and collection of the data and the realization of its value are important, but for many users this has become overly complex, especially with many of the “latest concepts.”

For a short moment, let’s let go of all the consulting and technical data terms that are often overused and misused, like Data Warehouse, Data Lake, Data Mesh, Data Observability, Data THIS and Data THAT.  Currently I’m even seeing that data experts and practitioners have different views of the latest concepts depending on where their data education began and the types of technologies they used.

Data to Value is what really matters

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that Snowflake and other “Modern Data” Stack tools/clouds provide.  Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  The intersection of cloud, data, automation, and ai/ml is having massive impacts on our society.

Data to Value Trends

Back in 2018, I had the opportunity to consult with some very advanced and mature data engineering teams.  A few of them were actively moving with Kafka/Confluent towards true “event-driven data processing.”  It was a massive shift from the traditional batch processing used throughout 98% of the implementations I had worked on previously.  The concept of using non-stop streams of data from different parts of the organization, delivered through Kafka topics, I thought was pretty awesome.  At the same time, these were pretty advanced concepts and paradigm shifts at that time for all but very advanced data engineering teams.  Here are the Data to Value trends that I think you need to be aware of:

 

Trend #1 – Non-stop push for faster speed of Data to Value.  Within our non-stop, dominantly capitalist world, faster is better and often provides advantages to organizations, especially around improved value chains and concepts such as supply chains.  Businesses and organizations continuously look for any advantage they can get.  I kind of hate linking to McKinsey for backup, but here goes: their characteristic #2 for the data-driven enterprise of 2025 is “Data is processed and delivered in real time.”

 

Trend #2 – Data Sharing.  Coming next week – Part 2.

Trend #3 – Coming next week – Part 2.

Trend #4 – Coming next week – Part 2.

Trend #5 – Fully Automated Data Copying Tools.  The growth of Fivetran and Stitch (now Talend) has been amazing.  We are now also seeing huge growth in automated data-copy pipelines going the other way, like Hightouch.  At IT Strategists, we became a partner with Stitch, Fivetran, and Matillion back in 2018.  Coming in 2 weeks – Part 3.

Trend #6 – Coming in 2 weeks – Part 3

Trend #7 – Coming in 2 weeks – Part 3

*What data to value trends am I missing?  I put the top ones I see but hit me up in the comments or directly if you have additional trends.

Snowflake’s Announcements related to Data to Value

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit have Data to Value feature announcements such as:

*Snowflake’s support of Hybrid Tables and announcement of the Unistore concept – the move into some type of OLTP (Online Transaction Processing).  There is huge interest from customers in a concept like this, where a single source of truth becomes possible by having web-based OLTP-type apps operating on Snowflake with Hybrid Tables.

*Snowflake’s Native Apps announcements.  If Snowflake can get this right, it’s a game-changer for Data to Value and for decreasing the cost of deploying Data Applications.

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note, these two items above do not only mean that data “can” go to value faster; they also make the development of data apps and combined OLTP/OLAP applications much less costly and more achievable for “all” types of companies.  They could remove the massive friction that comes with needing high-end, full-stack development.  Streamlit really is attempting to remove the front-end and middle-tier complexity from developing data applications.  (Aren’t most applications data applications, though?)  It’s really another low-code data development environment.

*Snowpipe Streaming announcement.  This was super interesting to me since I had worked with Issaic from Snowflake back before the 2019 Summit using the original Kafka to Snowflake connector, and I also did a presentation on it at Snowflake Summit 2019.  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value, with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark announcements in general.  This is really new tech and the verdict is still out, but it is a major attempt by Snowflake to provide ML pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and machine learning process run entirely within Snowflake.

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  I truly believe these forces have been changing our world everywhere and will continue to do so for many years.  Data to Value for me is a really key concept that helps me prioritize what provides value from our data related investments and work.

Continue to part 2 and 3 of this series:

Data to Value – Part 2

Data to Value – Part 3

I hope you found this useful for thinking about your data initiatives.   Focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good Luck!

Snowflake Optimization Resources


Cost Governance on Snowflake

Cost Governance on Snowflake.  In this article I will break down what you can do for Cost Governance on Snowflake as of July 2022.  Snowflake is making decent strides in this area, even though I still think you need to use a full Snowflake cost optimization service like Snoptimizer™ or Nadalytics.  The reality is that Snowflake still derives the bulk of its revenue from consumption-based services (their compute warehouses…), and while they have amazing NPS scores, there are still many, many pitfalls around costs on Snowflake.  Here is a quick history of what was available before the Summit 2022 announcements.

Before Snowflake Summit 2022, Cost Governance in Snowflake was honestly pretty weak.  It only had the following GUI and optimization tools:

  1. Snowflake’s Standard Classic Console.  Daily Summary – Very limited and ONLY available to very limited ROLES!
  2. Snowsight – Usage Views – More granularity of costs but there are problems with some default views and bugs.  Again, by default it is locked down to certain roles.  Personally I do not understand why costs at an organization should only be viewable by ACCOUNTADMIN Role by default.
  3. Third Party Optimization Tools
    1. Nadlytics
    2. Snoptimizer™
  4. Third Party “Reactive” Reporting Tools (from all the Snowflake Health Check Consulting Engagements I’ve done, this was the most common set of tools for Cost Governance on Snowflake).
    1. Sigma Computing Cost and Usage
    2. Looker Snowflake Cost and Usage
    3. Tableau Snowflake Cost and Usage
    4. Many other smaller, fragmented brands with “reactive” reporting around costs.  The problem with reactive reporting is that if something REALLY goes bad, like a long-running query with NO resource monitor, or a resource monitor that is ONLY set to kick in when the query ends (which by default could actually be 48 hours), then thousands or even $10,000+ can easily be spent within a day with no true Data to Value provided!  (A basic guardrail sketch follows this list.)
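
Here is a hedged sketch of the kind of guardrails that prevent that scenario; the quota, thresholds, and warehouse names are placeholders, and creating resource monitors requires the ACCOUNTADMIN role.

    # Hedged sketch of basic cost guardrails; quota and names are placeholders.
    from snowflake.snowpark import Session

    conn = {"account": "<account>", "user": "<user>", "password": "<password>", "role": "ACCOUNTADMIN"}
    session = Session.builder.configs(conn).create()

    # A resource monitor that suspends warehouses before a runaway query burns the budget.
    session.sql("""
        CREATE OR REPLACE RESOURCE MONITOR monthly_guardrail WITH
          CREDIT_QUOTA = 200
          FREQUENCY = MONTHLY
          START_TIMESTAMP = IMMEDIATELY
          TRIGGERS
            ON 80 PERCENT DO NOTIFY
            ON 100 PERCENT DO SUSPEND
            ON 110 PERCENT DO SUSPEND_IMMEDIATE
    """).collect()
    session.sql("ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_guardrail").collect()

    # Tighten the defaults called out below: small size, fast auto-suspend.
    session.sql(
        "ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60"
    ).collect()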

***Also, I have documented for years how compute default settings all over your normal Snowflake account are at odds with cost optimization, such as these [most of which have been there since 2018, but some were introduced with our friend Snowsight…]:

* Snowflake Warehouse Default Settings (Why is the default set to an XLARGE?)
* Similarly, why is Auto Suspend set by default to 10 minutes?
* Default Usage View on Snowsight – Ugh..

*And those are only some of the major ones.  There are all sorts of cost pitfalls…  My fellow Data Superhero Slim outlines them here:

After Snowflake Summit 2022, these major Cost Governance announcements were made:

[For a full recap, see my Snowflake Summit Recap.]  This has been a great advance, especially in Cost Governance capabilities on Snowflake as well as flexibility.  [We just have to wait until it all gets into GA!!]

#1. A new Resource Groups concept was announced, where you can combine all sorts of Snowflake data objects to monitor their resource usage. [This is huge since Resource Monitors were pretty primitive.]

#2. Concept of Budgets that you can track against. [both Resource Groups and Budgets coming into Private Preview in the next few weeks]

*FRANK:  this is huge and in hindsight you kind of ask yourself… why wasn’t this in the core product to begin with?

#3. More usage metrics are being made available as well, for SnowPros like us or for monitoring tools to use. This is important since many enterprise businesses were looking for this.

*It is pretty cool that you can “finally” have organization-level reporting.  (Even though, as of this date, I’m not aware of any of the Sigma, Looker, or Tableau interfaces adding this.)  Again, though, while this is better than nothing, it still has real latency and is reactive rather than proactive.

Do you want to optimize your Snowflake Account?  Try the Snoptimizer™ Assessment for free.

 

 

What is a Snowflake Data Superhero?

What is a Snowflake Data Superhero?  Currently, a Snowflake Data Superhero (abbreviated as DSH) is a Snowflake product expert who is actively involved in the Snowflake community and is helping others learn more about Snowflake through blogs, videos, podcasts, articles, books, etc.  Snowflake states it chooses DSHs based on their positive influence on the overall Snowflake community.  Snowflake Data Superheroes get some decent DSH benefits as well (see below).

The Snowflake Data Superhero Program (Before Fall 2021)

For those of you new to Snowflake within the last few years, believe it or not, there was a really informal Data Superhero program for many years.  I don’t even think there were exact criteria for being in it.  Since I was a long-time Snowflake advocate and one of the top Snowflake consulting and migration partners from 2018-2019 with IT Strategists (before we sold the consulting business), I was invited to be part of the informal program back in 2019.

Then those of us who had been involved with this informal program got a mysterious email and calendar invite in July 2021: Invitation: Data Superhero Program Restructuring & Feedback @ Mon Jul 26, 2021 8am – 9am.  Honestly, when I saw this and attended the session, it sounded like it was going to be a pain in the ass having to validate our Snowflake expertise again within this new program, especially for many of us in the Snowflake advocate old guard.  (There are probably around 40 of us, I’d say, who never decided to switch to being Snowflake employees to make a serious windfall from the largest software IPO in history.  The Sloot and Speiser especially became billionaires; Benoit did too, but as I’ve stated before, Benoit, Thierry, and Marcin deserve some serious credit for the core Snowflake architecture.  As an engineer you have to give them some respect.)

The Snowflake Data Superhero Program (2022)

This is a combination of my thoughts and the definitions from Snowflake.

Snowflake classifies Snowflake Data Superheroes (DSH) as an elite group of Snowflake experts!  They also think the DSHs should be highly active in the overall Snowflake community. They share feedback with Snowflake product and engineering teams, receive VIP access to events, and their experiences are regularly highlighted on Snowflake Community channels. Most importantly, Data Superheroes are out in the community helping to educate others by sharing knowledge, tips, and best practices, both online and in-person.

How does the Snowflake Corporation choose Snowflake Data Superheroes?

They mention that they look for the following key attributes:

  • You must overall be a Snowflake expert
  • They look for Snowflake experts who create any type of content around the Snowflake Data Cloud (this could be anything from videos and podcasts to blogs and other written Snowflake publications; I think they even took into account the Snowflake Essentials book I wrote).
  • They look for you to be an active member of the Data Hero community which is just the overall online community at snowflake.com.
  • They also want people who support other community members and provide feedback on the Snowflake product.
  • They want overall energetic and positive people

Overall, I would agree that many of the 48 Data Superheroes for 2022 definitely meet all of the criteria above.  This past year, since the program was new, I also think it came down to the fact that only certain people applied.  (I think next year it will be less exclusive, since from my view the number of Snowflake experts is really growing.  Back in 2018 there was honestly a handful of us, I would say fewer than 100 worldwide.  Now there are most likely 200+ true Snowflake Data Cloud experts outside of Snowflake employees.  Even so, the product overall has grown so much that it becomes difficult for any normal or even superhero human to cover all parts of Snowflake as an expert.  The only way that I’m doing it, or trying to, is to employ many automated ML flows, which I call Aflows, to organize all publicly available Snowflake content into this one knowledge repository of ITS Snowflake Solutions.)  I would also say that it comes down to your overall known presence within the Snowflake community and, finally, your geography.  For whatever reason, I think the Snowflake DSHs chosen for 2022 missed some really, really strong Snowflake experts within the United States.

Also, I just want to add that even within the Snowflake Data Superhero 48…. there are a few that just stand out as producing an insane amount of free community content.  I’m going to name them later after I run some analysis but there are about 10-15 people that just pump out the content non-stop!

What benefits do you get when you become a Snowflake Data Superhero?

Snowflake Data Superhero BENEFITS:

In 2022, they also provided all of these benefits:

  • A ticket to the Snowflake Summit – I have to say this was an awesome perk of being part of the program, and while I sometimes disagree with Snowflake corporate decisions that are not customer or partner focused, this was Snowflake Corporation actually doing something awesome and really the right thing, considering that most of these 48 superheroes have HEAVILY contributed to Snowflake’s success (no stock, no salary).  While employees and investors reaped large financial gains from the Snowflake IPO, many of us basically helped the company grow significantly.
  • Snowflake swag that is different (well, it was for a while; now others are buying the “kicks,” or sneakers)
  • Early education on new Snowflake Features
  • Early access to new Snowflake Features (Private Preview)
  • Some limited opportunities to speak at events.  (Let’s face it, the bulk of speaking opportunities these days goes in this order: Snowflake employees; Snowflake customers (the bigger the brand [or maybe the spend], the bigger the speaking opportunity); Snowflake partners who pay significant amounts of money to be involved in any live speaking event; and finally external Snowflake experts, advocates, etc.)
  • VIP access to events (we had our own Data Superhero area within Snowflake Summit)
  • Actual Product Feedback sessions with the Snowflake Product Managers

The only promised benefit I can think of that has not been delivered so far in 2022 is providing every DSH with a test Snowflake account with a certain number of credits.  Also, I do not think many of the DSHs have received their Data Superhero card.  (This was one of those benefits provided to maybe 10 or more of the DSHs back in 2019 or so.  I believe it started with anyone who was chosen to speak at Snowflake BUILD, but I’m not 100% sure.)

The Snowflake Data Superhero Program (2023)

How do I apply to be a Snowflake Data Superhero?
Here you go:  [even though for me the links are not working]
https://community.snowflake.com/s/dataheroes

Snowflake’s Data Superhero Program Evolution

I will add some more content around this as I review how the 2023 program is going to work.  I will say I have been surprisingly pleased with the DSH program overall this year in 2022.  It has given those Snowflake Data Superheroes who are more involved with the program a way to stand out within the Snowflake community.

Snowflake’s Data Superhero Program Internal Team

I also want to give a shout out to the main team at Snowflake who works tirelessly to make an amazing Snowflake Data Superhero program.  These individuals and more have been wonderful to work with this year:

  • Howard Lio
  • Leith Darawsheh
  • Elsa Mayer

There are many others too, from the product managers we meet with to other Snowflake engineers.

Other Snowflake Data Superhero Questions:

Here was the full list from Feb 2021.

Who are the Snowflake Data Superheroes?

https://medium.com/snowflake/introducing-the-2022-data-superheroes-ec78319fd000

Summary

I kept getting all of these questions about, hey – what is a Snowflake Data Hero?  What is a Snowflake Data Superhero?  How do I become a Snowflake Data Superhero?  What is the criteria for becoming one?

So this article is my attempt to answer all of your Snowflake Data Superhero related questions in one place.  (from an actual Snowflake Data Superhero – 3+ years in a row).  Hit me up in the comments or directly if you have any other questions.

Shortest Snowflake Summit 2022 Recap

Shortest Snowflake Summit 2022 Recap from a Snowflake Data Superhero

If you missed Snowflake SUMMIT or any part of the Snowflake Summit Opening Keynote, here are the key feature announcements and a recap [in “brief” but “useful” detail].

KEY FEATURE ANNOUNCEMENTS — EXECUTIVE SUMMARY. [mostly in a chronological order of when they were announced. My top ~20. The number of announcements this week was overwhelming!]

Cost Governance:

#1. A new Resource Groups concept was announced, where you can combine all sorts of Snowflake data objects to monitor their resource usage. [This is huge since Resource Monitors were pretty primitive.]

#2. Concept of Budgets that you can track against. [both Resource Groups and Budgets coming into Private Preview in the next few weeks]

#3. More usage metrics are being made available as well, for SnowPros like us or for monitoring tools to use. This is important since many enterprise businesses were looking for this.

Replication Improvements on SnowGrid:

#4. Account-Level Object Replication. Previously, Snowflake allowed data replication but not replication of other account-level objects. Now, objects beyond just data (such as users) can supposedly be replicated as well.

#5. Pipeline Replication and Pipeline Failover. Stages and Pipes now can be replicated as well. [Kleinerman stated this is coming soon to Preview. I’m assuming Private Preview?] — DR people will love this!

Data Management and Governance Improvements:

#6. The combination of tags and policies. You can now assign masking and access policies to sensitive data via tags. [Private Preview now and will go into public preview very soon]

Expanding External Table Support and Native ICEBERG Tables:

#7. External Table Support for Apache Iceberg is coming shortly. Remember, though, that external tables are read-only and have other limitations, so see what Snowflake did in #9 below. [pretty amazing]

#8. EXPANDING Snowflake to handle on-premises data with storage vendor partners, so far Dell Technologies and Pure Storage. [Their integrations will be in private preview in the next few weeks.]

#9. Supporting ICEBERG TABLES with FULL STANDARD TABLE support in Snowflake so these tables will support replication, time-travel, etc. etc. [very huge]. This enables so much more ease of use within a Data Lake conceptual deployment. EXPERT IN THIS AREA: Polita Paulus

Improved Streaming Data Pipeline Support:

#10. New Streaming Data Pipelines. The main innovation is the ability to create a new concept of MATERIALIZED TABLES, and you can now ingest streaming data as row sets. [very huge] EXPERT IN THIS AREA: Tyler Akidau

  • Funny — I did a presentation in Snowflake Summit 2019 on Snowflake’s Kafka connector. Now that is like ancient history. 

Application Development Disruption with Streamlit and Native Apps:

#11. Low-code data application development via Streamlit. The combination of this and the Native Application Framework allows Snowflake to disrupt the entire application development environment. I would watch closely how this evolves. It’s still very early, but this is super interesting.

#12. Native Application Framework. I have been working with this for about 3 months and I think it’s a game-changer. It allows all of us data people to create data apps, share them on a marketplace, and monetize them as well. It really starts to position Snowflake’s marketplace, under its new name (UGH! 3rd name change: 2019 = Data Exchange, 2020 = Data Marketplace, 2022 = Snowflake Marketplace), as the place to distribute and monetize data apps.

Expanded SnowPark and Python Support:

#13. Python Support in the Snowflake Data Cloud. More importantly, this is a MAJOR MOVE to make it much easier for all “data constituents” to be able to work seamlessly within Snowflake for ALL workloads including Machine Learning. This has been an ongoing move by Snowflake to make it much much easier to run data scientist type workloads within Snowflake itself.

#14. Snowflake Python Worksheets. This is really combined with the above announcement and enables data scientists who are used to Jupyter notebooks to more easily work in a fully integrated environment in Snowflake.

New Workloads. Cybersecurity and OLTP! boom!

#15. CYBERSECURITY. This was announced awhile back but I wanted to include it here to be complete since it was emphasized again.

#16. UNISTORE. OLTP-type support based on Snowflake’s Hybrid Table features. This was one of the biggest announcements by far. Snowflake is now entering a much, much larger part of data and application workloads by extending its capabilities BEYOND OLAP [online analytical processing] into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a massive move, positioning Snowflake as a single integrated data cloud for all data and all workloads.

Additional Improvements:

#17. Snowflake Overall Data Cloud Performance Improvements. This is cool but given all the other “more transformative” announcements I’m just bundling this together. Performance improvements included improvements on AWS related to new AWS capabilities as well as more power per credit with internal optimizations. [since Snowflake is a closed system though I think its hard for customers to see and verify this]


#18. Large Memory Instances. [Not much more to say; they did this to handle more data science workloads, but it shows Snowflake’s continued focus on customers when they need something else.]

#19. ̶D̶a̶t̶a̶ Marketplace Improvements. The Marketplace is one of my favorite things about Snowflake. They mostly announced incremental changes here.

Final Note: I hope you find this article useful and please let me know in the comments if you feel I missed anything really important.

I attempted to make it as short as possible while still providing enough detail so that you could understand that Snowflake Summit 2022 contained many significant announcements and moves forward by the company.

Quick “Top 3” Takeaways for me from Snowflake Summit 2022:

  1. Snowflake is positioning itself way, way beyond a cloud database or data warehouse. It is now defining itself as a full-stack business solution environment capable of creating business applications.
  2. Snowflake is emphasizing that it is not just about data but that it can handle “ALL WORKLOADS” – Machine Learning, traditional data workloads, data warehouse, data lake, and data applications – and it now has a Native App and Streamlit development toolset.
  3. Snowflake is expanding wherever it needs to be in order to be a full, data-anywhere, anytime data cloud. The push into better streaming data pipelines from Kafka, etc., and the new on-prem connectors allow Snowflake to take over more and more of customers’ data cloud needs.

Snowflake at a very high level wants to:

  1. Disrupt Data Analytics
  2. Disrupt Data Collaboration
  3. Disrupt Data Application Development

Want more recap beyond JUST THE FEATURES?

Here is a more in-depth take on the Keynote 7 Pillars that were mentioned:

Frank Slootman Recap: 

MINUTE: ~2 to ~15 in the video

Snowflake related Growth Stats Summary:

*Employee Growth: 

2019:  938 Employees

2022 at Summit:  3992 Employees

*Customer Growth:

2019:  948 Customers

2022 at Summit:  5944 Customers

*Total Revenue Growth:

2019:  96M

2022 at Summit:  1.2B

 

Large emphasis on MISSION PLAN and INDUSTRY/VERTICAL Alignment.

 

MINUTE: ~15 to ~53 – Frank Slootman and Benoit

53 to 57:45 – Christian Intros.

Frank introduces the pillars of Snowflake INNOVATION  and then Benoit and Christian delve into these 7 Pillars in more depth.

Let’s go through the 7 PILLARS OF SNOWFLAKE INNOVATIONS!

  1. ALL DATA – Snowflake is emphasizing that it can handle not only structured and semi-structured data but also unstructured data at ANY SCALE.  Benoit even said companies can scale out to 100s of petabytes.
  2. ALL WORKLOADS – There is a massive push by Snowflake to provide an integrated “all workload” platform. They define this as all types of data and all types of workloads (emphasizing that it can now handle ML/AI-type workloads via SnowPark and more). [My take: Snowflake’s original architectural separation of compute and storage is still what makes it so powerful.]
  3. GLOBAL – An emphasis that Snowflake, based on SnowGrid, is a fully global Data Cloud platform. As of today, Snowflake is deployed across over 30 cloud regions on the three main cloud providers. Snowflake works to deliver a unified global experience with full replication and failover to multiple regions based on SnowGrid’s unique architecture.
  4. SELF-MANAGED – Snowflake is still focusing a TON on continuing to make Snowflake SIMPLE and easy to use.
  5. PROGRAMMABLE – Snowflake can now be programmed not only with SQL, JavaScript, Java, and Scala but also with Python and preferred libraries. This is where STREAMLIT fits in.
  6. MARKETPLACE – Snowflake emphasizes its continued focus on building more and more functionality into the Snowflake Marketplace (rebranded now since it will contain both native apps and data shares).  Snowflake continues to make the integrated marketplace as easy as possible for sharing data and data applications.
  7. GOVERNED – Frank’s story from the 2019 keynote… someone grabbed him and said… You didn’t talk about GOVERNANCE [so Frank and everyone talked a ton about it this time!] – Snowflake and Frank state that there is a continuous heavy focus on data security and governance.

OTHER KEY PARTS OF THE KEYNOTE VIDEO:

[ fyi – if you didn’t access it already the FULL Snowflake Summit 2022 Opening Keynote is here:

https://www.snowflake.com/summit/agenda?agendaPath=session/849836 ]

MINUTE: ~57:45 to 67 (1:07) – Linda Appsley – GEICO testimonial on Snowflake.

MINUTE: Goldman Executive presentation.

 

What is Snoptimizer?

Hmmm.  What is Snoptimizer™?

Snoptimizer™ is the first automated cost, performance, and security optimizer for Snowflake accounts.  It is by far the easiest and fastest way to optimize your Snowflake account.  This unique service can optimize your Snowflake account within minutes, and it is a no-brainer to test out.  Sign up here for a fast and easy 10-day free trial.

From a high level, Snoptimizer™ can be quickly set up by your ACCOUNT ADMINISTRATOR role on Snowflake within minutes.  What have you got to lose?

Okay.  Snoptimizer sounds cool.  What does Snoptimizer really do?  
Also, why did you build Snoptimizer?  We built Snoptimizer™ because we saw a tremendous need in the marketplace.  All too often we were brought in by existing Snowflake customers for Snowflake health checks, and 98% of the time the customer’s account was not optimized as much as we could optimize it.  Unfortunately, most of the time it was actually “highly” unoptimized, and the customer was not using Snowflake as effectively as they could in one or many areas.  To solve this need we built Snoptimizer™.

Snoptimizer was built by longtime Snowflake Data Heroes [Only 50 in the world], consultants, and Snowflake Powered-By product builders.  Our Snoptimizer team is composed of some of the deepest experts in Snowflake optimization and migration.  We have lived, breathed, and eaten Snowflake for several years and studied every area of Snowflake in depth in order to provide you an unparalleled optimization service.

Problem Statement that Snoptimizer Solves:  Snowflake is an amazingly scalable and easy-to-use, consumption-based Data as a Service.  Simply put, it’s an amazing cloud RDBMS that scales like nothing I had seen before.  That being said, the Snowflake Data Cloud offering is constantly changing and growing with new functionality and cost-related services.  Also, the Snowflake database and the overall Snowflake Data Cloud concept are relatively new for many new Snowflake administrators and users.  While the RDBMS basics are relatively easy to use compared to past solutions and other analytical data solutions, it is definitely NOT easy to fully optimize a Snowflake account to get the most efficiency and cost savings possible.  This requires in-depth understanding of hundreds of different views and objects.  You need to deeply understand how warehouses, resource monitors, query_history, materialized views, search optimization, Snowpipe, load_history, etc. operate at their core level.  So quickly and easily Snoptimize your Snowflake account now!

A few Customer Optimization Problems we have seen. 

*Poorly configured consumption.  Too often we see consumption credits wasted by incorrectly set up warehouses.  Remember, consumption-based services are AWESOME until they are used incorrectly.  We covered long ago how Snowflake costs, and the cost risks you are exposed to, can add up fast in this Cost Risk article.  When an account was not well optimized, we would also often see performance or security issues with the account.  Therefore we scoured every area of the Snowflake metadata views and created the most advanced optimizations around cost, performance, and security, beyond anything documented or available anywhere else.

*Incorrect storage-based settings and/or architectures.   Again, we come into Snowflake Health Checks and find 10-, 20-, 30-, or 90-day Time Travel settings on objects that do not require that level of Time Travel.  Or we see lift-and-shift migrations that retain drop-and-recreate architectures that make no sense in a Snowflake Account.

*Warehouses with inefficient settings.   This is one of the most common issues we fix immediately, and it often saves our customers hundreds of dollars daily.

*Accounts with HUGE Cost Risk Exposure.   As we stated in previous blog posts here at ITS Snowflake Solutions, Snowflake brings awesome scale, but if it is not correctly “Snoptimized” it also carries extensive cost risk by default with its consumption-based pricing, especially around compute and larger warehouses.  These are the Snowflake Cost Risks we wrote about previously.  Below is the extreme risk with a 6XL on AWS.

Snowflake Cost Risk Use Case 6XL – 1 cluster:
*Cost per hour @ $3/credit = 3 * 512 = $1536
*okay – so right now – 5XL and 6XL are in private preview ONLY on AWS so then… 
What’s your cost exposure with a 4XL on ONLY 1 cluster?
Snowflake Cost Risk Use Case 4XL – 1 cluster:
*Cost EXPOSURE per hour @ $3/credit = 3 * 256 = $768 PER HOUR
*Within 7 hours you go through $5,376
*Within 14 hours your ACCOUNT spend is:  $10,752

Again, with great power and ease of use also comes great responsibility.  While we have loved Snowflake and its ease of use, we have also done many Snowflake Health Checks where Snoptimizer™ picks up and fixes Account problems where customers re-sized warehouses EASILY [yes, you can do it via the interface or via the command line in seconds – it is BOTH awesome and dangerous for untrained users WHO HAVE the create/alter warehouse privileges GRANTED to their ROLES] but unnecessarily.  Unfortunately, even for Snowflake data professionals and administrative experts it is ALL too easy to change warehouse settings for a simple test, or for a QA team, and then forget to reset them or auto-suspend them.  A good analogy we use: UNOPTIMIZED Snowflake ACCOUNTS often let users drive right past non-existent or POORLY set up Snowflake Cost Guardrails.

What does Snoptimizer™ do?

Snoptimizer runs regularly and continuously scours your Snowflake Account’s operational metadata (over 40 views), looking for Snowflake storage and compute anti-patterns and inefficiencies related to cost, performance, and security.  It is the only continuous service watching out for you and your Snowflake Account to keep it in tip-top shape!

Let’s dive deeper:

Snoptimizer™ – Snowflake Cost Optimization:

The Snoptimizer Cost Optimization service regularly evaluates ongoing queries and the settings related to Warehouses and Resource Monitors.  It immediately flags and fixes incorrectly configured Account specifications.

The Snoptimizer service repeatedly works for you and regularly looks at your Snowflake Account(s) and finds any areas that can be optimized for costs.  The service can be set for automatic optimization settings or settings that must be approved.  Snoptimizer is your best friend for optimizing costs on your Snowflake Account.

Again, the Snowflake RDBMS and DDL/DML basics are easy to use, BUT misconfiguration of warehouses and compute optimization services is ALSO EASY to do.  Snoptimizer removes this unnecessary, inefficient, and unoptimized use of Snowflake compute and storage.

Snoptimizer™ – Snowflake Performance Optimization:

The Snoptimizer Team not only scours cost optimizations but also analyzes all of your query_history and related views, so we can pick up warehouses that are both over-provisioned and under-provisioned.  We are the only service we know of that automates this for you and provides suggestions on warehouse changes, or even suggests other Snowflake cost-based services that provide awesome benefits, such as (a sketch of this kind of analysis follows the list below):

  • Auto Clustering
  • Materialized Views
  • Search Optimization
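
For the curious, here is a minimal sketch of the kind of query_history analysis described above, written against the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view.  It is not Snoptimizer’s actual logic, just an illustration; the 7-day window and the specific queueing/spill columns are our own assumptions.

-- Sketch only: rough signals of over/under-provisioned warehouses over the last 7 days.
-- Heavy queueing or remote spilling suggests an undersized warehouse; consistently tiny
-- execution times on a large warehouse suggest it may be oversized.
SELECT
    warehouse_name,
    warehouse_size,
    COUNT(*)                             AS query_count,
    AVG(queued_overload_time) / 1000     AS avg_queued_seconds,
    AVG(execution_time) / 1000           AS avg_execution_seconds,
    SUM(bytes_spilled_to_remote_storage) AS total_remote_spill_bytes
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name, warehouse_size
ORDER BY avg_queued_seconds DESC;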

Snoptimizer™ – Snowflake Security Optimization:

Again, Snoptimizer is one of your best Snowflake automated administrative friends for security issues.  It repeatedly checks for security issues related to your Snowflake Account that put your account and data at risk.  Since security practices are often specific to each company’s culture, we provide optimizations and best practices that you can implement to avoid account and data breaches.  Snoptimizer Security Optimization itself repeatedly runs many security checks, looking for incorrect security configurations or exposures.

If all of that wasn’t enough for you… let’s highlight some Snoptimizer Core Features:

Snoptimizer Core Features:

  • Analyzes Snowflake Compute Warehouses for inefficient settings
  • Limits Compute Resources “Risk Cost Exposure” Immediately
  • Reviews Previous Queries and Consumption for performance and efficiency
  • Provides regular reporting on Snowflake Usage
  • Creates effective Resource Monitors per warehouse
  • Provides Optimization Recommendations and Automations depending on your setup
  • Incorporates every single documented and “some undocumented” Snowflake Cost Optimization Best Practice and more
How does Snoptimizer Help You?

Snoptimizer very quickly and automatically runs Snowflake security, cost, and performance optimization checks and best practices against our customers’ Snowflake Accounts.  It removes the headaches and worries of security and cost exposure across your entire Snowflake Account, and it keeps you from mistakenly falling into the Snowflake Cost Anti-patterns.

At a high level, it simply makes your Snowflake Cost, Performance, and Security administration easier and automated for you.  No-hassle optimization in a few hours.  Get Snoptimized today!

Conclusion:

The Snowflake Data Cloud continues to grow, and while it’s easy to use, it’s much harder to optimize for cost, security, and performance.  Snoptimizer makes cost, performance, and security optimization easy for you at a low cost and saves you the headaches of cost overruns or security exposures.

Snowflake Data Cloud Cost Risks

Let me first state, I love using Snowflake, the technology itself.  I fell in love with it at the beginning of 2018 when I realized how much more easily I could execute all of the Big Data consulting solutions we had been building for 18+ years.  In the past, with regular RDBMS solutions, complex Hadoop messes, and expensive MPP systems like Teradata, Netezza, and Exadata, we often ran into scale challenges as the data grew.  Snowflake brought both ease of use and amazing scale to almost all of my big data consulting projects.

After the last three years of working on hundreds of Snowflake accounts, though, my team and I realized that if Snowflake anti-patterns occur, or if poor compute security practices are used, Snowflake Accounts are exposed to large cost risks, especially around compute.  While Snowflake is an amazingly scalable cloud database and the best cloud data warehouse I’ve used in the last 3+ years, deploying a Snowflake Account without proper settings and administration exposes a company to these cost risks.  We cover the Snowflake CREATE WAREHOUSE default settings in that other article, but let’s say you actually used the Classic Console to create a new warehouse, accepted all the default settings, and just created it.  Even if you never ran a query, the cost for the standard settings would be 10 minutes * XL Warehouse (16 credits/hour) @ $3/credit.  It is only $8 for those 10 minutes, but it was $8 spent for nothing.  Now let’s say a rogue (or curious) trainee on your Snowflake Account who didn’t understand what they were doing did the exact same thing BUT changed ONLY the size to a 6XL.  Your 10-minute run-for-nothing cost exposure is 10 min * 6XL Warehouse (512 credits/hour) @ $3/credit.  Your account just spent $256 for 10 minutes of nothing.
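
If you want to sanity-check that arithmetic yourself, here is a trivial sketch you can paste into any Snowflake worksheet; the credit rates (16 for XL, 512 for 6XL) and the $3/credit price are the same assumptions used above.

-- Cost of 10 idle minutes at $3/credit: credits_per_hour * 3 dollars * (10/60) of an hour
SELECT 16  * 3 / 6 AS xl_cost_for_10_minutes_usd,    -- XL  = 16 credits/hour  -> $8
       512 * 3 / 6 AS sixxl_cost_for_10_minutes_usd; -- 6XL = 512 credits/hour -> $256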

Let’s take this to the extreme cost risk that many Snowflake Accounts are exposed to if they are not using automated tools like our Snowflake Cost Optimization and Cost Risk Minimization Service – Snoptimizer.

Snowflake Cost Risk Use Case 6XL – 1 cluster:

*Cost per hour @ $3/credit = 3 * 512 = $1536
*Cost per day @ $3/credit = $36,864

Snowflake Cost Risk Use Case 6XL – 5 clusters:

[we know this is a worst case scenario on AWS/Snowflake and this one would be rare BUT without resource monitors and correct permissions it is possible]
*Cost per hour @ $3/credit = 3 * 512 * 5 clusters = $7,680
*Cost per day @ $3/credit = $184,320

Snowflake Cost Risk Use Case 6XL – 10 clusters:

[we know this is a worst case scenario on AWS/Snowflake and this one would be rare BUT without resource monitors and correct permissions it is possible]
*Cost per hour @ $3/credit = 3 * 512 * 10 clusters = $15,360
*Cost per day @ $3/credit = $368,640

As you can see, this is unreasonable exposure to cost risk.  If you are a Snowflake administrator, make sure you make appropriate changes to control costs and cost risk.  If you want an automated approach to Cost Risk Management that can be set up easily in a few hours, then try our Snoptimizer Cost Risk Solution.

Snowflake Cost Risk Mitigation – Administration – ACCOUNTADMIN – MUST DO

The most important way to minimize Snowflake cost risk is to create a Resource Monitor, with suspend actions enabled, for each and every warehouse.  Here is the code to do that (replace the 50 credit quota with your own daily credit limit):

CREATE RESOURCE MONITOR "REMOVE_SNOWFLAKE_COST_RISK_EXAMPLE_RM" WITH CREDIT_QUOTA = 50
FREQUENCY = DAILY START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 80 PERCENT DO NOTIFY
ON 95 PERCENT DO SUSPEND
ON 100 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE "TEST_WH" SET RESOURCE_MONITOR = "REMOVE_SNOWFLAKE_COST_RISK_EXAMPLE_RM";
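
To double-check that the guardrail is actually attached (a surprisingly common gap, as we discuss in the anti-patterns section later), you can run something like the following; the monitor and warehouse names are just the examples from above, and the attachment should show up in the resource_monitor column of the SHOW WAREHOUSES output.

-- Verify the monitor exists and is attached to the warehouse.
SHOW RESOURCE MONITORS;
SHOW WAREHOUSES LIKE 'TEST_WH';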

Community Conclusion:

Snowflake Data Cloud cost risk is a real issue that needs to be properly administered.  While the Snowflake Data Cloud brings incredible scale and power to any analytical data processing need, the Data Cloud needs to be properly optimized and continuously monitored by a service like Snoptimizer.  Again, with great data processing power comes great cost management responsibility.  If an administrator slips up and gives a non-trained user access to create that 6XL they don’t need, and don’t have the business budget for, it can cost you BIG TIME.   Get Snoptimized now to avoid a data-driven cost catastrophe.  If a new warehouse appears, we have you covered!

Snowflake Cost Guardrails – Resource Monitors

The Snowflake Data Cloud brings incredible scale and power to any analytical data processing.  It is an amazing revolution to be able to launch T-shirt warehouse sizes from XS (1 virtual instance) up to a 6XL of 512 raw EC2 instances on a single cluster if you are running on AWS.  While this is awesome compute power, you must set up your Snowflake account with resource monitors in order to control costs.   Snowflake Resource Monitors are really your main guardrails for controlling costs in the Snowflake Data Cloud.

Let’s show you how easy it is to setup your Snowflake cost guardrails so your costs don’t go beyond what you expect.

I recommend either hiring a full- or part-time Snowflake administrator focused on cost optimization and database organization, or using our Snowflake Cost Optimization Tool – Snoptimizer.  Snoptimizer automates setting up resource monitors for each warehouse on your Snowflake account, along with tons of other cost optimizations and controls.   Let’s dig into the ONLY true Snowflake Cost Risk Guardrails you have had for a while: Resource Monitors.

Resource Monitors – Your Snowflake Data Cloud Cost Guardrails

Resource Monitors are technically relatively easy to set up from the Snowflake Web GUI or the command line.   Even though setting up one Resource Monitor is relatively easy, it’s still easy to make incorrect assumptions and not have enough effective monitoring and suspending in place.  If your monitors are not attached to every warehouse, or do not have suspend actions enabled, it is like having guardrails that were never installed.


Conclusion:

Snowflake Guardrails, in the form of correctly set up Resource Monitors, are EXTREMELY IMPORTANT.  If you don’t know whether you have these in place, then what are you waiting for?  Activate your Snoptimizer now and be optimized in a few hours, with continuous, regular cost optimization monitoring after that.  If a new warehouse appears, we have you covered!

Snowflake Cost Optimization Best Practices

I have been working with Snowflake since the beginning of 2018 and it has been one of the most fun and scalable data solutions I have worked with in my 27+ year career as a data engineer, data architect, entrepreneur, and data thought leader. With great power (almost unlimited scale – only limited by Snowflake’s allocation of compute within an Availability Zone) also comes great responsibility. Over the last 3 years I have analyzed over 100 Snowflake accounts and about 90% of them were not fully optimized for cloud data costs. This is why my team and I are so excited to have created Snoptimizer (the first AUTOMATED Snowflake Cost Optimization Service) – easily optimize your Snowflake Data Cloud Account here in a few hours.

I think the reason why 90% of those accounts didn’t have resource monitors or regular optimizations in place is that Snowflake is initially incredibly cost effective and typically delivered massive savings, especially on the on-prem migrations we have done.

Companies that do not optimize their Data Cloud costs, though, are missing out big time! This is why we designed Snoptimizer, and it is also why I’m SHARING my top Snowflake Cost and Risk Optimizations below. Enjoy!

Here we go: My Snowflake Cost Optimization and Cost Risk Reduction Best Practices

Best Practice #1 – Resource Monitors. One of the first things that Snoptimizer does is automate daily Resource Monitors at a warehouse level, based on the past history in the Snowflake metadata database and on your existing Warehouse and Resource Monitor settings. This gets set up almost immediately after you purchase Snoptimizer. BOOM. This provides huge cost risk reduction limits and guardrails for all of your warehouse compute.

 

You can join us to learn more about Cost Optimization at our webinar here!

See cost optimization automation code at the bottom of this post. Check out Resource Monitors best practices

 

Best Practice #2 – Auto Suspend Setting Optimization.

Another optimization automated by Snoptimizer is looking at the warehouse workloads and automating changes to the Auto Suspend settings. BOOM. Depending on the workloads, this automation can deliver additional cost savings for you.

See cost optimization automation code at the bottom of this post that helps you optimize auto_suspend. If you like videos, check out our Auto Suspend best practices
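
As a simple illustration (not Snoptimizer’s actual automation), lowering an over-generous auto-suspend value is a one-line change; the warehouse name and the 60-second value below are just example assumptions for a load-style warehouse.

-- Example only: drop auto-suspend from the 600-second default to 60 seconds for a load warehouse.
ALTER WAREHOUSE "LOAD_WH" SET AUTO_SUSPEND = 60;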

Best Practice #3 – Monitoring Cloud Services Consumption Optimization

Snoptimizer immediately looks to see if you have any large costs associated with Cloud Services on your Snowflake Account.

See cost optimization automation code at the bottom of this post. Check out our Cloud Services best practices.
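
If you want to eyeball this yourself, the sketch below sums cloud services credits per warehouse from the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view; the 30-day window is our own assumption, and this is not Snoptimizer’s internal logic.

-- Cloud services credits vs. compute credits by warehouse over the last 30 days.
SELECT warehouse_name,
       SUM(credits_used_compute)        AS compute_credits,
       SUM(credits_used_cloud_services) AS cloud_services_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY cloud_services_credits DESC;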

Best Practice #4 – Regular Monitoring of Storage Usage Across your entire Snowflake Account.

Snoptimizer immediately reviews your storage history over the past 60 days and looks at whether you have any errant settings that have you overpaying for storage. Too often we find costly settings related to Time Travel and/or Snowflake Stages.

See cost optimization automation code at the bottom of this post that helps you optimize Storage Costs. Also check out our videos on Monitoring Storage, Compute and Services, and Usage!
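
Here is a minimal sketch of one way to spot Time Travel and Fail-safe storage piling up, using the SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS view; the column choices and the ordering are illustrative assumptions, not the Snoptimizer implementation.

-- Tables whose Time Travel / Fail-safe storage dwarfs their active storage.
SELECT table_catalog,
       table_schema,
       table_name,
       active_bytes,
       time_travel_bytes,
       failsafe_bytes
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted = FALSE
ORDER BY time_travel_bytes + failsafe_bytes DESC
LIMIT 25;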

Best Practice #5 – Daily Monitoring of Warehouse Compute.

Besides adding Resource Monitors that suspend warehouses, we also provide daily monitoring of Snowflake warehouse consumption, reporting daily spikes, anomalies, or changes in rolling averages. Most accounts we come across do not have regular, proactive monitoring of warehouse usage. Check out our Warehouses best practices (a sketch of this kind of daily rollup follows below)!
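
A rough sketch of that kind of daily rollup, again against SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY; the trailing 7-day average is an arbitrary choice for illustration, not Snoptimizer’s actual anomaly logic.

-- Daily credits per warehouse with a trailing 7-day average to highlight spikes.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time)                 AS usage_day,
       SUM(credits_used)                             AS daily_credits,
       AVG(SUM(credits_used)) OVER (
           PARTITION BY warehouse_name
           ORDER BY DATE_TRUNC('day', start_time)
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS trailing_7_day_avg
FROM snowflake.account_usage.warehouse_metering_history
GROUP BY warehouse_name, DATE_TRUNC('day', start_time)
ORDER BY warehouse_name, usage_day;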

 

Best Practice #6 – Regular Monitoring of New Snowflake Services.

Besides monitoring compute warehouses, Snoptimizer also immediately starts monitoring consumption on all of the existing and new (private preview and beyond) cloud services that incur costs, from Automatic Clustering to Search Optimization to Materialized Views. This is a huge benefit of the Snoptimizer automation. We are ALWAYS looking out for your cost consumption optimization and cost risk reduction! We are always there for you! (A sketch of this kind of serverless-cost check follows below.)
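
For reference, the serverless features mentioned above each expose an ACCOUNT_USAGE history view with a CREDITS_USED column; the union below is just an illustrative way to see them side by side over the last 30 days and is not how Snoptimizer does it internally.

-- Credits consumed by serverless features over the last 30 days, by table.
SELECT 'AUTOMATIC_CLUSTERING' AS service, table_name, SUM(credits_used) AS credits
FROM snowflake.account_usage.automatic_clustering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY table_name
UNION ALL
SELECT 'MATERIALIZED_VIEW_REFRESH', table_name, SUM(credits_used)
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY table_name
UNION ALL
SELECT 'SEARCH_OPTIMIZATION', table_name, SUM(credits_used)
FROM snowflake.account_usage.search_optimization_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY table_name
ORDER BY 3 DESC;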

Conclusion:

Snowflake Data Cloud Cost Optimization and Snowflake Data Cloud Cost Risk Reduction are really important to do immediately on your Snowflake Account. What are you waiting for? Buy Snoptimizer now and be optimized in a few hours, along with regular cost optimization monitoring. If a new warehouse appears, we have you covered.

Snowflake Create Warehouse Defaults

I have been working with the Snowflake Data Cloud since it was just an Analytical RDBMS.  Since the beginning of 2018, the Snowflake technology has been pretty fun to work with as a data professional and data entrepreneur.  It allows data professionals amazing flexible data processing power in the cloud.  The key to a successful Snowflake deployment is setting up security and account optimizations correctly from the beginning.  In this article we will discuss the CREATE WAREHOUSE default settings.

Snowflake Cost and Workload Optimization is the Key

After analyzing and working on hundreds of Snowflake customer accounts we have found key processes to optimize Snowflake for compute and storage costs.  The best way to have a successful Snowflake deployment is to make sure you setup the compute for cost and workload optimization.

The Snowflake default “create warehouse” settings are not optimized to limit costs. That is part of the reason we built our Snoptimizer service (Snowflake Cost Optimization Service): to help you automatically and easily optimize your Snowflake Account(s) with continuous query and cost optimizations so your Snowflake Data Cloud solution runs as efficiently as possible.  Let’s take a quick look at how the default settings are set right now for a brand new Snowflake Account.

Here is the default screen that comes up when I click +Warehouse in the Classic Console.

Snowflake Classic Console - Create Warehouse Default Settings
Create Warehouse-Default Options for the Classic Console

Okay, for those of you already in Snowsight (aka Preview App), here is the default screen within Snowsight (or Preview App) – It is almost the same.

Snowflake Create Warehouse Defaults on Snowsight
Create Warehouse Default Options for Snowsight

So let’s dig into the default settings for these Web UIs that will be there if you just choose a name and click “Create Warehouse” – Let’s further evaluate what happens with our Snowflake Compute if you leave the default Create Warehouse settings.

Create Warehouse – Default Setting #1

Size (really the warehouse compute size): X-Large is set.  I’m going to assume you know how Snowflake compute works and understand the Snowflake warehouse T-shirt sizes. If not, then read this quickly – Snowflake Compute Sizes. Notice that the default is an X-Large warehouse rather than one of the smaller T-shirt sizes (XS, S, M, L). This default is the same for both the Classic Console and Snowsight (the Preview App).

Create Warehouse – Default Setting #2 [assuming an Enterprise or higher Edition]  Maximum Clusters: 2

While this default setting makes sense if you want multi-cluster scaling enabled, it still has serious cost implications by default. It assumes the data cloud customer wants to launch a 2nd cluster, and pay more for it, whenever a certain level of statements are queued on the warehouse. If you stick with the XL setting, duplicating the cluster doubles the hourly spend whenever that 2nd cluster is running.

This is only the default setting for the Classic Console.  It also is ONLY set if you have Enterprise Edition or higher, because Standard Edition does not offer multi-cluster warehouses.

Create Warehouse – Default Setting #3 [assuming an Enterprise or higher Edition]

Minimum Clusters: 1
This is great. No issues here.

This is only the default setting for the Classic Console.

Create Warehouse – Default Setting #4 [assuming an Enterprise or higher Edition]

Scaling Policy: Standard
This setting is hard to rate, but the truth is that if you are a really cost-conscious customer you would want to change this to “Economy” rather than leaving it at “Standard”. Why? What’s the difference? See our Q&A here on What’s the difference between Economy and Standard Scaling? At a high level, with Standard your 2nd cluster (enabled by default) kicks in as soon as any queuing happens on your Snowflake warehouse, whereas with Economy a 2nd cluster is not launched until Snowflake estimates it has at least 6 minutes of work for that 2nd cluster to perform.

This is only a default setting for the Classic Console, but when you toggle on the “Multi-cluster Warehouse” setting in Snowsight it also defaults to “Standard” rather than “Economy”.

Create Warehouse – Default Setting #5

Auto Suspend: 10 minutes
For me, this is typically too high a default.  Many warehouses, especially ELT/ETL warehouses, do not need a setting this high.  For example, a load warehouse that runs at regular intervals rarely benefits from a warm cache, so it does not need this much idle time.  Our Snoptimizer service finds inefficient and potentially costly settings like this.  For a load warehouse use case, Snoptimizer can immediately save you up to 599 seconds of idle compute for every interval this runs. We talk more about it in this Snowflake Warehouse Best Practice Auto Suspend article, but this can add up, especially if your load warehouse is a larger T-shirt size.

This defaults to the same setting for both the Classic Console and Snowsight (the Preview App)

Snowflake Create Warehouse – Default Setting #6

Auto Resume Checkbox: Checked by Default.
This setting is totally fine. I don’t even remember the last time I created a warehouse without “Auto Resume” checked by default. This is one of the very awesome things about Snowflake: within milliseconds or seconds of a query being executed, it brings that warehouse compute online for you automatically. This is revolutionary and awesome!

This defaults to the same setting for both the Classic Console and Snowsight (the Preview App)

Snowflake Create Warehouse – Default Setting #7

Click “Create Warehouse”: The Snowflake Warehouse is immediately started.
This setting I do not like.  I do not think the warehouse should immediately start consuming credits and go into the Running state.  It is too easy for a new SYSADMIN to start a warehouse they do not need.  The previous default setting already enables “Auto Resume”, so the warehouse will resume as soon as a job is sent to it; there is no need to start it automatically.

This defaults to the same execution for both the Classic Console and Snowsight (the Preview App)

What do you think?

We really hope this was useful for you.  ITS Snowflake Solutions is a community of data professionals dedicated to Snowflake education and solutions.  We are here to help you run your Snowflake Account as efficiently as possible.  We work to solve your data-driven and data automation challenges!

Below is the SQL code for those of you who just do not do “GUI” 

Forget the GUI – Let’s go to the Snowflake CREATE WAREHOUSE code to see what is happening….

DEFAULT SETTINGS:  (I think since I started in 2018)

CREATE WAREHOUSE XLARGE_BY_DEFAULT WITH
  WAREHOUSE_SIZE = 'XLARGE'
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 2
  SCALING_POLICY = 'STANDARD'
  COMMENT = 'This sucker will consume a lot of credits fast';
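
By contrast, here is a sketch of a more cost-conscious starting point we might suggest; the name, the XSMALL size, the 60-second auto-suspend, and INITIALLY_SUSPENDED = TRUE are all our own example choices rather than anything Snowflake sets by default.

CREATE WAREHOUSE SMALL_BY_CHOICE WITH
  WAREHOUSE_SIZE = 'XSMALL'          -- start small; resize later only if queries demand it
  AUTO_SUSPEND = 60                  -- suspend after 1 idle minute instead of 10
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE         -- do not start burning credits at creation time
  COMMENT = 'Cost-conscious example settings';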

See our Resource Monitor videos – https://www.youtube.com/watch?v=z9nvAGe1Ens

How Snowflake Pricing Works

Usage Based Pricing in the cloud in some ways is incredibly awesome.  For those of us who lived through the days of scaling Data Centers and waiting for hardware it still is pretty mind blowing.  When I first came across Snowflake at the beginning of 2018 I was amazed at how reasonable the cost was for any type of business, big or small.

The fact that we could even start an account off with 400 credits for 30 days for a Proof of Concept (POC) was just amazing to me.  Prior to this, when working with analytical databases that could scale like Exadata, Teradata, and Netezza our consulting company hesitated to introduce these more expensive solutions to our consulting clients which were small or medium size businesses because these solutions were out of their pricing comfort zone.

Snowflake Pricing Details – Basic

For those of you who are new to Snowflake, let’s start with Snowflake consumption pricing basics.  Snowflake overall is usage- or consumption-based pricing.  This means you ONLY pay for what you use.  Technically, you could set up a free Snowflake Trial Account and never pay anything, because you never used any of the services that have a cost.  This is really awesome.

At the same time, as soon as your Snowflake Account is provisioned, you – the administrator or the person whose credit card is associated with the account – face extreme cost risk by default.  Our best practice is to ALWAYS enable Snowflake cost optimization with Snoptimizer immediately after provisioning a Snowflake Account.  If you decide against that, then at the very least you should limit access and set up standard Snowflake cost guardrails and follow Snowflake cost optimization and cost minimization best practices.

Snowflake Compute Pricing – Basics

For most Snowflake Accounts, Snowflake compute, i.e. the Snowflake Warehouses (virtual compute engines), is where 90% or more of your costs are.  The other cost areas – storage, cloud computing, cloud services, and data transfer – are typically 10% or less of the monthly Snowflake SaaS costs.  Often they can even be 1% or less, unless you have certain use cases or end up mistakenly falling into Snowflake Cost Anti-patterns.

Snowflake Pricing Details – Advanced

For those of you who are more Snowflake savvy and already know the basics then let’s cover more advanced Snowflake pricing details.

Snowflake Compute Pricing – Advanced

*The prices stated above were standard prices taken from Snowflake’s pricing pages and PDF.  If you are a capacity customer with Snowflake, your Snowflake sales representative (also titled Major Account Executive, Account Director, Sales Director, etc.) will have worked with you to commit to a 1- to 3-year term as a Capacity Contract customer.  The pricing typically varies depending on your spend and your commitment length, much like any SaaS vendor.

Over the last 3 years my teams and I have analyzed over 100 Snowflake accounts and about 95% of them were not fully optimized for both Cloud data costs and Cloud cost risk minimization.  This is why my team and I are so excited to have created Snoptimizer (the first AUTOMATED Snowflake Cost Optimization Service) – Easily optimize your Snowflake Data Cloud Account here in a few hours.

I think the reason why 90% of those accounts didn’t have resource monitors or regular optimizations in place is that Snowflake is initially incredibly cost effective and typically delivered massive savings, especially on the on-prem migrations we have done.  Companies that do not optimize their Data Cloud costs are missing out big time! This is why we designed Snoptimizer, and it is also why I shared my top Snowflake Cost and Risk Optimizations in the best practices section above.

One of the first things that Snoptimizer does is automate daily Resource Monitors at a warehouse level based on all the Snowflake Metadata Database past history and Warehouses and Resource Monitor settings. This gets set almost immediately after you purchase Snoptimizer. BOOM. This has both huge cost risk reduction limits and guardrails for all of your warehouse compute.

See cost optimization best practices and automation code at Snowflake Solutions.


TOP TIP:  Reduce your default query timeout to 4 hours or less, instead of the 2-day default.

ALTER WAREHOUSE "TEST_WH" SET STATEMENT_TIMEOUT_IN_SECONDS = 14400;  -- 14400 seconds = 4 hours (replace TEST_WH with your warehouse name)
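
If you would rather apply the same guardrail account-wide instead of per warehouse, the parameter can also be set at the account level (session- and warehouse-level settings still take precedence); the 4-hour value here is just the same example.

-- Account-wide default query timeout of 4 hours.
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 14400;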

Conclusion:

I hope the basic and advanced Snowflake pricing information above is useful to you on your Snowflake journey.  For me, finding out that Snowflake consumption-based pricing was so reasonable was game changing for both myself and my consulting company.  Prior to Snowflake, we couldn’t really provide compute scale fast enough for many of the largest analytical challenges and solutions our clients needed.  I remember building predictive marketing tools where we often had to crunch large data sets; we would regularly run into scaling challenges and have to spend tons of time and engineering effort to engineer for scale.

Snowflake Cost Anti-patterns

Snowflake has been my favorite analytical database since the beginning of 2018, but as I often present in my live training sessions and webinars, WITH GREAT POWER (practically unlimited compute scale) comes GREAT RESPONSIBILITY.  In this article I’ll cover the TOP 3 Snowflake Cost Anti-patterns my Snowflake cost optimization team and I have come across after 3 years of analyzing hundreds of Snowflake Accounts.  I cannot stress enough that you should either invest in a PART- or FULL-TIME Snowflake DBA focused on cost and organization or, if you do not have that financial luxury, use our automated Snowflake Cost Optimization Service – Snoptimizer.  It is incredibly easy and low cost to set up Snoptimizer compared to letting these anti-patterns manifest (which I know happens on too many Snowflake Accounts, based on our review of hundreds of them).  If you do not have cost guardrails like Resource Monitors enabled, your Snowflake compute consumption risk is really high, and honestly it is gross negligence as a data administration professional to allow this.

Let’s go through the TOP 3 Snowflake Cost Anti-patterns.

Top 3 Anti-Patterns

The first Snowflake anti-pattern is by far the worst and happens all too often.

Snowflake Cost Anti-pattern #1

Sadly, we all too often see that Resource Monitors are not set up correctly.  Some Snowflake accounts have them, but not at an effective grain.  One anti-pattern is that the administrator sets a single, large-credit Resource Monitor on the overall account and no other Resource Monitors.  It is okay to have some Resource Monitors cover the account or multiple warehouses, but we highly recommend one daily Resource Monitor for each and every warehouse, with a suspend action once its credit limit is reached.  This is currently the only real way to put guardrails on your Snowflake consumption.  Without doing this you are exposing your company and Snowflake account to significant cost risk.

Another anti-pattern related to Resource Monitors that we see too frequently is administrators who do not want to be responsible for stopping compute, so they set up Resource Monitors with ONLY notifications.  The problem is that notifications are just that: ONLY something to notify you.  What if you only have 1–2 Snowflake Account Administrators, they are not watching the emails or web notifications closely enough, and a Large to 6XL warehouse comes online without auto suspend enabled?

Another problem is that Snowflake administrators set up Resource Monitors BUT do not attach them to a warehouse.  This is the same as having a guardrail that is not activated.  Ugh!

Finally, we also see Resource Monitors set up by Account Administrators who have not enabled their email or web notifications correctly, so the alerts never reach anyone.

Snowflake Cost Anti-pattern #2

Another major Snowflake cost anti-pattern is related to storage.  We do not see this nearly as often as #1, but it can also be a cost risk if you do not understand the impact of longer Time Travel settings in Snowflake.  If many of your tables have Time Travel set to 30, 60, or 90 days, but you don’t really need that much Time Travel and will never use it, then you should lower those retention settings.

There are similar potential problems with any table whose data is frequently updated and changed.  These types of tables challenge Snowflake’s architecture because every data change requires recreating micro-partitions.  So if you have 90-day Time Travel set and you are changing a large table every few minutes or hours, it adds up, because all of those immutable micro-partitions for every change are retained for 90 days.  Also, remember that by default Snowflake adds a 7-day Fail-safe period of storage for permanent tables.  So if you have Time Travel set to 90 days, you will pay for 97 days of storage.  (A small example of lowering retention follows below.)
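
If you decide a table really does not need long retention, lowering it is a one-line change; the table name and the 1-day value here are purely illustrative.

-- Example only: reduce Time Travel retention on a high-churn table from 90 days to 1 day.
ALTER TABLE MY_HIGH_CHURN_TABLE SET DATA_RETENTION_TIME_IN_DAYS = 1;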

Snowflake Cost Anti-pattern #3

Setting a warehouse to “Never” auto suspend, or to very high auto suspend settings.  If you set a warehouse to never suspend, you create never-ending spend on that warehouse until you suspend it manually or through code.  If the warehouse size is only XS this isn’t incredibly horrible, but at a larger size the costs grow very fast and you lose all of the value of Snowflake’s consumption-based pricing.

These are the top 3 most dangerous Snowflake Cost Anti-patterns we have come across.  There are many others but they are typically not as severe as these.

Conclusion for Snowflake Cost Anti-patterns:

These Snowflake Cost Anti-patterns are real and expose your company and yourself to sizable cost risks.  This is why we recommend using Snoptimizer, or having your team or a consulting provider enable Snowflake Best Practice Cost Optimizations – especially setting up Resource Monitors – IMMEDIATELY, or at least on the same day your Snowflake Account is provisioned.