Shortest Snowflake Summit 2023 Recap

Introduction:

Similar to last year, I wanted to create a "shortest" recap of Snowflake Summit 2023, covering the key feature announcements and innovations. I'm writing this exactly two weeks after the Summit ended, now that I have digested the major changes. Throughout July and August we will follow up with our view of the massive Data to Value improvements and capabilities being made.

 

Snowflake Summit 2023 Recap from a Snowflake Data Superhero:

If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.

 

Top Announcements:

 

1. Native Applications go to Public Preview:

I am slightly biased here because my teams have been working with Snowflake Native Apps since February/March 2022. We have been on this journey with Snowflake from the earliest Private Preview to now, over the last 16 months or so. We are super excited about the possibilities and the potential of where this will go.

 

2. Nvidia/Snowflake Partnership, Support of LLMs, and Snowpark Container Services (Private Preview):   

Nvidia and Snowflake are teaming up (because, as Frank S. says, some people are trying to kill Snowflake Corp) to integrate Nvidia's LLM framework into Snowflake. I'm also really looking forward to seeing how Snowpark Container Services works in practice.

 

3. Dynamic Tables (Public Preview):  

Many Snowflake customers, myself included, are really excited about this. Dynamic Tables add key capabilities that go well beyond the similar concept of Materialized Views. With Dynamic Tables you get declarative data pipelines, dynamic SQL support, user-defined low-latency freshness, automated incremental refreshes, and snapshot isolation.
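To make the declarative pipeline idea concrete, here is a minimal sketch of a Dynamic Table (the table, warehouse, and column names are hypothetical, and the target lag value is just an illustration):

  CREATE OR REPLACE DYNAMIC TABLE customer_order_summary
    TARGET_LAG = '5 minutes'      -- user-defined freshness
    WAREHOUSE = transform_wh      -- compute used for automated refreshes
  AS
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY customer_id;

You declare the end state in SQL, and Snowflake keeps the result refreshed to the declared target lag instead of you hand-building streams and tasks.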

 

4. Managed Iceberg Tables (Private Preview): 

“Managed Iceberg Tables” allow Snowflake compute resources to manage data stored in the Iceberg format. This makes Iceberg-format data much easier to manage and helps Snowflake compete for data lake and very large data file workloads. Snowflake customers can keep their data lake catalog in Iceberg but still get huge value from Snowflake's query engine, which reads the metadata Iceberg provides for much better compute performance. In many ways this is a large-file Data to Value play: it keeps what blob storage (S3, Azure, etc.) does best, while using Snowflake's compute means less data transformation and faster value from the data, including standard data modifications like updates, deletes, and inserts.
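As a rough illustration of the direction (the exact DDL has evolved since the private preview, and the external volume, table, and column names below are hypothetical), a Snowflake-managed Iceberg table looks something like this:

  CREATE ICEBERG TABLE sales_iceberg (
    sale_id   NUMBER,
    sale_date DATE,
    amount    NUMBER(10,2)
  )
    CATALOG = 'SNOWFLAKE'             -- Snowflake manages the Iceberg catalog and metadata
    EXTERNAL_VOLUME = 'my_s3_volume'  -- customer-owned blob storage (S3, Azure, etc.)
    BASE_LOCATION = 'sales/';

The data and metadata stay in open Iceberg format in your own storage, while Snowflake's engine handles the queries, updates, deletes, and inserts.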

 

5. Snowpipe Streaming API (Public Preview): 

As someone who worked with and presented on the Kafka streaming connector back at Summit 2019, it is really great to see this advancement. Back then the connector was "OK" and could handle certain levels of streaming workloads. Four years later, streaming workload processing has gotten much, much better.

 

Top Cost Governance and Control Changes:

As anyone who has read my blog over the past few years knows, I'm a huge advocate of Snowflake's pay-for-what-you-use model. It is AWESOME, but only when a tool like our Snoptimizer® optimization service is used or you set up all of the cost guardrails correctly. 98% of the accounts we help with Snoptimizer do not have all the optimizations set correctly, and without continuous monitoring of costs (and, for that matter, performance and security, which we also cover, unlike a lot of the copycats), surprise bills are almost inevitable.

1. Budgets (Public Preview): 

This "budget" cost control feature was actually announced back in June 2022, and we have been waiting for it for some time. It is good to see Snowflake finally delivering this functionality. Since we started as one of the top Snowflake systems integrators back in 2018, Resource Monitors have been the ONLY guardrail-type control available, which has been a huge pain point for many customers for many years. Now, with Budgets, users can specify a spending budget and get much more granular detail about spending against their limits.
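For context, here is roughly what the two generations of guardrails look like side by side. The resource monitor DDL is long-standing; the Budgets syntax below reflects the preview-era SNOWFLAKE.CORE.BUDGET class and may change, and all object names and amounts are hypothetical:

  -- The old guardrail: a resource monitor attached to a warehouse
  CREATE RESOURCE MONITOR monthly_guardrail
    WITH CREDIT_QUOTA = 500
         FREQUENCY = MONTHLY
         START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 80  PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND
             ON 110 PERCENT DO SUSPEND_IMMEDIATE;

  ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = monthly_guardrail;

  -- The new approach: a budget with a monthly spending limit (preview syntax, may change)
  CREATE SNOWFLAKE.CORE.BUDGET marketing_budget();
  CALL marketing_budget!SET_SPENDING_LIMIT(1000);

Resource monitors watch warehouse credit usage; Budgets are intended to track spend across a set of objects you associate with them.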

2. Warehouse Utilization (Private Preview): 

This is another great step forward for Snowflake customers looking to optimize their warehouse utilization. We already leverage the metadata statistics available today to do this within Snoptimizer®, but we are limited by the level of detail we can gather. This will allow us to optimize workloads much better across warehouses and achieve even greater Snowflake cost optimization for our customers.
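Until that new utilization metric is available, the account usage views give a rough proxy. A simple sketch of the kind of query we lean on today (the 30-day window is arbitrary):

  SELECT warehouse_name,
         DATE_TRUNC('day', start_time) AS usage_day,
         SUM(credits_used)             AS credits_used
  FROM snowflake.account_usage.warehouse_metering_history
  WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  GROUP BY 1, 2
  ORDER BY 1, 2;

This shows what each warehouse burned, but not how busy it actually was while running, which is exactly the gap the new utilization metric should close.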

 

My takeaways from Snowflake Summit 2023:

  • If my summaries above are not detailed enough, you are in luck. Here are more details from my team on our top findings from Snowflake Summit 2023.
  • Snowpark Container Services allow Snowflake customers to run any job, function, or service (from third-party LLMs and Hex notebooks to a C++ application or even a full database like Pinecone) in their own accounts. It also supports GPUs.
  • Streamlit is getting a new, faster, and easier user interface for developing apps. It is an open-source, Python-based framework compatible with major libraries like scikit-learn, PyTorch, and pandas, and it has Git integration for branching, merging, and version control.
  • Snowflake is leveraging two of its recent acquisitions, Applica and Neeva, to provide a new generative AI experience. The Applica acquisition has led to Document AI, an LLM that extracts contextual entities from unstructured data and lets you query unstructured data using natural language. The extracted (unstructured-to-structured) data is persisted in Snowflake and vectorized. Not only can this data be queried in natural language, it can also be used to retrain the LLM on private enterprise data. While most vendors are pursuing prompt engineering, Snowflake is following the retraining path.
  • Snowflake now provides full MLOps capabilities, including Model Registry, where models can be stored, version controlled, and deployed. They are also adding a feature store with compatibility with open-source Feast. It is also building LangChain integration.
  • Last year, Snowflake added support for Iceberg Tables. This year it brings those tables under its security, governance, and query optimizer umbrella. Iceberg tables' query latency now matches that of tables in Snowflake's native format.
  • Snowflake is addressing the criticism of its high cost through several initiatives designed to make costs predictable and transparent. The Snowflake Performance Index (SPI) uses ML functions to analyze query durations for stable workloads and automatically optimize them; this has led to a 15% improvement in customers' usage costs.
  • Snowflake has invested hugely in building native data quality capabilities within its platform. Users can define quality check metrics to profile data and gather statistics on column value distributions, null values, etc. These metrics are written to time-series tables which helps build thresholds and detects anomalies from regular patterns.
  • Snowflake announced two new APIs to support the ML lifecycle:
  • ML Modeling API: The ML Modeling API includes interfaces for preprocessing data and training models. It is built on top of popular libraries like Scikit Learn and XGBoost, but seamlessly parallelizes data operations to run in a distributed manner on Snowpark. This means that data scientists can scale their modeling efforts beyond what they could fit in memory on a conventional compute instance.
  • MLOps API: The MLOps API is built to help streamline model deployments. The first release of the MLOps API includes a Model Registry to help track and version models as they are developed and promoted to production.
  • Improved Apache Iceberg integrations
  • GIT Integration: Native git integration to view, run, edit, and collaborate within Snowflake code that exists in git repos. Delivers seamless version control, CI/CD workflows, and better testing controls for pipelines, ML models, and applications.
  • Top-K Pruning Queries: Enable you to retrieve only the most relevant answers from a large result set by rank. Additional pruning features reduce the need to scan entire data sets, enabling faster queries (e.g., SELECT ... FROM my_table ORDER BY abc LIMIT 10).
  • Warehouse Utilization: A single metric that gives customers visibility into actual warehouse utilization and can show idle capacity. This will help you better estimate the capacity and size of warehouses.
  • Geospatial Features: the GEOMETRY data type, switching spatial reference systems using ST_TRANSFORM, invalid shape detection, and many new functions for GEOMETRY and GEOGRAPHY (see the example after this list).
  • Dynamic Tables
  • Amazon S3-compatible Storage
  • Passing References for Tables, Views, Functions, and Queries to a Stored Procedure — Preview
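A quick illustration of the new geospatial pieces mentioned above (the coordinates and SRIDs are arbitrary examples):

  -- GEOGRAPHY: distance in meters between two points
  SELECT ST_DISTANCE(
           TO_GEOGRAPHY('POINT(-122.35 37.55)'),
           TO_GEOGRAPHY('POINT(-118.24 34.05)')) AS meters_apart;

  -- GEOMETRY: reproject a point from WGS 84 (SRID 4326) to Web Mercator (SRID 3857)
  SELECT ST_TRANSFORM(TO_GEOMETRY('POINT(-122.35 37.55)', 4326), 3857) AS mercator_point;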

 

Marketplace Capacity Drawdown Program

Anomaly Detection: Flags metric values that differ from typical expectations.

Contribution Explorer: Helps you find dimensions and values that affect the metric in surprising ways.

 

What happened to Unistore?

 

UNISTORE: OLTP-type support based on Snowflake's Hybrid Table feature. This was one of the biggest announcements of last year's Summit. Snowflake is entering a much larger share of data and application workloads by extending its capabilities beyond OLAP (online analytical processing, i.e., big data) into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.

This was from last year too, and it's great to see it move forward! (Even though Streamlit speed is still a work in progress.)
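For anyone who has not touched Hybrid Tables yet, here is a minimal sketch of the DDL (the names are hypothetical and the feature's syntax may still change while in preview):

  CREATE HYBRID TABLE orders (
    order_id    INT PRIMARY KEY,                            -- hybrid tables require a primary key
    customer_id INT NOT NULL,
    status      VARCHAR(20),
    created_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    INDEX idx_customer (customer_id)                        -- secondary index for fast point lookups
  );

The same table can then serve low-latency single-row reads and writes alongside Snowflake's analytical queries.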

 

Application Development Disruption with Streamlit and Native Apps:

 

Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.

Snowflake at a very high level (still) wants to:

Disrupt Data Analytics

Disrupt Data Collaboration

Disrupt Data Application Development

Data to Value – Part 2

Introduction:

Welcome to our part 2 Data to Value series. If you’ve read Part 1 of the Data to Value Series, you’ve learned about some of the trends happening within the data space industry as a whole.

In Part 2 of the Data to Value series, we’ll explore additional trends to consider, as well as some of Snowflake’s announcements in relation to Data to Value.

As a refresher on this series, we are making a fundamental point: data professionals and data users of all types need to focus not just on creating, collecting, and transforming data. We need to make a conscious effort to focus on and measure the true value that each set of data creates. We also need to measure how fast we can get to that value when speed provides a real business advantage. There is also an argument for discounting the value of time-dependent data, since data often loses value as it ages.

 

Data to Value Trends – Part 2:

 

8) – Growth of Fivetran and now Hightouch.

The growth and success of Fivetran and Stitch (now Talend) have been remarkable. There is now a significant surge in the popularity of automated data copy pipelines that work in the reverse direction, with a focus on Reverse ETL (Extract, Transform, and Load running from the warehouse back out to operational tools), much like our trusted partner, Hightouch. Our IT Strategists consulting firm became partners with Stitch, Fivetran, and Matillion in 2018.

At the Snowflake Partner Summit of the same year, I had the pleasure of sitting next to Jake Stein, one of the founders of Stitch, on the bus from San Francisco to Sonoma. We quickly became friends, and I was impressed by his entrepreneurial spirit. Jake has since moved on to a new startup, Common Paper, a structured contracts platform, after selling Stitch to Talend. At the same event, I also had the opportunity to meet George Frazier from Fivetran, who impressed me with his post comparing all the cloud databases back in 2018. At that time, such content was scarce.

 

9) – Resistance to “ease of use” and “cost reductions” is futile.

Part of me, as a consultant at the time, wanted to resist these automated EL (Extract and Load) tools versus traditional ETL (Extract, Transform, and Load) or ELT (Extract, Load, and then Transform within the database). As I tested Stitch and Fivetran, though, I knew that resistance was futile. The ease of use of these tools and the reduction in development and maintenance costs cannot be overlooked. There was no way to stop the data market from embracing these easier-to-use data pipeline automation tools.

What was even more compelling is that you can usually set up automated extract and load jobs within minutes or hours. This is unlike the ETL tools we had been using for decades, which were mostly software installations. Those installations required capacity planning, procurement, and all sorts of organizational friction just to get started. With Fivetran and Hightouch, no engineering or developer expertise is needed for almost all of the work, though in some cases it can be beneficial to involve data engineers and architects.

Overall, the concept is simple: Fivetran connects connectors to destinations. Destinations are databases or data stores; connectors are sources of data, such as Zendesk, Salesforce, or one of the many other connectors Fivetran offers. Fivetran and Hightouch are great examples of the trend toward data services and tools that really speed up getting value from your data.

 

10) – Growth of Automated and Integrated Machine Learning Pipelines with Data.

Many companies, including Data Robot, Dataiku, H2O, and Sagemaker, are working to achieve this goal. However, this field appears to be in its early stages, with no single vendor having gained widespread adoption or mindshare. Currently, the market is fragmented, and it is difficult to predict which of these tools and vendors will succeed in the long run.

 

Snowflake’s Announcements related to Data to Value

Snowflake is making significant investments and progress in the field of data analysis, with a focus on delivering value to its clients. Their recent announcements at the Snowflake Summit this year, as detailed in this source, highlight new features that are designed to enhance the Data to Value experience.

 

Snowflake recently announced its support of Hybrid Tables and the concept of Unistore.

This move is aimed at providing Online Transaction Processing (OLTP) to its customers. There has been great interest from customers in this concept, which allows for a single source of truth through web-based OLTP-type applications operating on Snowflake with Hybrid tables.

 

Announcements about Snowflake’s Native Apps:

 

  • Integrating Streamlit into Snowflake.

If done correctly, this could be yet another game-changer in turning data into value.
Please note that these two items mentioned not only enable data to be processed more quickly, but also significantly reduce the cost and complexity of developing data apps and combining OLTP/OLAP applications. This removes many of the barriers that come with requiring expensive, full-stack development. Streamlit aims to simplify the development of data applications by removing the complexity of the front-end and middle-tier components. (After all, aren’t most applications data-driven?) It is yet another low-code data development environment.)

  • Announcement of Snowpipe Streaming.

I found this particularly fascinating, as I had collaborated with Isaaic from Snowflake before the 2019 Summit using the original Kafka-to-Snowflake connector, and I gave a presentation on the topic at Snowflake Summit 2019. It was truly amazing to watch Snowflake refactor the old Kafka connector, resulting in significant improvements in speed and much lower latency. This is yet another major victory for streaming Data to Value, with an anticipated 10 times lower latency. The public preview is slated for later in 2022.

  • Announcement: Snowpark for Python and Snowpark in General

Snowflake has recently introduced a new technology called Snowpark. While the jury is still out on this new technology, it represents a major push by Snowflake to speed up ML pipelines over data. Snowflake is looking to integrate full data event processing and machine learning workflows within Snowflake itself.

 

If Snowflake can execute this correctly, it will revolutionize how we approach data value. Additionally, it reduces the costs associated with deploying data applications.

 

Conclusion:

In Part 2 of the "Data to Value" series, we explored additional trends in the data industry, including the growth of automated data copy pipelines and integrated machine learning pipelines. We also discussed Snowflake's recent announcements related to data analysis and delivering value to clients, including support for Hybrid Tables and Native Apps. The key takeaway is the importance of understanding the value of data and measuring the speed of going from data to business value.

Executives and others who prioritize strategic data initiatives should make use of Data to Value metrics. This helps us comprehend the actual value that stems from our data creation, collection, extraction, transformation, loading, and analytics. By doing so, we can make better investments in data initiatives for our organizations and ourselves. Ultimately, data can only generate genuine value if it is reliable and of confirmed quality.

Snowflake Snowday – Data to Value Superhero Summary

Snowflake Snowday  —  Summary

Snowflake’s semiannual product announcement, Snowflake Snowday, took place on November 7, 2022, the same day as the end of Snowflake’s Data Cloud World Tour (DCWT).

I attended 5 DCWT events across the globe in 2022. It was fascinating to see how much Snowflake has grown since the 2019 tour. Many improvements and new features are being added to the Snowflake Data Cloud. It’s hard to keep up! These announcements should further improve Snowflake’s ability to turn data into value.

Let’s summarize the exciting Snowflake announcements from Snowday. The features we’re most enthusiastic about that improve Data to Value are:

  • Snowflake’s Python SDK (Snowpark) is now generally available.
  • Private data sharing significantly accelerates collaborative data work.
  • The Snowflake Kafka connector, dynamic tables, and Snowpipe streaming enable real-time data integration.
  • Streamlit integration simplifies dashboard and app development.

All of these features substantially improve Data to Value for organizations.

Snowflake Snowday Summary – Top Announcements

TOP announcement! – whoop whoop – SNOWPARK FOR PYTHON! (General Availability – GA)

I believe this was the announcement all Python data scientists were anticipating (including myself). Snowpark for Python now enables every Snowflake customer to develop and deploy Python-based apps, pipelines, and machine-learning models directly in Snowflake. In addition to Snowpark for Python being Generally Available to all Snowflake editions, these other Python-related announcements were made:

  • Snowpark Python UDFs for unstructured data (Private Preview)
  • Python Worksheets – The improved Snowsight worksheet now supports Python so you don’t need an additional development environment. This simplifies getting started with Snowpark for Python development. (Private preview)

One Product. One Platform.

  • Snowflake’s major push is to make its platform increasingly easy to use for most or all of its customers’ data cloud needs.
  • Snowflake now offers Hybrid Tables for OLTP workloads and Snowpark. Snowflake is expanding its core platform to handle AI/ML and online transaction processing (OLTP) workloads. This significantly increases Snowflake’s total addressable market.
  • Snowflake acquired Streamlit earlier this year for a main reason. They aim to integrate Streamlit’s data application frontend and backend. They also want to handle data application use cases.
  • Snowflake is investing heavily to evolve from primarily a data store to a data platform for building frontend and backend data applications. This includes web/data apps needing millisecond OLTP inserts or AI/ML workloads.

Additionally, Snowflake continually improves the core Snowflake Platform in the following ways:

The Cross-Cloud Snowgrid:

https://snowflakesolutions.net/wp-content/uploads/Snowday-Cross-Cloud-Snowgrid-1024x762.png

Replication Improvements and Snowgrid Updates:

These improvements and enhancements to Snowflake, the cross-cloud data platform, significantly boost performance and replication. If you’re unfamiliar with Snowflake, we explain what Snowgrid is here.

  • Cross-Cloud Business Continuity – Stream & Task Replication (PUBLIC PREVIEW) – This enables seamless pipeline failover, which is fantastic. It takes replication beyond just accounts, databases, policies, and metadata.
  • Cross-Cloud Business Continuity – Replication GUI (PRIVATE PREVIEW). You can now more easily manage replication and failover from a single interface for global replication. It enables easy setup, management, and failover of an account.
  • Cross-Cloud Collaboration – Discovery Controls (PUBLIC PREVIEW)
  • Cross-Cloud Collaboration – Cross-Cloud Auto-Fulfillment (PUBLIC PREVIEW)
  • Cross-Cloud Collaboration – Provider Analytics (PUBLIC PREVIEW)
  • Cross-Cloud Governance – Tag-Based Masking (GA)
  • Cross-Cloud Governance – Masking and Row-Access Policies in Search Optimization (PRIVATE PREVIEW)
  • Replication Groups – Looking forward to updates on this as well. These can enable sharing and simple database replication in all editions (see the sketch after this list).
  • The above are available in all editions EXCEPT:
  • Enterprise or higher needed for Failover/Failback (including Failover Groups)
  • Business Critical or higher needed for Client Redirect functionality
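Here is the sketch referenced above: a failover group that replicates a set of account objects on a schedule (a replication group uses the same shape, just without the failover ability). The account and object names are hypothetical:

  -- On the source account
  CREATE FAILOVER GROUP sales_dr
    OBJECT_TYPES = DATABASES, ROLES, WAREHOUSES, RESOURCE MONITORS
    ALLOWED_DATABASES = sales_db
    ALLOWED_ACCOUNTS = myorg.dr_account
    REPLICATION_SCHEDULE = '10 MINUTE';

  -- On the target account: create the replica, then promote it during a failover
  CREATE FAILOVER GROUP sales_dr AS REPLICA OF myorg.source_account.sales_dr;
  ALTER FAILOVER GROUP sales_dr PRIMARY;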

Performance Improvements on Snowflake Updates:

New performance improvements and performance transparency announcements were related to:

  • Query Acceleration (public preview): Offloads portions of eligible queries, such as large scans, to serverless compute to speed them up (see the example after this list).
  • Search Optimization Enhancements (public preview): Extend search optimization to more query patterns and data types for faster, highly selective lookups.
  • Join eliminations (GA): Removes unnecessary table joins from query plans.
  • Top results queries (GA): Speeds up ORDER BY ... LIMIT queries by pruning the data scanned.
  • Cost Optimizations: Account usage details (private preview): More granular usage detail to help manage costs.
  • History views (in development): Additional history views for usage and performance.
  • Programmatic query metrics (public preview): Programmatic access to query performance metrics. Available on all editions EXCEPT: Enterprise or higher is required for Search Optimization and Query Acceleration.
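As referenced above, both query acceleration and search optimization are enabled with plain DDL. A minimal sketch (warehouse, table, and column names are hypothetical, and the scale factor is just an example value):

  -- Let a warehouse offload eligible scan-heavy work to serverless compute
  ALTER WAREHOUSE analytics_wh SET
    ENABLE_QUERY_ACCELERATION = TRUE
    QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

  -- Add search optimization for fast, highly selective lookups on a column
  ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(user_id);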

Data Listings and Cross-Cloud Updates

I’m thrilled about Snowflake’s announcement regarding Private Listings. Many of you know that Data Sharing, which I’ve been writing about for over 4 years, is one of my favorite Snowflake features. My latest article is “The Future of Data Collaboration.” Data Sharing is a game-changer for data professionals.

Snowflake’s announcement makes private data-sharing scenarios much easier to implement. Fulfilling different regional requirements is now simpler too (even 1-2 years ago, we had to write replication commands). I’ll provide more details on how this simplifies data sharing and collaboration. I was happy to see presenters use the Data to Value concepts in their announcement.

I appreciated Snowflake incorporating some of my Data to Value concepts, like “Time to value is significantly reduced for the consuming party.” Even better, this functionality is now available for ALL SNOWFLAKE EDITIONS.

Private Listings

https://snowflakesolutions.net/wp-content/uploads/Snowday-Listings-Cross-Cloud-Improvements-300x190.png

Snowflake Data Governance Improvements

Snowflake continues to build native data governance and protection features into the platform:

  • Tag-based Masking automatically applies designated masking policies to sensitive columns via tags (see the example after this list).
  • Search Optimization now supports tables with masking and row access policies.
  • FedRAMP High for AWS Government (authorization pending). *Available ONLY on ENTERPRISE+ OR HIGHER
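Here is the tag-based masking sketch referenced above: one masking policy attached to a tag protects every column carrying that tag (all database, schema, role, and object names are hypothetical):

  CREATE TAG governance.tags.pii_type;

  CREATE MASKING POLICY governance.policies.mask_pii AS (val STRING)
    RETURNS STRING ->
    CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END;

  -- Any column tagged with pii_type automatically inherits the masking policy
  ALTER TAG governance.tags.pii_type SET MASKING POLICY governance.policies.mask_pii;

  ALTER TABLE crm.public.customers
    MODIFY COLUMN email SET TAG governance.tags.pii_type = 'email';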

Building on Snowflake

New announcements related to:

  • Streamlit integration (PRIVATE PREVIEW in January 2023) – This integration will be exciting. The private preview can’t come soon enough.
  • Snowpark Optimization Warehouses (PUBLIC PREVIEW) – This was a smart move by Snowflake to support AI/ML Snowpark customers’ needs. Great to see it rolled out, allowing customers access to higher memory warehouses better suited for ML/AI training scale. Snowpark code can run on both warehouse types.
  • *Available for all Snowflake Editions

Streaming and Dynamic Table Announcements:

Conclusion:

Overall, I’m thrilled with where this is headed. These enhancements greatly improve Snowflake’s streaming data integration, especially with Kafka. Now, Snowflake customers can get real-time data streams and transform data with low latency. When fully implemented, this will enable more cost-effective and high-performance data lake solutions.

If you missed Snowday and want to watch the recording, here’s the link: https://www.snowflake.com/snowday/agenda/

We’ll cover more updates from Snowday and Snowflake BUILD in depth this week in the Snowflake Solutions Community.

Data to Value – Part 1 – Snowflake Solutions

Introduction:

 

Welcome to Frank's Future of Data, a four-part series. In these articles, we will cover a few tips on how to get value out of your Snowflake data.

I spend a ton of time reviewing and evaluating all the ideas, concepts, and tools around data, data, and data. The “data concept” space has been exploding with an increase in many different concepts and ideas. There are so many new data “this” and data “that” tools as well so I wanted to bring data professionals and business leaders back to the core concept that matters around the creation, collection, and usage of data. Data to Value.

In layman’s terms, the main concept is that we need to remember that the entire point of collecting and using data is to create business, organizational, and/or individual value. This is the core principle that we should keep in mind when contemplating the value that data provides.

The truth is that while the technical details and jargon involved in creating and collecting data, as well as realizing its value, are important, many users find them overly complex.

For a moment, let’s set aside the technical jargon that can be overused and misused, such as Data Warehouse, Data Lake, Data Mesh, and Data Observability. I’ve noticed that data experts and practitioners often have differing views on the latest concepts. These views can be influenced by their data education background and the types of technologies they were exposed to.

Therefore, I created these articles to prepare myself for taking advantage of new paradigms that Snowflake and other “Modern Data” Stack tools/clouds provide.

In Part 1 of the Data to Value series, we will cover the Data to Value trends you need to be aware of.

 

Data to Value Trends:

 

In 2018, I had the opportunity to consult with some highly advanced and mature data engineering solutions. Some of these solutions were actively adopting Kafka/Confluent to achieve true “event-driven data processing”. This represented a significant departure from the traditional batch processing that had been used in 98% of the implementations I had previously encountered. I found the idea of using continuous streams of data from different parts of the organization, delivered via Kafka topics, to be quite impressive. At the same time, these concepts and paradigm shifts were quite advanced and likely only accessible to very experienced data engineering teams.

1) – Non-stop push for faster speed of Data to Value.

Within our non-stop dominantly capitalist world, faster is better and often provides advantages to organizations, especially around improved value chains and concepts such as supply chains.  Businesses and organizations continuously look for any advantage they can get. I kinda hate linking to McKinsey for backup but here it goes. Their number 2 characteristic for the data-driven enterprise of 2025 is “Data is processed and delivered in real-time”.

 

2) – Data Sharing.

More and more Snowflake customers are realizing the massive advantage of data sharing, which lets them share no-copy, in-place data in near real time. Data sharing is a massive competitive advantage if set up and used appropriately. You can securely provide or receive access to data sets and streams across your entire business or organizational value chain, as long as the other parties are also on Snowflake. This gives access to data sets at reduced cost and risk, thanks to micro-partitioned, zero-copy, securely governed data access.
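For readers new to shares, the mechanics are just a few grants against a share object, and no data is copied (the names below are hypothetical):

  CREATE SHARE sales_share;
  GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
  GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
  GRANT SELECT ON TABLE sales_db.public.daily_sales TO SHARE sales_share;

  -- Invite a partner account; they create a read-only database from the share
  ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account;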

 

3) – Creating Data with the End in Mind.

When you think about using data for value and logically work through the creation and consumption life cycle, it becomes clear there are advantages to capturing data in formats that are ready for immediate processing. If you design your data creation and capture as logs or other outputs that can be easily and immediately consumed, you get faster Data to Value cycles, creating competitive advantages with certain data streams and sets.

 

4) – Automated Data Applications.

I see some really big opportunities with Snowflake's Native Applications and Streamlit integration. Bottom line, there is a need for consolidated, best-of-breed data applications that can hit a low price point thanks to massive volumes of customers.

 

5) – Full Automated Data Copying Tools.

The growth of Fivetran and Stitch (now Talend) has been amazing. We are now also seeing huge growth in automated data copy pipelines going the other way, like Hightouch. At IT Strategists, we became a partner with Stitch, Fivetran, and Matillion back in 2018.

 

6) – Full Automation of Data Pipelines and more integrated ML and Data Pipelines.

With the introduction of Coalesce's fully automated data object and pipeline service, we saw for the first time data professionals improving Data to Value through fully automated data objects and pipelines. Some of our customers refer to parts of Coalesce as a Terraform-like product for data engineering. What I see is a massive removal of data engineering friction, similar to what Fivetran and Hightouch did, but at a different part of the data processing stack. We became an early partner with Coalesce because we see it the same way we viewed Snowflake at the beginning of 2018: it just makes Snowflake even more amazing to use.

 

7) – The Data Mesh Concept(s) and Data Observability.

Love these concepts or hate them, they are taking hold within the overall data professionals' brain trust. Zhamak Dehghani (previously at ThoughtWorks) and ThoughtWorks have, from 2019 until now, succeeded in communicating the concept of a Data Mesh to the market. Meanwhile, Barr Moses of Monte Carlo has been beating the drum very hard on the concept of Data Observability. I'm highlighting these data concepts as trends aligned with improving Data to Value speed, quality, and accessibility. There are many more data concepts besides these two; time will reveal which gain mind and market share and which go by the wayside.

 

Conclusion:

That is it for Part 1 of Frank's Future of Data series. In Part 2, we will continue exploring more trends to keep in mind, as well as Snowflake's announcements related to Data to Value.

Snowflake Data Clean Rooms

Introduction: What is a Data Clean Room?

 

In this article, I will explain what a Snowflake Data Clean Room is on the Snowflake Data Cloud.
Data clean rooms on Snowflake are a set of data-related technologies that facilitate double-blind joins of data. These technologies include Data Shares, Row Access Policies, and Secure User Defined Functions. The underlying Data Sharing technology is based on Micro-Partitions, which provide features like Data Sharing and Data Cloning.

Although the original concept of data clean rooms was developed for data exchanges in advertising, I believe the concept can be applied to many other areas where “controlled” and “governed” double-blind joins of data sets can create significant value. This approach enables companies and their partners to share data at an aggregated double-blind join level, without sharing personally identifiable information (PII).
On Snowflake, sharing data through secure views and tables using Data Share technology is already straightforward. In a clean room, the parties perform double-blind joins on previously agreed-upon identifiers rather than exposing the underlying values.
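As a stripped-down illustration of the double-blind idea, assume both parties have agreed to hash email addresses as the join key and to enforce a minimum aggregation threshold (all names and the threshold are hypothetical; a production clean room layers row access policies and data shares on top of this):

  CREATE SECURE VIEW clean_room.shared.segment_overlap AS
  SELECT a.segment,
         COUNT(*) AS matched_customers
  FROM advertiser_db.crm.first_party_customers a
  JOIN publisher_db.media.exposures p
    ON SHA2(LOWER(a.email)) = SHA2(LOWER(p.email))  -- neither party sees the other's raw identifiers
  GROUP BY a.segment
  HAVING COUNT(*) >= 50;                            -- aggregation threshold so no individual can be singled out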

 

Part 1: Data Clean Room Example Use Cases

We helped Snowflake pioneer this new offering a couple of years ago with our client VideoAmp which we brought over to the Snowflake Data Cloud. Our original article back in July 2020 shows how to analyze PII and PHI Data using the earlier Data Clean Room concepts. Fast forward 2 years and now Snowflake has dramatically improved the initial version and scope that we put together. These are just a few examples; there are many other potential use cases for Snowflake Data Clean Rooms.

 

Media/Advertising:

  • Addressing the challenge of the “end of cookies” in a meaningful way, Snowflake’s Data Clean Rooms enable Advertisers to merge their first-party data and their publisher(s)’ viewership/exposure data, delivering more value for their marketing spend.
  • Collaborative Promotions. Conducting customer segment overlap analysis with a co-branding/co-marketing partner can reveal areas where customer segments and audiences are aligned.
  • Joint loyalty offerings and/or upsells can also be developed in partnership with aligned customer “interests”.

 

Healthcare and Life Sciences:

  • There are some extremely valuable use cases where patient data and patient outcomes can be securely shared across government, healthcare, and life sciences organizations, hopefully enabling some huge leaps forward in healthcare and life sciences.

 

Financial Services:

  • Combining data from multiple financial institutions to identify fraud or money laundering activities without sharing sensitive customer information.

 

Retail:

  • Combining customer data from different sources to create targeted marketing campaigns and promotions.

 

Government:

  • Sharing data across different government agencies to improve public services while protecting individual privacy.

 

Part 2: Looking for more information about Data Clean Rooms?

Here are some additional resources to help you learn more about Data Clean rooms and Data Collaboration.

 

Lastly, here’s an interview I provided on my view of the opportunities around Data Clean Rooms on Snowflake. I shared some insights gained from decades of experience working in data, including thoughts about the transformational impact that cloud-based data sharing, data collaboration, data marketplaces, and data clean rooms are having on companies and industries.

What’s Next in Data Collaboration & Why Data Clean Rooms Are Exciting: Insights From Frank Bell

 

Are you interested in how you can use a Snowflake Data Clean Room for your business? Contact Us Today.

Cost Governance on Snowflake in 2022

Introduction: What is Snowflake’s Cost Governance?

 

Snowflake cost governance refers to the process of managing and optimizing the costs associated with using the Snowflake cloud data platform. This involves monitoring and analyzing usage metrics to identify areas where costs can be reduced, as well as implementing strategies to control spending and prevent unexpected expenses. Snowflake offers various tools and features for cost governance, including resource groups, budgets, and usage views. However, some users may still choose to use third-party optimization tools to fully optimize their Snowflake accounts and save money.

 

Part 1: My Take on Snowflake’s Cost Governance

In this article, I’ll explain what you can do to manage costs on Snowflake as of July 2022. Although Snowflake has made significant progress in this area, it’s still recommended to use a comprehensive Snowflake cost optimization service like Snoptimizer™ or Nadalytics. This is due to the fact that Snowflake still generates most of its revenue from consumption-based services, and despite having impressive NPS scores, there are still many cost-related issues to be aware of. Before the Summit 2022 announcements, here’s a brief overview of what was available.

 

Before Snowflake Summit 2022, Cost Governance in Snowflake was honestly pretty weak. It only had the following GUI and optimization tools:

 

  1. Daily Summary, available in Snowflake's Standard Classic Console. This provided very limited information and was available ONLY to a very limited set of roles!
  2. Usage Views, which can be used in Snowsight. These show more granular cost detail, but there are problems with some default views and bugs. Again, by default, access is locked down to certain roles.
  3. Third-Party Optimization Tools can help you view your information and make sense of it. Some are:
    1. Nadlytics
    2. Snoptimizer™
  4. Third-Party “Reactive” Reporting Tools (from all the Snowflake Health Check Consulting Engagements I’ve done, this was the most common set of tools for Cost Governance on Snowflake).
    1. Sigma Computing Cost and Usage
    2. Looker Snowflake Cost and Usage
    3. Tableau Snowflake Cost and Usage
    4. Many other smaller, fragmented brands with "reactive" reporting around costs. The problem with reactive reporting is that something can go awry, like a long-running query with NO resource monitor, or a resource monitor set to kick in only when the query ends (which, with the defaults, could be 48 hours). If that happens, thousands or tens of thousands of dollars can easily be spent within a day with no true Data to Value provided!

After Snowflake Summit 2022, these major Cost Governance announcements were provided:

 

#1. A new Resource Groups concept was announced, where you can combine all sorts of Snowflake objects to monitor their resource usage. [This is huge, since Resource Monitors were pretty primitive.]
#2. The concept of Budgets that you can track spending against. [Both Resource Groups and Budgets are expected in Private Preview in the next few weeks.]
#3. More usage metrics are being made available for SnowPros like us, and for monitoring tools, to use. This is important since many enterprise businesses were asking for this.

Conclusion:

If you’re interested in staying up-to-date with our latest updates, be sure to check our website regularly for more information. Looking to reduce costs and optimize your Snowflake Account to save money? Try our Snoptimizer™ Assessment for Free and see the results for yourself. We are confident that our assessment will provide you with valuable insights and recommendations to improve your Snowflake usage and help you save money in the process.

What is a Snowflake Data Superhero?

What is a Snowflake Data Superhero? 

 

Currently, a Snowflake Data Superhero (abbreviated as DSH) is a Snowflake product expert who is actively involved in the Snowflake community and is helping others learn more about Snowflake through blogs, videos, podcasts, articles, books, etc.

Snowflake states that it chooses DSHs based on their positive influence on the overall Snowflake Community. Snowflake Data Superheroes get some decent benefits as well; keep reading to learn more.

I’m Frank Bell, the founder of IT Strategists and Snowflake Solutions, and I’m also a Snowflake Data Superhero. In this article, I’d like to give you an overview of what a Snowflake Data Superhero is, what the program entails, and what are some of the benefits of being chosen as a DSH.

 

The Snowflake Data Superhero Program (Before Fall 2021)

 

For those of you new to Snowflake within the last few years, believe it or not, there was a really informal Data Superhero program for many years. I don't even think there was an exact list of criteria for being in it. Since I was a long-time Snowflake advocate and one of the top Snowflake consulting and migration partners from 2018-2019 with IT Strategists (before we sold the consulting business), I was invited to be part of the informal program back in 2019.

Then those of us who had been involved with this informal program got a mysterious email and calendar invite in July 2021: "Invitation: Data Superhero Program Restructuring & Feedback @ Mon Jul 26, 2021 8am – 9am." Honestly, when I saw this and attended the session, it sounded like it was going to be a pain in the ass having to validate our Snowflake expertise again within this new program, especially for many of us in the Snowflake Advocate Old Guard. (There are probably around 40 of us, I'd say, who never switched over to become Snowflake Corporate employees and make a serious windfall from the largest software IPO in history, as the Sloot and Speiser did when they became billionaires. Benoit did too, but as I've stated before, Benoit, Thierry, and Marcin deserve some serious credit for the core Snowflake architecture; as an engineer you have to give them respect.)

 

The Snowflake Data Superhero Program (2022)

 

This is a combination of my thoughts and the definitions from Snowflake.

Snowflake classifies Snowflake Data Superheroes (DSH) as an elite group of Snowflake experts! They also think the DSHs should be highly active in the overall Snowflake community. They share feedback with Snowflake product and engineering teams, receive VIP access to events, and their experiences are regularly highlighted on Snowflake Community channels. Most importantly, Data Superheroes are out in the community helping to educate others by sharing knowledge, tips, and best practices, both online and in person.

How does the Snowflake Corporation choose Snowflake Data Superheroes?

 

They mention that they look for the following key attributes:

 

  • You must overall be a Snowflake expert.
  • They look for Snowflake experts who create any type of content around the Snowflake Data Cloud (anything from videos and podcasts to blogs and other written Snowflake publications).
  • They look for you to be an active member of the Data Hero community which is just the overall online community at snowflake.com.
  • They also want people who support other community members and provide feedback on the Snowflake product.
  • They want overall energetic and positive people

 

Overall, I would agree that many of the 48 Data Superheroes for 2022 definitely meet all of the criteria above. This past year, since the program was new, I also think it came down to the fact that only certain people applied. (I think next year it will be less exclusive, since from my view the number of Snowflake experts is really growing. Back in 2018 there was honestly only a handful of us, I would say fewer than 100 worldwide. Now there are most likely 200+ true Snowflake Data Cloud experts outside of Snowflake employees.) The product has now grown so much that it is difficult for any normal, or even superhero, human to cover all parts of Snowflake as an expert. The only way I'm doing it (or trying to) is by employing many automated ML flows ("Aflows," I call them) to organize all publicly available Snowflake content into the one knowledge repository of ITS Snowflake Solutions. I would also say selection comes down to your overall known presence within the Snowflake Community and, finally, your geography. For whatever reason, I think the DSHs chosen by Snowflake for 2022 missed some really strong Snowflake experts within the United States.

Also, I just want to add that even within the 48 Snowflake Data Superheroes, there are a few that just stand out as producing an insane amount of free community content.  I’m going to name them later after I run some analysis but there are about 10-15 people that just pump out the content non-stop!

 

What benefits do you get when you become a Snowflake Data Superhero?

 

Snowflake Data Superhero Benefits:

 

In 2022, they also provided all of these benefits:

 

  • A ticket to the Snowflake Summit – I have to say this was an awesome perk of being part of the program and while I disagree sometimes with Snowflake Corp decisions that are not customer or partner-focused, this was Snowflake Corporation actually doing something awesome, and really the right thing considering that of these 48 superheroes, most of us have HEAVILY contributed to Snowflake’s success (no stock, no salary).  While employees and investors reap large financial gains from the Snowflake IPO, many of us basically helped the company grow significantly.
  • Snowflake Swag that is different (well, it was for a while, now others are buying the “kicks” or sneakers)
  • Early education on new Snowflake Features
  • Early access to new Snowflake Features (Private Preview)
  • Some limited opportunities to speak at events. (Let's face it, the bulk of speaking opportunities these days goes in this order: Snowflake employees; Snowflake customers (the bigger the brand, or maybe the spend, the bigger the speaking opportunity); Snowflake partners who pay significant amounts of money to be involved in any live speaking event; and finally external Snowflake experts, advocates, etc.)
  • VIP access to events (we had our own Data Superhero area within Snowflake Summit)
  • Actual Product Feedback sessions with the Snowflake Product Managers

 

The only promised item I can think of that has not been delivered so far in 2022 is providing every DSH with a test Snowflake account and a certain number of credits. Also, I do not think many of the DSHs have received their Data Superhero card; that was a benefit provided to maybe 10 or more of the DSHs back in 2019 or so. I believe it started with those chosen to speak at Snowflake BUILD, but I'm not 100% sure.

 

The Snowflake Data Superhero Program (2023)

 

How do I apply to be a Snowflake Data Superhero?
Here you go:  [even though for me the links are not working]
https://community.snowflake.com/s/dataheroes

 

Snowflake’s Data Superhero Program Evolution

 

I will add more content here as I review how the 2023 program is going to work. I will say I have been pleasantly surprised by the DSH Program overall this year in 2022. It has given those Snowflake Data Superheroes who are more involved with the program a way to stand out within the Snowflake Community.

 

Snowflake’s Data Superhero Program Internal Team

 

I also want to give a shout-out to the main team at Snowflake who works tirelessly to make an amazing Snowflake Data Superhero program. These individuals and more have been wonderful to work with this year:

  • Howard Lio
  • Leith Darawsheh
  • Elsa Mayer

There are many others too, from the product managers we meet with to other Snowflake engineers.

 

Other Snowflake Data Superhero Questions:

 

Here was the full list from Feb 2021.

Who are the Snowflake Data Superheroes?

https://medium.com/snowflake/introducing-the-2022-data-superheroes-ec78319fd000

 

Summary

 

I kept getting all of these questions about, hey – what is a Snowflake Data Hero?  What is a Snowflake Data Superhero?  How do I become a Snowflake Data Superhero?  What are the criteria for becoming one?

This article is my attempt to answer all of your Snowflake Data Superhero-related questions in one place, coming from an actual Snowflake Data Superhero: I've been one for 3+ years in a row now. Hit me up in the comments or directly if you have any other questions.

Shortest Snowflake Summit 2022 Recap

Introduction:

 

Today's article provides a recap of Snowflake Summit 2022, including the key feature announcements and innovations. We highlight the major takeaways from the event and outline Snowflake's position as a full-stack business solution environment capable of creating business applications.

We also include a more in-depth discussion of Snowflake’s seven pillars of innovation, which include all data, all workloads, global, self-managed, programmable, marketplace, and governed.

 

Snowflake Summit 2022 Recap from a Snowflake Data Superhero:

 

If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.

Here are my top 20 announcements, mostly in chronological order of when they were announced. It was overwhelming to keep up with the number of announcements this week!

 

Cost Governance:

 

1. The concept of New Resource Groups has been announced. It allows you to combine all kinds of Snowflake data objects to monitor their resource usage. This is a huge improvement since Resource Monitors were previously quite primitive.

2. The concept of Budgets that you can track against. Resource Groups and Budgets coming into Private Preview in the next few weeks.

3. More Usage Metrics are being made available as well for SnowPros like us to use or Monitoring tools. This is important since many enterprise businesses were looking for this.

 

Replication Improvements on SnowGrid:

 

4. Account Level Object Replication: Snowflake previously allowed only data replication and not other account-type objects. However, now all objects that are not just data can supposedly be replicated as well.

5. Pipeline Replication and Pipeline Failover: Now, stages and pipes can be replicated. According to Kleinerman, this feature will be available soon in Preview.

 

Data Management and Governance Improvements:

 

6. The combination of tags and policies, such as tag-based masking. This is in Private Preview now and will go into Public Preview very soon.

 

Expanding External Table Support and Native Iceberg Tables:

 

7. We will soon have support for external tables in Apache Iceberg. Keep in mind, however, that external tables are read-only and have certain limitations. Take a look at what Snowflake did in #9 below.

8. Snowflake is broadening its abilities to manage on-premises data by partnering with storage vendors Dell Technologies and Pure Storage. The integration is anticipated to be available in a private preview in the coming weeks.

9. We are excited to announce that Snowflake now fully supports Iceberg tables, which means these tables can now support replication, time travel, and other standard table features. This enhancement will greatly improve the ease of use within a Data Lake conceptual deployment. For any further inquiries or assistance, our expert in this area is Polita Paulus.

 

Improved Streaming Data Pipeline Support:

 

10. New Streaming Data Pipelines. The main innovation is the new concept of materialized tables, and you can now ingest streaming data as row sets. Expert in this area: Tyler Akidau

  • Funny—I presented on Snowflake’s Kafka connector at Snowflake Summit 2019. Now it feels like ancient history.

 

Application Development Disruption with Streamlit and Native Apps:

 

11. Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

12. Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.

 

Expanded SnowPark and Python Support:

 

13. Python Support in the Snowflake Data Cloud. More importantly, this is a major move to make it much easier for all “data constituents” to be able to work seamlessly within Snowflake for all workloads including Machine Learning. Snowflake has been making efforts to simplify the process of running data scientist workloads within its platform. This is an ongoing endeavor that aims to provide a more seamless experience.

14. Snowflake Python Worksheets. This statement is related to the previous announcement. It enables data scientists, who are used to Jupyter notebooks, to more easily work in a fully integrated environment within Snowflake.

 

New Workloads. Cybersecurity and OLTP! boom!

 

15. CYBERSECURITY. This was announced a while back, but it is being emphasized again to ensure completeness.

16. UNISTORE: OLTP-type support based on Snowflake's Hybrid Table feature. This was one of the biggest announcements by far. Snowflake is entering a much larger share of data and application workloads by extending its capabilities beyond OLAP (online analytical processing, i.e., big data) into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.

 

Additional Improvements:

 

17. Snowflake Overall Data Cloud Performance Improvements. This is great, but with all the other “more transformative” announcements, I’ll group this together. The performance improvements include enhancements to AWS capabilities, as well as increased power per credit through internal optimizations.

18. Large Memory Instances. They did this to handle more data science workloads, demonstrating Snowflake’s ongoing commitment to meeting customers’ changing needs.

19. Data Marketplace Improvements. The Marketplace is one of my favorite things about Snowflake. They mostly announced incremental changes.

 

Quick “Top 3” Takeaways for me from Snowflake Summit 2022:

 

  1. Snowflake is positioning itself now way beyond a cloud database or data warehouse. It now is defining itself as a full-stack business solution environment capable of creating business applications.
  2. Snowflake is emphasizing it is not just data but that it can handle “all workloads” – Machine Learning, Traditional Data Workloads, Data Warehouse, Data Lake, and Data Applications and it now has a Native App and Streamlit Development toolset.
  3. Snowflake is expanding wherever it needs to be in order to be a full data anywhere anytime data cloud. The push into better streams of data pipelines from Kafka, etc., and the new on-prem connectors allow Snowflake to take over more and more customer data cloud needs.

 

Snowflake at a very high level wants to:

 

  1. Disrupt Data Analytics
  2. Disrupt Data Collaboration
  3. Disrupt Data Application Development

 

Want more recap beyond just the features?

 

Here is a more in-depth take on the Keynote 7 Pillars that were mentioned:

Snowflake-related Growth Stats Summary:

  • Employee Growth:

2019:  938 Employees

2022 at Summit:  3992 Employees

  • Customer Growth:

2019:  948 Customers

2022 at Summit:  5944 Customers

  • Total Revenue Growth:

2019:  96M

2022 at Summit:  1.2B

 

Snowflake's 7 Pillars of Innovation:

 

Let's go through the 7 pillars of Snowflake innovation:

  1. All Workloads – Snowflake is heavily focusing on creating an integrated platform that can handle all types of data and workloads, including ML/AI workloads through SnowPark. Their original architecture’s separation of computing and storage is still a key factor in the platform’s power. This all-inclusive approach to workloads is a defining characteristic of Snowflake’s current direction.
  2. Global – Snowflake, which is based on SnowGrid, is a fully global data cloud platform. Currently, Snowflake is deployed in over 30 cloud regions across the three main cloud providers. Snowflake aims to provide a unified global experience with full replication and failover to multiple regions, thanks to its unique architecture of SnowGrid.
  3. Self-managed – Snowflake is committed to ensuring that its platform remains user-friendly and straightforward to use; this remains a top priority.
  4. Programmable – Snowflake can now be programmed using not only SQL, Javascript, Java, and Scala, but also Python and its preferred libraries. This is where Streamlit comes in.
  5. Marketplace – Snowflake emphasizes its continued focus on building more and more functionality on the Snowflake Marketplace (rebranded now since it will contain both native apps as well as data shares). Snowflake continues to make the integrated marketplace as easy as possible to share data and data applications.
  6. Governed – Snowflake stated that they have a continuous heavy focus on data security and governance.
  7. All Data – Snowflake emphasizes that it can handle not only structured and semi-structured data, but also unstructured data of any scale.

 

Conclusion:

 

We hope you found this article useful!

Today's article recapped Snowflake Summit 2022, highlighting the key feature announcements and innovations. Snowflake is positioning itself as a full-stack business solution environment with seven pillars of innovation: all data, all workloads, global, self-managed, programmable, marketplace, and governed. We covered topics such as cost governance, data management, external table support, and cybersecurity.

If you want more news regarding Snowflake and how to optimize your Snowflake accounts, be sure to check out our blog.

What is Snoptimizer?

Part 1: What is Snoptimizer™?

Snoptimizer is an application developed by our team at ITS – Snowflake Solutions, led by our Founder, Frank Bell. Frank is undoubtedly one of the world’s foremost experts in Snowflake data optimization and has leveraged his mastery of Snowflake to create this one-of-a-kind automated solution for companies who want to streamline their Snowflake usage.

Snoptimizer™ aggregates the best practices and lessons from Frank’s leading expertise in optimizing Snowflake accounts for companies like Nissan, Fox, Yahoo, and Ticketmaster.

Snoptimizer™ is the first automated cost, performance, and security optimizer application for Snowflake Accounts. It is by far the easiest and fastest way to optimize your Snowflake Account effectively – the service can optimize your account within minutes.

Part 2: Why did we build Snoptimizer?

We built Snoptimizer™ because we were frequently brought in by Snowflake customers for health checks, and in 98% of those cases the customer's accounts were not optimized as much as we could optimize them. Most of the time their Snowflake usage was highly inefficient, and the customer was not using Snowflake as effectively as possible in one or more areas. Snoptimizer™ was created to address that need.

Snoptimizer™ was built by Snowflake Data Superheroes, an elite worldwide group of only about 50 consultants and product builders who use Snowflake. Our Snoptimizer™ team comprises some of the most experienced Snowflake optimization and migration experts. We have specialized in Snowflake for years, studying every aspect in depth, to provide unparalleled optimization services.

Part 3: The Problem Snoptimizer Solves:

Snowflake is an incredibly scalable and easy-to-use data platform. That said, Snowflake's data cloud offering is constantly evolving with new features and cost-saving services. Also, the Snowflake database and data cloud concept are relatively new to many administrators and users. While the basics are easy to use compared to other options, optimizing a Snowflake account to maximize efficiency and cost savings is challenging. It requires a deep understanding of hundreds of objects and views – warehouses, resource monitors, query history, materialized views, search optimization, Snowpipe, load history, and more.

A few common customer optimization issues we’ve encountered:

  1. Poorly configured usage. All too often, we see unused consumption credits wasted due to incorrectly configured warehouses. Remember, cost-based consumption services are great until misused. An unoptimized account may experience performance or security issues. Therefore, we analyzed every area of Snowflake metadata views and developed the most advanced optimizations for cost, performance, and security beyond anything documented or available elsewhere.
  2. Incorrect storage settings or architecture. We often find suboptimal Snowflake settings during health checks, like 10- to 90-day Time Travel enabled for objects that don't need it (see the sketch after this list). We also see inefficient lift-and-shift migrations that keep drop-and-recreate architectures which make no sense in Snowflake.
  3. Inefficient warehouse setup. This is one of the first issues we typically fix, often saving our customers hundreds of dollars each day.
  4. Accounts with significant cost risks. As we stated in previous blog posts here at ITS Snowflake Solutions, Snowflake enables awesome scale, but if misconfigured it also carries major cost risk by default due to its consumption-based pricing, especially for compute and larger warehouses. These are the Snowflake Cost Risks we discussed previously.
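
To make the Time Travel point above concrete, here is a minimal sketch (the database, schema, and table names are hypothetical placeholders) of how you might check and then lower retention on a table that does not need a long window:

-- Check the current Time Travel retention on a table (name is a placeholder):
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE my_db.my_schema.big_history_table;

-- Lower it to 1 day if the longer window is never actually used:
ALTER TABLE my_db.my_schema.big_history_table SET DATA_RETENTION_TIME_IN_DAYS = 1;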

 

Part 4: What does Snoptimizer™ do?

Snoptimizer continuously monitors your Snowflake account in three major areas: cost, security, and performance. It scans over 40 Snowflake views to detect anti-patterns that waste resources or hurt performance and security. Snoptimizer is the only service that continuously optimizes your Snowflake account to maximize efficiency and cost savings.

Let’s dive deeper into the three main areas Snoptimizer streamlines on Snowflake usage:

 

Cost Optimization:

The Snoptimizer Cost Optimization service regularly reviews ongoing Warehouse and Resource Monitor configurations. It promptly fixes any incorrectly configured Account settings.

The Snoptimizer service continually optimizes your Snowflake account(s) to reduce costs. It can automatically apply optimizations or require approval before changes. Snoptimizer is your best tool for minimizing Snowflake costs.

Snowflake’s RDBMS and DDL/DML are easy to use, but warehouses and compute optimization are also easy to misconfigure. Snoptimizer eliminates this inefficiency and waste in Snowflake computing and storage.

Performance Optimization:

The Snoptimizer team analyzes your Snowflake query history and related data to identify warehouses that are over-provisioned or under-provisioned. We are the only service that automates Snowflake performance optimization and provides recommendations, such as:

  • Right-sizing warehouses to match your workload
  • Leveraging other Snowflake cost-saving features
  • Consolidating unused warehouses
  • Enabling Auto Clustering
  • Using Materialized Views
  • Enabling the Search Optimization Service

We review your Snowflake account to find ways to improve performance and lower costs. Our recommendations are tailored to your specific usage patterns and needs.

Security Optimization:

Snoptimizer is one of your best tools for improving Snowflake security. It continuously monitors your Snowflake account for risks that could compromise your data or account. Since security often depends on company culture, we provide recommendations and best practices to help prevent account breaches and data leaks. Snoptimizer Security Optimization performs frequent checks to identify misconfigurations or vulnerabilities that could be exploited.

Snoptimizer Core Features:

  • Analyzes Snowflake data warehouses to identify inefficient settings
  • Immediately limits “cost exposure” from computing resources
  • Reviews previous queries and usage to optimize performance and efficiency
  • Provides regular reports on Snowflake usage
  • Creates effective monitors for each warehouse’s resources
  • Offers recommendations and automation to optimize your setup
  • Incorporates Snowflake’s best practices for cost optimization, including some undocumented tips

 

Part 5: What results can you expect from using Snoptimizer?

  • On average, we’ve seen 10-30% cost savings, thousands of security issues fixed, and hundreds of performance problems solved in our tests.
  • In some implementations, we’ve achieved up to 43% cost savings.

 

Part 6: Try Snoptimizer™ today!


Try Snoptimizer today. Sign up and schedule a personal demo with us!

Visit our Website to Explore Snoptimizer

To Recap… How Snoptimizer Helps You

Snoptimizer quickly and automatically optimizes your Snowflake account for security, cost, and performance.
It eliminates headaches and concerns about security risks and cost overruns across your Snowflake account.
It prevents you from making costly mistakes.

In short, Snoptimizer makes managing your Snowflake cost, performance, and security much easier and more automated.
Optimization in a few hours, hassle-free. Get optimized today!

Part 7: Conclusion

The Snowflake Data Cloud continues expanding, and though easy to use, optimizing for cost, security, and performance remains challenging. Snoptimizer makes optimization effortless and affordable, saving you from cost overruns and security issues.

We’d love to help streamline your Snowflake use, optimize data cloud costs, and leverage this tech to boost your business as much as possible.

Sign up and schedule a complimentary consultation with us to streamline your Snowflake usage.

Snowflake Cost Risks

Introduction:

Today’s article discusses the cost risks associated with using the Snowflake Data Cloud. It emphasizes the importance of proper administration and resource monitoring to mitigate these risks. We will also mention our service, Snoptimizer, which can help automate cost optimization and risk minimization related to Snowflake accounts.

Part 1: My Experience with Snowflake’s Data Cloud

As a Snowflake Data Superhero for the past 4 years, I can't deny it – I love using Snowflake. I fell in love with it at the beginning of 2018 when I realized how easily I could execute all of the Big Data Consulting Practice solutions we had been delivering for 18+ years. In the past, we would often run into scale challenges as the data grew, but Snowflake brought both ease of use and amazing scale to almost all of my big data consulting projects.

Over the last three years, my team and I have worked on hundreds of Snowflake accounts. I've come to realize that if Snowflake anti-patterns occur or poor compute and security practices are used, Snowflake accounts are exposed to large cost risks, particularly around compute costs. While Snowflake is an amazingly scalable cloud database and the best cloud data warehouse I've used in the last 3+ years, deploying a Snowflake Account without proper settings and administration exposes a company to these cost risks.

Part 2: Examples of Data Cloud Cost Risks

Let's say you used the Classic Console to create a new warehouse and kept all the default settings... Even if you didn't run any query, the cost for the standard settings would be 10 minutes * XL Warehouse (16 credits/hour) @ $3/credit. That is only $8 for those 10 minutes, but it was $8 spent on nothing. Now let's say a rogue (or curious) trainee on the Snowflake Account who didn't understand what they were doing does the same thing but ONLY changes the size to a 6XL. Your 10-minute run-for-nothing cost exposure is 10 min * 6XL Warehouse (512 credits/hour) @ $3/credit. Your account just spent $256 for 10 minutes of nothing.

Snowflake Cost Risk Use Case 6XL – 1 cluster:

  • Cost per hour @ $3/credit: 512 credits * $3 = $1,536
  • Cost per day @ $3/credit: $36,864

Snowflake Cost Risk Use Case 6XL – 5 clusters:

[we know this is a worst-case scenario on AWS/Snowflake and it would be rare, BUT without resource monitors and correct permissions it is possible]

  • Cost per hour @ $3/credit: 5 * 512 credits * $3 = $7,680
  • Cost per day @ $3/credit: $184,320

Snowflake Cost Risk Use Case 6XL – 10 clusters:

[we know this is a worst-case scenario on AWS/Snowflake and it would be rare, BUT without resource monitors and correct permissions it is possible]

  • Cost per hour @ $3/credit: 10 * 512 credits * $3 = $15,360
  • Cost per day @ $3/credit: $368,640

As you can see, this is unreasonable exposure to cost risk. If you are a Snowflake administrator, make sure you make the appropriate changes to control costs and cost risk. If you want an automated approach to cost risk management that can be set up in a few hours, try our Snoptimizer Cost Risk Solution.
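
If you want a quick, manual sanity check before (or alongside) an automated tool, a query along these lines against Snowflake's ACCOUNT_USAGE metering view shows where credits are going. The $3/credit figure just mirrors the examples above; your actual rate will differ.

SELECT warehouse_name,
       SUM(credits_used)     AS credits_last_7_days,
       SUM(credits_used) * 3 AS approx_cost_at_3_per_credit  -- illustrative rate only
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_7_days DESC;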

Snowflake Cost Risk Mitigation – Administration – ACCOUNT ADMIN – MUST DO

The most important way to minimize Snowflake cost risk is to create Resource Monitors with suspend triggers for every warehouse. Here is the code to do that [replace 50 with your daily credit limit]:

CREATE RESOURCE MONITOR "REMOVE_SNOWFLAKE_COST_RISK_EXAMPLE_RM" WITH CREDIT_QUOTA = 50
FREQUENCY = 'DAILY', START_TIMESTAMP = 'IMMEDIATELY', END_TIMESTAMP = NULL
TRIGGERS
ON 95 PERCENT DO SUSPEND
ON 100 PERCENT DO SUSPEND_IMMEDIATE
ON 80 PERCENT DO NOTIFY;

ALTER WAREHOUSE "TEST_WH" SET RESOURCE_MONITOR = "REMOVE_SNOWFLAKE_COST_RISK_EXAMPLE_RM";

Part 3: Use Snoptimizer

Snoptimizer offers Snowflake users a unique solution to enhance the cost, performance, and security of their Snowflake accounts.


It provides you with a consistent and reliable daily analysis of how your Snowflake account is being optimized and rightsized to achieve maximum efficiency.

Try Snoptimizer today. Sign up and schedule a personal demo with us!

Conclusion:

To conclude, we now understand that Snowflake’s Data Cloud Cost Risk is a real issue that needs proper administration. While the Snowflake Data Cloud offers immense scale and power for any analytical data processing need, it must be optimized and continuously monitored by a service like Snoptimizer. Remember, with great data processing power comes great cost management responsibility. If an administrator mistakenly grants access to an untrained user to create a 6XL instance that they don’t need and isn’t within the business budget, it can result in significant costs.

Try Snoptimizer today to avoid a data-driven cost catastrophe.  If a new warehouse appears, we have you covered!

Snowflake Cost Guardrails – Resource Monitors

Introduction:

 

The Snowflake Data Cloud provides impressive scalability and processing power for analytical data. It's an incredible advancement that you can now launch T-shirt warehouse sizes ranging from XS (1 virtual instance) up to 6XL (512 EC2 instances per cluster when running on AWS). However, it's important to set up your Snowflake account with resource monitors to control costs. Snowflake Resource Monitors serve as your primary way to control costs within the Snowflake Data Cloud.

Let’s show you how easy it is to set up your Snowflake cost guardrails so your costs don’t go beyond what you expect.

We recommend either hiring a full or part-time Snowflake administrator focused on cost optimization and database organization or using our Snowflake Cost Optimization Tool – Snoptimizer.  Snoptimizer automates setting up resource monitors on your Snowflake account for each warehouse and tons of other cost optimizations and controls on your Snowflake Account. Let’s dig into the only true Snowflake Cost Risk Guardrails you have had for a while, Resource Monitors.

 

Resource Monitors – Your Snowflake Data Cloud Cost Guardrails

Resource Monitors are relatively easy to set up from the Snowflake web GUI or the command line. Even so, it's easy to make incorrect assumptions and end up without enough effective monitoring and suspending in place. If you do not set monitors at the right grain and attach them to your warehouses, it is like having guardrails that were never installed.
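
One quick way to spot the most common gap – warehouses with no Resource Monitor attached at all – is a sketch like the following. Depending on your client, the resource_monitor column from SHOW may come back as the text 'null' or as an actual NULL, so the filter checks both.

SHOW WAREHOUSES;

-- Inspect the SHOW output for warehouses with nothing attached:
SELECT "name", "size", "auto_suspend", "resource_monitor"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "resource_monitor" = 'null' OR "resource_monitor" IS NULL;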

Finding out that Snowflake consumption-based pricing was so reasonable was game-changing for me and my consulting company. We could finally provide scale for any analytical challenge and solution we needed to create, which was never possible before. I remember building predictive marketing tools where we often had to crunch large data sets; we would run into scaling challenges and spend tons of time and engineering effort just to engineer for scale.

 

Try Snoptimizer:

 


 

Try Snoptimizer today. Sign up and schedule a personal demo with us!

 

To Recap – How Snoptimizer Helps You

Snoptimizer quickly and automatically optimizes your Snowflake account for security, cost, and performance. It eliminates headaches and concerns about security risks and cost overruns across your Snowflake account.

In short, Snoptimizer makes managing your Snowflake cost, performance, and security much easier and more automated.

 

Conclusion:

 

Having properly set up Snowflake Guardrails in the form of Resource Monitors is extremely important. If you’re unsure whether or not you have these in place, it’s time to take action. Activate Snoptimizer today to optimize your system in just a few hours, and ensure continuous and regular cost optimization monitoring. If a new warehouse appears, we’ve got you covered!

In conclusion, setting up Snowflake resource monitors is crucial for controlling costs in the Snowflake Data Cloud.

Snowflake Cost Optimization Best Practices

Introduction:

 

I have been working with Snowflake since the beginning of 2018 and it has been one of the most enjoyable and scalable data solutions I have encountered in my 27+ year career as a data engineer, data architect, data entrepreneur, and data thought leader. It is an extremely powerful platform (with nearly unlimited scalability, limited only by Snowflake’s allocation of Compute within an Availability Zone) that requires responsible usage.

In the past 3 years, I’ve analyzed more than 100 Snowflake accounts and found that about 90% of them were not completely optimized for cloud data costs. That’s why my team and I are thrilled to introduce Snoptimizer, the first automated Snowflake cost optimization service.

One of the reasons why 90% of those accounts did not have resource monitors or regular optimizations is that Snowflake is initially cost-effective and typically provides significant savings, especially for on-prem migrations that we have completed. However, companies that do not optimize their Data Cloud Costs are missing out on big opportunities! That’s why we created Snoptimizer, and I’m also sharing my top 6 Snowflake cost and risk optimizations below. Hope you find them helpful!

 

Part 1: My Best Practices for Optimizing Snowflake Costs and Reducing Cost Risks:

 

Best Practice #1 – Resource Monitors.

One of the first things Snoptimizer does is automate daily Resource Monitors at the warehouse level, based on the Snowflake Metadata Database history and the existing warehouse and Resource Monitor settings. This is set up immediately after you purchase Snoptimizer, and it reduces cost risk by putting guardrails around all warehouse compute so spending limits are not exceeded.

 

Best Practice #2 – Auto Suspend Setting Optimization.

Snoptimizer automates another optimization by analyzing the workloads in the Warehouse and making changes to the Auto Suspend settings. Depending on the workload, Snoptimizer can also automate additional cost savings for you.
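
As a rough illustration of the kind of change involved (the warehouse name is hypothetical), dropping a scheduled load warehouse from the 10-minute default to a 60-second auto-suspend looks like this:

-- Suspend one minute after the last statement finishes instead of ten:
ALTER WAREHOUSE etl_load_wh SET AUTO_SUSPEND = 60;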

 

Best Practice #3 – Monitoring Cloud Services Consumption Optimization

Snoptimizer analyzes your Snowflake Account’s Cloud Services consumption to quickly identify opportunities for cost savings. We thoroughly review usage and billing details for each service to ensure that only what is necessary is provisioned, reducing waste and minimizing costs. Optimizing Cloud Services is one of the most effective ways to lower your Snowflake spending while still meeting your data and compute demands.
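
A simple starting point for this kind of review is Snowflake's daily metering view. Note that Snowflake only bills cloud services credits above 10% of daily compute usage, which is why the adjustment column is worth watching alongside the raw usage; this is a sketch, not Snoptimizer's internal logic.

SELECT usage_date,
       SUM(credits_used_cloud_services)       AS cloud_services_credits,
       SUM(credits_adjustment_cloud_services) AS cloud_services_adjustment,
       SUM(credits_billed)                    AS total_credits_billed
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY usage_date
ORDER BY usage_date;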

 

Best Practice #4 – Regular Monitoring of Storage Usage Across your Entire Snowflake Account

At Snoptimizer, our goal is to help you save on Storage costs. We start by reviewing your Storage History for the past 60 days to identify any settings that may be causing you to overpay. We commonly find that charges related to Time Travel and/or Snowflake Stages are unnecessary and can be avoided.

At Snoptimizer, we can help you make the most of your storage space. Our service optimizes your settings based on your actual usage, so you’re only paying for what you need. By analyzing 60 days of storage history, we’re able to find ways to reduce costs by up to 25%, all without sacrificing performance or features.
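
For a do-it-yourself version of that storage review, a query along these lines surfaces tables where Time Travel and Fail-safe bytes outweigh the active data (the filter is just an example threshold):

SELECT table_catalog, table_schema, table_name,
       ROUND(active_bytes      / POWER(1024, 3), 2) AS active_gb,
       ROUND(time_travel_bytes / POWER(1024, 3), 2) AS time_travel_gb,
       ROUND(failsafe_bytes    / POWER(1024, 3), 2) AS failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE time_travel_bytes + failsafe_bytes > active_bytes   -- example threshold
ORDER BY (time_travel_bytes + failsafe_bytes) DESC
LIMIT 25;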

 

Best Practice #5 – Daily Monitoring of Warehouse Compute.

Besides adding Resource Monitors that suspend warehouses, we also provide daily monitoring of Snowflake warehouse consumption, reporting daily spikes, anomalies, and changes in rolling averages. Most accounts we come across do not have regular, proactive monitoring of warehouse usage in place.
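
If you want to hand-roll a basic version of that spike check, the sketch below compares each day's credits to a trailing average. The 2x threshold and 60-day window are arbitrary illustrative choices, not Snoptimizer's internal logic.

WITH daily AS (
    SELECT warehouse_name,
           DATE_TRUNC('day', start_time) AS usage_day,
           SUM(credits_used)             AS daily_credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(day, -60, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name, usage_day
)
SELECT warehouse_name, usage_day, daily_credits,
       AVG(daily_credits) OVER (
           PARTITION BY warehouse_name
           ORDER BY usage_day
           ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
       ) AS trailing_7_day_avg
FROM daily
QUALIFY daily_credits > 2 * trailing_7_day_avg   -- flag days at double the recent norm
ORDER BY usage_day DESC;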

 

Best Practice #6 – Regular Monitoring of New Snowflake Services.

Besides monitoring compute warehouses, Snoptimizer also immediately starts monitoring consumption on all existing and new cloud services that incur costs (from private preview onward) – from Automatic Clustering to Search Optimization to Materialized Views. This is a huge benefit that Snoptimizer automates: we are ALWAYS optimizing your cost consumption and reducing cost risk. We are always there for you!
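
To see roughly what these serverless features are costing you today, you can sum their credits from the ACCOUNT_USAGE views. This is a one-off sketch; Snoptimizer tracks these continuously rather than on demand.

SELECT 'AUTOMATIC_CLUSTERING' AS service, SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.automatic_clustering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
UNION ALL
SELECT 'SEARCH_OPTIMIZATION', SUM(credits_used)
FROM snowflake.account_usage.search_optimization_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
UNION ALL
SELECT 'MATERIALIZED_VIEWS', SUM(credits_used)
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP());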

 

Part 2: Try Snoptimizer today

 

At Snoptimizer, we offer Snowflake users a one-of-a-kind tool to enhance their Snowflake accounts’ cost efficiency, performance, and security.

 

Our tool offers a dependable and regular daily analysis of your Snowflake account optimization and rightsizing, ensuring maximum efficiency.

Try Snoptimizer today. Sign up and schedule a personal demo with us!

 

Conclusion:

 

After reading this article, we hope you have a better understanding of the best practices for optimizing Snowflake costs and reducing cost risks. These practices include using resource monitors, optimizing auto-suspend settings, monitoring cloud service consumption, regularly monitoring storage usage, daily monitoring of warehouse computing, and keeping track of new Snowflake services.

We encourage you to try out our automated Snowflake optimization tool for better cost, security, and performance efficiency.

Try Snoptimizer today! Sign up and schedule a personal demo with us!

Snowflake Create Warehouse Defaults

Overview:

 

I have been working with the Snowflake Data Cloud since it was just an Analytical RDBMS. Since the beginning of 2018, Snowflake has been pretty fun to work with as a data professional and data entrepreneur. It allows data professionals amazing flexible data processing power in the cloud. The key to a successful Snowflake deployment is setting up security and account optimizations correctly from the beginning. In this article, we will discuss the ‘CREATE WAREHOUSE’ default settings.

 

Snowflake Cost and Workload Optimization is the Key

 

After analyzing hundreds of Snowflake customer accounts, we found key processes to optimize Snowflake for computing and storage costs. The best way to successfully deploy Snowflake is to ensure you set it up for cost and workload optimization.

The Snowflake default “create warehouse” settings are not optimized to limit costs. That is why we built our Snoptimizer service (Snowflake Cost Optimization Service) to automatically and easily optimize your Snowflake Account(s). There is no other way to continuously optimize queries and costs so your Snowflake Cloud Data solution runs as efficiently as possible.

Let’s quickly review how Snowflake Accounts’ default settings are currently set.

Here is the default screen that comes up when I click +Warehouse in the Classic Console.

 

https://snowflakesolutions.net/wp-content/uploads/Snowflake-Create-Warehouse-Default-Options-Classic-Console-1024x640.jpg

Create Warehouse-Default Options for the Classic Console

Okay, for those already in Snowsight (aka the Preview App), here is the default screen – it is nearly identical.

 

https://snowflakesolutions.net/wp-content/uploads/Snowflake-Create-Warehouse-Default-Options-Snowsight-1-1024x640.jpg

Create Warehouse Default Options for Snowsight

So let's dig into the default settings these web UIs apply if you just choose a name and click "Create Warehouse", and evaluate what happens to our Snowflake compute if we leave them unchanged.

These default settings will establish the initial configuration for our Snowflake Compute. By understanding the defaults, we can determine if any changes are needed to optimize performance, security, cost, or other factors that are important for our specific use case. The defaults are designed to work out of the box for most general-purpose workloads but rarely meet every need.

 

Create Warehouse – Default Setting #1

 

Size (really the Warehouse compute size): X-Large. I assume you understand how Snowflake compute works and know the Snowflake Warehouse T-shirt sizes. Notice that the default is an X-Large warehouse rather than one of the smaller T-shirt sizes (XS, S, M, L). This defaults to the same setting for both the Classic Console and Snowsight (the Preview App).

 

Create Warehouse – Default Setting #2

 

Maximum Clusters: 2

While enabling multi-cluster by default makes sense if you want it, it still has significant cost implications. It assumes the customer wants to launch a second cluster – and pay for it – on this warehouse whenever a certain level of statements are queued. Sticking with the XL default, duplicating the cluster doubles the hourly spend whenever the second cluster is running.

This setting only applies to the Classic Console, and it is only available on Enterprise Edition or higher since Standard Edition does not offer multi-cluster warehouses.

 

Create Warehouse – Default Setting #3

 

Minimum Clusters: 1

This is only the default setting for the Classic Console.

 

Create Warehouse – Default Setting #4

 

Scaling Policy: Standard. This setting is hard to rate, but if you are a cost-conscious customer you would want to change it to "Economy" rather than leaving it at "Standard". With "Standard", the second cluster (enabled by default) kicks in as soon as queuing happens on your warehouse; with "Economy", Snowflake does not launch the second cluster until it estimates there is at least 6 minutes of work for that cluster to perform.

This is only a default setting in the Classic Console, but when you toggle "Multi-cluster Warehouse" on in Snowsight, it also defaults to "Standard" rather than "Economy".

 

Create Warehouse – Default Setting #5

 

Auto Suspend: 10 minutes. For many warehouses, especially ELT/ETL warehouses, this default is typically too high. Loading warehouses that run on regular intervals rarely need to keep the warehouse (and its cache) warm for that long – a loading warehouse that runs on a schedule never needs extensive caching. Our Snoptimizer service finds inefficient and potentially costly settings like this.

For a loading warehouse, Snoptimizer immediately saves 599 seconds of computing time for every interval. As discussed in the Snowflake Warehouse Best Practice Auto Suspend article, this can significantly reduce costs, especially for larger load warehouses.

We talk more about optimizing warehouse settings in this article but reducing this setting can substantially lower expenses with no impact on performance.

NOTE: This defaults to the same setting for both the Classic Console and Snowsight (the Preview App).

 

Snowflake Create Warehouse – Default Setting #6

 

Auto Resume Checkbox: Checked by default. This setting is fine as is. I do not recall the last time I created a warehouse without "Auto Resume" checked. Snowflake's ability to resume a warehouse in seconds (or less) when a query is executed brings compute to users automatically when they need it. This is revolutionary and useful!

NOTE: This defaults to the same for both the Classic Console and Snowsight (the Preview App).

 

Snowflake Create Warehouse – Default Setting #7

 

Click "Create Warehouse": The Snowflake warehouse is immediately started. I do not prefer this behavior; the warehouse should not immediately go into the Running state and start consuming credits. It is too easy for a new SYSADMIN to start a warehouse they do not need. Auto Resume is already enabled by default, so the warehouse will resume as soon as a job is sent to it – there is no need to start it automatically.

NOTE: This defaults to the same execution for both the Classic Console and Snowsight (the Preview App).

 

One last thing…

 

As an extra bonus, here is the equivalent SQL for those of you who just do not do "GUI".

Let’s go to the Snowflake CREATE WAREHOUSE code to see what is happening…

DEFAULT SETTINGS:

CREATE WAREHOUSE XLARGE_BY_DEFAULT WITH
WAREHOUSE_SIZE = 'XLARGE'
WAREHOUSE_TYPE = 'STANDARD'
AUTO_SUSPEND = 600
AUTO_RESUME = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 2
SCALING_POLICY = 'STANDARD'
COMMENT = 'This sucker will consume a lot of credits fast';
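
For contrast, here is a cost-conscious sketch of the same statement. The name and comment are mine, and the cluster count and scaling policy lines only apply on Enterprise Edition or higher – drop them on Standard Edition.

COST-CONSCIOUS SETTINGS (illustrative, not what the UI gives you):

CREATE WAREHOUSE XSMALL_BY_DESIGN WITH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 1
SCALING_POLICY = 'ECONOMY'
COMMENT = 'Start small, suspend fast, scale up only when the workload proves it';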

 

Conclusion:

 

Snowflake default warehouse settings are not optimized for cost and workload. The default settings establish an X-Large warehouse, allow up to 2 clusters which increases costs, use a “Standard” scaling policy and 10-minute auto-suspend, and immediately start the warehouse upon creation. These defaults work for general use but rarely meet specific needs. Optimizing settings can significantly reduce costs with no impact on performance.

How Snowflake Pricing Works

Introduction:

 

Snowflake pricing is determined by how much you use compute resources such as warehouses (virtual compute instances) and storage, as well as other costs like cloud services, cloud storage, and data transfer. Most of your Snowflake costs will be for computing resources, which typically account for 90% or more of your monthly costs. With Snowflake, you don’t have to make any upfront commitments or sign any long-term contracts.

You can start with a free trial account, and you won’t be charged if you don’t use any billable services. You only pay for what you use. Snowflake pricing may vary depending on the platform and region you’re using.

 

Which tools can help optimize your Snowflake costs?

 

There are a multitude of tools that you can leverage to confidently optimize and minimize your Snowflake costs. One such tool is Snoptimizer, which can be a game-changer for your organization.

Snoptimizer is the first automated Snowflake Cost Optimization Service that ensures significant cost savings (up to 50% on Snowflake compute) without sacrificing performance.

We built Snoptimizer because we saw a significant need in the marketplace. We were often called in by Snowflake customers for Snowflake Health checks, and 98% of the time, their accounts were not fully optimized.

Snoptimizer runs regularly and scours your Snowflake Operations Account Meta Data (over 40 views) continuously looking for Snowflake storage and computing anti-patterns and inefficiencies related to cost.

 

Usage-Based Pricing:

 

Usage-based pricing in cloud services, and especially in Snowflake, can be incredibly powerful. The fact that you can start an account with 400 credits for 30 days for a Proof of Concept (POC) is amazing to me. Before this, our consulting company hesitated to introduce the more expensive solutions to our small and medium-sized consulting clients because those solutions were outside their pricing comfort zone (especially analytical databases that could scale, like Exadata, Teradata, and Netezza).

 

What is the pricing on Snowflake?

 

For those of you who are new to Snowflake, let's start with Snowflake consumption pricing basics. Snowflake overall is usage- or consumption-based pricing. This means you only pay for what you use. Technically, you could set up a free Snowflake Trial Account and never pay anything because you never used any of the services that have a cost.

For most Snowflake Accounts, Snowflake compute – the Snowflake warehouses (virtual compute engines) – is where 90% or more of your costs are. The other cost areas of Storage, Cloud Computing, Cloud Services, and Data Transfer are typically 10% or less of the Snowflake SaaS costs per month. Often they can even be 1% or less, unless you have certain use cases or end up mistakenly using Snowflake Cost Anti-patterns.

Please keep in mind that as soon as your Snowflake Account is provisioned, you the administrator – or whoever has their credit card associated with the account – carry extreme cost risk by default. Our best practice is to enable Snowflake cost optimization with Snoptimizer immediately after provisioning a Snowflake Account. If you decide against that, then at the very least you should limit access and set up standard Snowflake cost guardrails following Snowflake cost optimization and cost minimization best practices.

For those of you who are more Snowflake savvy and already know the basics then let’s cover more advanced Snowflake pricing details.

Snowflake Compute Pricing – Advanced

 

One of the first things that Snoptimizer does is automate daily Resource Monitors at a warehouse level based on all the Snowflake Metadata Database history and warehouse and Resource Monitor settings. This gets set almost immediately after you purchase Snoptimizer. This has both huge cost risk reduction limits and guardrails for all of your warehouse compute.

One cool thing you can do is reduce your default query timeout to 4 hours or less, instead of the 2-day default, with the following code.

ALTER WAREHOUSE <warehouse_name> SET STATEMENT_TIMEOUT_IN_SECONDS = 14400;


 

How to Optimize Your Costs?

 

Over the last 3 years, my teams and I have analyzed over 100 Snowflake accounts, and about 95% of them were not fully optimized for cloud data costs and cloud cost risk minimization. This is why my team and I are so excited to have created Snoptimizer, the first automated Snowflake Cost Optimization Service – it can optimize your Snowflake Data Cloud Account in a few hours.

I think the reason why 90% of those accounts didn't have resource monitors or regular optimizations in place is that Snowflake is initially incredibly cost-effective and typically delivers massive savings, especially on the on-prem migrations we have done. However, companies that do not optimize their Data Cloud costs are missing out big time!

 

Try Snoptimizer today:

 

Snoptimizer quickly and automatically optimizes your Snowflake account for security, cost, and performance. It eliminates headaches and concerns about security risks and cost overruns across your Snowflake account.

 

 

Try Snoptimizer today. Sign up and schedule a personal demo with us!

Optimization in a few hours, hassle-free!

 

Conclusion:

 

I hope the Snowflake basic and advanced pricing information above is useful to you on your Snowflake journey. For me, finding out that Snowflake consumption-based pricing was so reasonable was game-changing for both myself and my consulting company. Before Snowflake, we couldn't provide compute scale with enough speed for many of the largest analytical challenges and solutions our clients needed.

I remember building predictive marketing tools where we often had to crunch large data sets; we would run into scaling challenges and spend tons of time and engineering effort just to engineer for scale. Keep in mind that if you don't use Snowflake's services smartly, you can end up spending a lot of money. Therefore, we recommend using Snoptimizer to help you reduce your costs.

 

If you’re looking to optimize your Snowflake account costs, try Snoptimizer today!

Sign up and schedule a personal demo with us!

 

Snowflake Cost Anti-patterns

Snowflake is still my favorite Analytical Database since the beginning of 2018 but as I often present in my live training sessions and webinars, WITH GREAT POWER (practically unlimited computing scale) comes GREAT RESPONSIBILITY.

In this article, I'll cover the top 3 Snowflake Cost Anti-patterns my Snowflake Cost Optimization team and I have come across after 3 years of analyzing hundreds of Snowflake Accounts. I cannot stress enough that you should either invest in a part- or full-time Snowflake DBA focused on cost and organization or, if you do not have that financial luxury, use our automated Snowflake Cost Optimization Service – Snoptimizer. It is incredibly easy and low-cost to set up Snoptimizer compared to letting these anti-patterns manifest (which, based on our review of hundreds of Snowflake Accounts, happens far too often). If you do not have cost guardrails like Resource Monitors enabled, your Snowflake compute consumption risk is high, and honestly it is gross negligence as a data administration professional to allow this.

Let’s go through the TOP 3 Snowflake Cost Anti-patterns.

Top 3 Anti-Patterns

The first Snowflake anti-pattern is by far the worst and happens all too often.

Snowflake Cost Anti-pattern #1

Sadly, we all too often see that Resource Monitors are not set up correctly. Some Snowflake accounts have them but not at an effective grain. One anti-pattern is that the administrator sets one large-credit Resource Monitor for the overall account and no other Resource Monitors. It is okay to have some Resource Monitors cover the account or multiple warehouses, but we highly recommend having 1 Resource Monitor set for daily monitoring on each warehouse, with suspend triggers once a credit limit is reached. This is currently the only real way to put guardrails on your Snowflake consumption. Without doing this you are exposing your company and Snowflake account to significant cost risk.

An additional Resource Monitor anti-pattern we see too frequently is administrators who do not want to be responsible for stopping compute, so they set up Resource Monitors with ONLY notifications. The problem is that notifications are just that... ONLY something to notify you. What if you only have 1-2 Snowflake Account Administrators, they are not watching email or web notifications frequently enough, and a Large to 6XL warehouse comes online without auto-suspend enabled?

Another problem is that Snowflake Administrators set up Resource Monitors BUT do not attach them to a warehouse. This is the same as having a guardrail that was never activated. Ugh!

Finally, we also see Resource Monitors set up by Account Administrators who have not enabled their email or notification settings correctly, so the alerts never reach anyone.

Snowflake Cost Anti-pattern #2

Another major Snowflake cost anti-pattern is related to storage. We do not see this nearly as often as #1, but it can also be a cost risk if you do not understand the impact of enabling longer Time Travel settings in Snowflake. If many of your tables have Time Travel set to 30, 60, or 90 days, but you don't need that much Time Travel and will never use it, then you should lower those retention settings.

There are similar potential problems with any table that is frequently updating and changing data. These types of tables challenge Snowflake's architecture because every data change requires recreating micro-partitions. So if you have 90-day Time Travel set and you are changing a table with a large number of rows every few minutes or hours, the cost adds up, because all of those immutable micro-partitions are retained for 90 days for every change. Also remember that by default Snowflake adds a 7-day Fail-safe period for permanent tables, so with Time Travel set to 90 days you will pay for 97 days of storage.
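
A quick way to find the tables carrying long retention is the ACCOUNT_USAGE catalog. The 30-day cutoff below is just an example, and the view can lag live changes by a few hours.

SELECT table_catalog, table_schema, table_name, retention_time
FROM snowflake.account_usage.tables
WHERE deleted IS NULL
  AND retention_time >= 30
ORDER BY retention_time DESC;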

Snowflake Cost Anti-pattern #3

Setting a warehouse to "Never" auto-suspend, or to very high auto-suspend settings. If you set a warehouse to never suspend, you are creating never-ending spend on that warehouse until you manually (or through code) suspend it. If the warehouse size is only XS this isn't incredibly horrible, but at larger sizes the costs grow very fast and you lose all of the value of Snowflake's consumption-based pricing.

 

Conclusion for Snowflake Cost Anti-patterns:

These are the top 3 most dangerous Snowflake Cost Anti-patterns we have come across. There are many others, but they are typically not as severe as these. These Snowflake Cost Anti-patterns are real and expose your company (and you) to sizable cost risks. This is why we recommend using Snoptimizer or having your team enable Snowflake best-practice cost optimization. Above all, set up Resource Monitors IMMEDIATELY – or at least on the same day your Snowflake Account is provisioned.