Introduction:
Similar to last year, I wanted to create a “shortest” recap of the Snowflake Summit 2023, including the key feature announcements and innovations. This is exactly 2 weeks after Snowflake Summit has ended and I have digested the major changes. Throughout July and August we will follow up with our view of the massive Data to Value improvements and capabilities being made.
Snowflake Summit 2023 Recap from a Snowflake Data Superhero:
If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.
Top Announcements:
1.Native Applications goes to Public Preview:
I am slightly biased here because my teams have been working with Snowflake Native Apps since Feb/March 2022. We have been on the journey with Snowflake from early early Private Preview to now over the last 16 months or so. We are super excited about the possibilities and potential of where this will go.
2. Nvidia/Snowflake Partnership, Support of LLMs, and Snowpark Container Services (Private Preview):
Nvidia and Snowflake are teaming up (because as Frank S. says… some people are trying to kill Snowflake Corp) and they will integrate Nvidia’s LLM framework into Snowflake. I’m also really looking forward to seeing how these Snowpark Container Services work.
3. Dynamic Tables (Public Preview):
Many Snowflake customers including myself are really excited about this. These allow new Data Set related key features beyond a similar concept like Materialized Views. With Dynamic Tables you can… have declarative data pipelines, dynamic SQL Support , user defined low latency freshness, automated incremental refreshes, and snapshot isolation.
4. Managed Iceberg Tables (Private Preview):
“Managed iceberg tables” allows Snowflake Compute Resources to manage Iceberg data. This really helps with easier management of iceberg format data and helps Snowflake compete for Data Lake or really large Data File type workloads. So Snowflake customers can manage their data lake catalog with Iceberg BUT still get huge value with better compute performance with Snowflake’s query engine reading the metadata that Iceberg provides. In some ways this is a huge large file data to value play. It enables what blob storage (S3, Azure, etc.) do best BUT then being able to utilize Snowflake’s compute means less DATA TRANSFORMATION and faster value from the data including dealing with standard data modifications like updates, deletes and inserts.
5. Snowpipe Streaming API (Public Preview):
As someone that worked with and presented on the Kafka Streaming Connector back at Summit 2019 it is really great to see this advancement. Back then the connector was “ok”. It could handle certain levels of streaming workloads. 4 years later this streaming workload processing has gotten much much better.
Top Cost Governance and Control Changes:
As anyone who has read my blog over the past few years, I’m a huge advocate of the Snowflake pay for what you use is AWESOME but ONLY when tools like our Snoptimizer® Optimization tool is used or you really really setup all the cost guard rails correctly. 98% of accounts we help with Snoptimizer do not have all the optimizations set correctly. Without continuous monitoring of costs (and for that matter performance and security – which we also offer unlike a lot of the other copycats).
1. Budgets (Public Preview):
This “budget” cost control feature was actually announced back in June 2022. We have been waiting for it for some time now. It is good to see Snowflake finally delivering this functionality. Since we started as one of the top Snowflake Systems Integrators back in 2018 there has been ONLY Resource Monitors to have ANY control whatsoever with guardrail limit type functionality. This has been a huge pain point for many customers for many years. Now, with this budget feature, users can actually specify a budget and get much more granular details about their spending limits.
2. Warehouse Utilization (Private Preview):
This is another great step forward for Snowflake customers looking to optimize their Snowflake warehouse utilization. We already leverage meta data statistics that are available to do this within Snoptimizer® but we are limited by the level of detail we can gather. This will allow us to optimize workloads much better across Warehouses to get even higher Snowflake Cost Optimization for our customers.
My takeaways from Snowflake Summit 2023:
- If you would like more content and my summaries are not enough details then you are in luck. Here are more details from my team on our top findings around Snowflake Summit 2023.
- Snowpark Container Services allow Snowflake customers to now run any job, function or service — from 3rd party LLMs, to Hex Notebook to a C++ application to even a full database, like Pinecone, in users’ own accounts. It now supports GPUs.
- Streamlit is getting a new faster and easier user interface to develop apps. It is an open-source Python-based framework compatible with major libraries like sci-kit-learn, PyTorch, and Pandas. It has Git integration for branching, merging, and version control.
- Snowflake is leveraging two of its recent acquisitions — Applica and Neeva to provide a new Generative AI experience. The former acquisition has led to Document AI, an LLM that extracts contextual entities from unstructured data and queries unstructured data using natural language. The unstructured to structured data is persisted in Snowflake and vectorized. Not only can this data be queried in natural language, but it can also be used to retrain the LLM on private enterprise data. While most vendors are pursuing prompt engineering. Snowflake is following the retraining path.
- Snowflake now provides full MLOps capabilities, including Model Registry, where models can be stored, version controlled, and deployed. They are also adding a feature store with compatibility with open-source Feast. It is also building LangChain integration.
- Last year, Snowflake added support for Iceberg Tables. This year it brings the tables under its security, governance, and query optimizer umbrella. Iceberg table’s performance now matches the tables’ query latency in native format.
- Snowflake is addressing the criticism of its high cost through several initiatives designed to make costs predictable and transparent. Snowflake Performance Index (SPI) — using ML functions, it analyzes query durations for stable workloads and automatically optimizes them. This has led to 15% improvement on customers’ usage costs.
- Snowflake has invested hugely in building native data quality capabilities within its platform. Users can define quality check metrics to profile data and gather statistics on column value distributions, null values, etc. These metrics are written to time-series tables which helps build thresholds and detects anomalies from regular patterns.
- Snowflake announced two new APIs to support the ML lifecycle:
- ML Modeling API: The ML Modeling API includes interfaces for preprocessing data and training models. It is built on top of popular libraries like Scikit Learn and XGBoost, but seamlessly parallelizes data operations to run in a distributed manner on Snowpark. This means that data scientists can scale their modeling efforts beyond what they could fit in memory on a conventional compute instance.
- MLOps API: The MLOps API is built to help streamline model deployments. The first release of the MLOps API includes a Model Registry to help track and version models as they are developed and promoted to production.
- Improved Apache Iceberg integrations
- GIT Integration: Native git integration to view, run, edit, and collaborate within Snowflake code that exists in git repos. Delivers seamless version control, CI/CD workflows, and better testing controls for pipelines, ML models, and applications.
- Top-K Pruning Queries: Enable you to only retrieve the most relevant answers from a large result set by rank. Additional pruning features, help reduce the need to scan across entire data sets, thereby enabling faster searches. (SELECT ..FROM ..TABLE ORDER BY ABC LIMIT 10).
- Warehouse Utilization: A single metric that gives customers visibility into actual warehouse utilization and can show idle capacity. This will help you better estimate the capacity and size of warehouses.
- Geospatial Features: Geometry Data Type, switch spatial system using ST_Transformation, Invalid shape detection, many new functions for Geometry and Geography
- Dynamic Tables
- Amazon S3-compatible Storage
- Passing References for Tables, Views, Functions, and Queries to a Stored Procedure — Preview
Marketplace Capacity Drawdown Program
Anomaly Detection: Flags metric values that differ from typical expectations.
Contribution Explorer: Helps you find dimensions and values that affect the metric in surprising ways.
What did happen to Unistore?
UNISTORE. OLTP type support based on Snowflake’s Hybrid Table features: This was one of the biggest announcements by far. Snowflake now is entering a much larger part of data and application workloads by extending its capabilities beyond olap [big data. online analytical processing] into OLTP space which still is dominated by Oracle, SQL Server, mysql, postgresql, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.
This was from last year too – it’s great to see this move forward! (even though..Streamlit speed is still a work in progress)
Application Development Disruption with Streamlit and Native Apps:
Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.
Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.
Snowflake at a very high level (still) wants to:
Disrupt Data Analytics
Disrupt Data Collaboration
Disrupt Data Application Development