Snowflake Snowday – Data to Value Superhero Summary

Snowflake Snowday is Snowflake’s semi-annual product announcement event. This year it was held on 2022-11-07, the same day as the final stop of the Snowflake Data Cloud World Tour (DCWT), a live event in San Francisco.

I was able to attend 5 of the DCWT events around the world this year. It was very interesting to see how much Snowflake has grown on this world tour compared to the one back in 2019.  There are a ton of improvements and new features happening within the Snowflake Data Cloud.  It is hard to keep up!  Many of these announcements add real improvements for the Data to Value business.

Let’s get to the Snowday Summary and the plethora of Snowflake feature announcements.  The key Data to Value improvements that I’m most excited about are:

  • Snowpark for Python in GA
  • Private Data Listings – Massive improvement in the speed of data collaboration.
  • Snowflake Kafka Connector and Dynamic Tables.  Snowpipe Streaming.
  • Streamlit integration.

*All of these features add significant Data to Value improvements for organizations.

Snowflake Snowday Summary

*TOP announcement – whoop whoop – SNOWPARK FOR PYTHON! (General Availability – GA)
I think this was the announcement all the Python data people were looking forward to (including me). Snowpark for Python now enables every Snowflake customer to build and deploy Python-based applications, pipelines, and machine learning models directly in Snowflake.  In addition to Snowpark for Python being Generally Available on all Snowflake editions, these other Python-related announcements were made:
  • Snowpark Python UDFs for unstructured data (PRIVATE PREVIEW)
  • Python Worksheets – The improved Snowsight worksheet now supports Python, so you do not need an additional development environment. This makes it easier to get started with Snowpark for Python development. (PRIVATE PREVIEW)

ONE PRODUCT.  ONE PLATFORM.

This is Snowflake’s major push to make it easier and easier for customers to use Snowflake’s platform for all or most of their Data Cloud needs.  This is why they have now taken on Hybrid Tables – Unistore (OLTP Workloads) as well as Snowpark.  They are growing the core Snowflake platform to handle AI/ML workloads as well as Online Transaction Processing (OLTP) workloads.  This massively increases Snowflake’s Total Addressable Market (TAM).

***This is also the main reason they purchased Streamlit earlier this year.  They are moving to integrate Streamlit as a Data Application Frontend and Backend and also take on the Data Applications use cases.  So Snowflake is investing a ton to go from primarily a Data Store to a Data Platform where you can create Frontend and Backend Data applications (as well as web/data applications that need OLTP millisecond inserts or AI/ML workloads).

Also, Snowflake just keeps improving the core Snowgrid Platform as follows:

Cross Cloud Snowgrid

Replication Improvements and Snowgrid Updates:

These are overall amazing Cross-Cloud Snowgrid improvements and features around the platform, performance, and replication.  If you are new to Snowflake, we answer What is Snowgrid here.

  • Cross-Cloud Business Continuity – Streams & Tasks Replication (PUBLIC PREVIEW) – This is very cool as well.  I need to test it but in theory this will provide … Seamless pipeline failover which is really awesome.  This takes replication beyond just accounts, databases, policies, and metadata.
  • Cross-Cloud Business Continuity – Replication GUI (PRIVATE PREVIEW).  Now you will be able to more easily manage global replication and failover from a single user interface.  It looks very cool.  You can easily set up, manage, and fail over an account.
  • Cross-Cloud Collaboration – Listing Discovery Controls (PUBLIC PREVIEW).
  • Cross-Cloud Collaboration – Cross-Cloud Auto-Fulfillment (PUBLIC PREVIEW).
  • Cross-Cloud Collaboration – Provider Analytics (PUBLIC PREVIEW)
  • Cross-Cloud Governance – Tag-Based Masking (GA)
  • Cross-Cloud Governance – Masking and Row-Access Policies in Search Optimization (PRIVATE PREVIEW).
  • Replication Groups – Looking forward to the latest on this as well.  These can be used for sharing and simple database replication in all editions.

***All the above is available on all editions EXCEPT: 

  • YOU NEED ENTERPRISE OR HIGHER for Failover/Failback (including Failover Groups)
  • YOU NEED BUSINESS CRITICAL OR HIGHER for Client Redirect functionality

Performance Improvements on Snowflake Updates:

New performance improvements and performance transparency features were announced, related to:

  • Query Acceleration (public preview).
  • Search Optimization Enhancements (public preview).
  • Join eliminations (GA).
  • Top results queries (GA).
  • Cost Optimizations: Account usage details (private preview).
  • History views (in development).
  • Programmatic query metrics (public preview).

***Available on all editions EXCEPT:  YOU NEED ENTERPRISE OR HIGHER for both Search Optimization and Query Acceleration
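The edition notes above (for the replication features and for the performance features) can be collected into a small lookup. This is only a sketch for keeping track of what this summary says: the feature keys and the `is_available` helper are names I made up for illustration and are not part of any Snowflake API. The ordering of editions (Standard, Enterprise, Business Critical, Virtual Private Snowflake) follows Snowflake's published edition tiers.

```python
from enum import IntEnum


class Edition(IntEnum):
    """Snowflake editions, ordered from lowest to highest tier."""
    STANDARD = 1
    ENTERPRISE = 2
    BUSINESS_CRITICAL = 3
    VPS = 4  # Virtual Private Snowflake


# Minimum edition per feature, as stated in this summary.
# The keys are hypothetical labels, not Snowflake identifiers.
MIN_EDITION = {
    "failover_failback": Edition.ENTERPRISE,       # includes Failover Groups
    "client_redirect": Edition.BUSINESS_CRITICAL,
    "search_optimization": Edition.ENTERPRISE,
    "query_acceleration": Edition.ENTERPRISE,
}


def is_available(feature: str, edition: Edition) -> bool:
    """Return True if the feature is usable on the given edition.

    Features not listed in MIN_EDITION are treated as available
    on all editions, matching the "all editions EXCEPT" wording.
    """
    return edition >= MIN_EDITION.get(feature, Edition.STANDARD)


if __name__ == "__main__":
    for feature in sorted(MIN_EDITION):
        print(feature, "on ENTERPRISE:", is_available(feature, Edition.ENTERPRISE))
```

Treat this as a note-taking aid rather than a source of truth; edition requirements can change as features move from preview to GA.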

Data Listings and Cross-Cloud Updates

I’m super excited about this announcement around Private Listings.  Many of you know that one of my favorite features of Snowflake is Data Sharing, which I have been writing about for over 4 years.  [My latest take is the Future of Data Collaboration]  This is such a huge game-changer for Data Professionals.  The announcement is that customers can now more easily use listings for PRIVATE DATA SHARING scenarios.  It also makes fulfillment across different regions much easier.  (Even 1-2 years ago we had to write replication commands.)  I’ll write up more details about how this makes Data Sharing and Collaboration even easier.  I was delighted to see the presenters using the Data to Value concepts when presenting this.

I loved the way Snowflake used some of my Data to Value concepts around this announcement, including the benefit of:  “Time to value is significantly reduced for the consuming party”.  Even better, this functionality is available now for ALL SNOWFLAKE EDITIONS.

Private Listings

More and More Announcements on Snowday.

Snowflake has tons AND tons of improvements happening.  Other significant announcements on Snowday were:

Snowflake Data Governance IMPROVEMENTS

All of these features allow you to better protect and govern your data natively within Snowflake.
  • Tag-based Masking (GA) – This allows you to automatically assign a designated policy to sensitive columns using tags. Pretty nice.  (Generally Available)
  • Search Optimization will now have support for Tables with Masking and Row Access Policies (PRIVATE PREVIEW)
  • FedRAMP High for AWS Government (authorization in process)
***Available ONLY on ENTERPRISE OR HIGHER
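To make the idea behind tag-based masking concrete, here is a tiny conceptual model in plain Python. It is not Snowflake's implementation or API, and it does not talk to Snowflake at all. It only illustrates the mechanism described above: a policy is attached to a tag once, and every column carrying that tag is protected automatically. The `Column` class, the `pii` tag, the `PII_ADMIN` role, and the masking rule are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


def mask_all(value: str) -> str:
    """Hypothetical masking rule: replace every character with '*'."""
    return "*" * len(value)


@dataclass
class Column:
    """A column with a set of tags (conceptual stand-in, not a Snowflake object)."""
    name: str
    tags: Set[str] = field(default_factory=set)


# One policy per tag: assign the policy to the tag once, and every
# column tagged with it is covered without per-column setup.
TAG_POLICIES: Dict[str, Callable[[str], str]] = {"pii": mask_all}

# Roles allowed to see unmasked values (assumed for illustration).
UNMASKED_ROLES = {"PII_ADMIN"}


def read_value(column: Column, value: str, role: str) -> str:
    """Return the value as the given role would see it."""
    if role in UNMASKED_ROLES:
        return value
    for tag in sorted(column.tags):
        policy = TAG_POLICIES.get(tag)
        if policy is not None:
            return policy(value)
    return value  # untagged columns are returned as-is


if __name__ == "__main__":
    email = Column("email", {"pii"})
    print(read_value(email, "a@b.com", "ANALYST"))
    print(read_value(email, "a@b.com", "PII_ADMIN"))
```

The point of the design is that adding a new sensitive column only requires tagging it; the policy lookup stays in one place.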

Building ON Snowflake

New announcements related to:
  • Streamlit integration (PRIVATE PREVIEW in January 2023 – supposedly already oversubscribed?) – This is exciting to see. I cannot wait until the Private Preview.
  • Snowpark-Optimized Warehouses (PUBLIC PREVIEW). This was a great move on Snowflake’s part to support what AI/ML Snowpark customers really needed. Great to see it get rolled out. This gives customers access to HIGHER MEMORY warehouses which can better handle ML/AI training at scale. Snowpark code can be executed on both warehouse types.

***Available for all Snowflake Editions

Finally – Streaming and Dynamic Tables ANNOUNCEMENTS:

  • Snowpipe Streaming – (PUBLIC PREVIEW SOON)
  • Snowflake Kafka Connector – (PUBLIC PREVIEW SOON)
  • Snowflake Dynamic Tables – formerly Materialized Tables (PRIVATE PREVIEW) – Check out my fellow data superhero – Dan Galvin’s coverage here:  https://medium.com/snowflake/%EF%B8%8F-snowflake-in-a-nutshell-the-snowpipe-streaming-api-dynamic-tables-ae33567b42e8
***Available for all Snowflake Editions
Overall I’m pretty excited about where this is going.  These enhancements improve streaming data integration so much, especially with Kafka.  Now, as a Snowflake customer, you can ingest real-time data streams and transform data with low latency.  When fully implemented, this will enable more cost-effective and better-performing solutions around Data Lakes.

If you didn’t get enough Snowday and want to watch the recording, here is the link:
https://www.snowflake.com/snowday/agenda/

We will be covering more of these updates from Snowday and the Snowflake BUILD event this week in more depth with the Snowflake Solutions Community.  Let us know in the comments if we missed anything or what you are excited about from Snowday!

Data to Value – Part 2

Data to Value Trends.  PART 2.  TRENDS #2-4.  (NEXT WEEK WE WILL RELEASE THE FINAL 3 trends we are highlighting)

Welcome to our Snowflake Solutions Community readers who have read Part 1 of this 3-part Data to Value series.  For those of you who have not read Part 1 and want to fast forward: we are making the fundamental point that we data professionals and data users of all types need to focus NOT just on the creation, collection, and transformation of data.  We need to make a conscious effort to focus on and measure WHAT is the TRUE VALUE that each set of data creates.  We also need to measure how fast we can get to that value, if it provides any real business advantage.  There is also an argument for adjusting the value of data over time, since data often loses value the older it gets.
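One way to put a number on the idea that data loses value as it ages is a simple decay curve. The sketch below assumes exponential decay with a half-life, which is purely my illustrative choice and not a model proposed in this series; the function name and parameters are hypothetical, and real data sets may lose value in steps, linearly, or not at all.

```python
def decayed_value(initial_value: float, age_hours: float, half_life_hours: float) -> float:
    """Estimate the remaining value of a data set after it has aged.

    Assumes (for illustration only) that value halves every
    `half_life_hours`. A short half-life models data that is only
    useful in near real time; a long one models slow-moving data.
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if age_hours < 0:
        raise ValueError("age_hours must not be negative")
    return initial_value * 0.5 ** (age_hours / half_life_hours)


if __name__ == "__main__":
    # Compare the same data delivered after 1 hour versus after 2 days,
    # assuming a 24-hour half-life.
    for age in (1, 24, 48):
        print(f"age={age}h value={decayed_value(100.0, age, 24):.1f}")
```

Comparing the value at delivery time for a batch pipeline versus a streaming pipeline gives a rough way to argue for (or against) investing in lower latency.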

Here are the trends we are seeing related to the improvement of Data to Value.  Some of my favorites that are revolutionizing how data moves rapidly with more QUALITY to value for HUMANS and their ORGANIZATIONS:

Trend #1 – covered last week –>  Data to Value – Non-stop push for faster speed.  

Trend #2 – Data Sharing.  More and more Snowflake customers are realizing the massive advantage of data sharing allowing them to share “no-copy” in-place data in near real time.  Data Sharing is a massive competitive advantage if set up and used appropriately.  You can securely provide or receive access to data sets and streams from your entire business or organization value chain which is also on Snowflake.  This allows for access to data sets at reduced cost and risk due to the micro-partitioned zero-copy securely governed data access.

Trend #3 – Creating Data with the End in Mind.  When you think about using data for value and logically think through the creation and consumption life cycle, data professionals and organizations are realizing there are advantages to capturing data in formats that are ready for immediate processing.  If you design your data creation and capture as logs of data or other outputs that can be easily and immediately consumed, you can gain faster data to value cycles, creating competitive advantages with certain data streams and sets.

Trend #4 – Automated Data Applications.  I see some really big opportunities with Snowflake’s Native Applications and Streamlit integrated together.  Bottom-line, there is a need for consolidated “best-of-breed” data applications that can have a low cost price point due to massive volumes of customers. 

 Details for these next 3 are coming next week 🙂

Trend #5 – Full Automated Data Copying Tools.  I have watched the growth of Fivetran and Stitch since 2018.  It has been amazing.  Now I also see the growth of Hightouch and Census, which is incredibly amazing as well.

Trend #6 – Coming next week

Trend #7 – Coming next week

*What data to value trends am I missing?  I put the top ones I see but hit me up in the comments or directly if you have additional trends.

Snowflake’s Announcements related to Data to Value

IF YOU READ MY ARTICLE LAST WEEK:  these are currently exactly the same as last week.  I’m waiting for some of my readers to tell me if you have any other Snowflake Summit announcements that I missed that are real Data to Value features as well!

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit have Data to Value feature announcements such as:

*Snowflake’s support of Hybrid Tables and announcement of the concept of Unistore – The move into some type of OLTP (Online Transaction Processing).  There is huge interest from customers in a concept like this where that single source of truth thing happens by having web based OLTP type apps operating on Snowflake with Hybrid tables.

*Snowflake’s Native Apps announcements.  If Snowflake can get this right it’s a game changer for Data to Value and for decreasing the cost of deploying Data Applications.

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note, these 2 items above do not only mean that data “can” go to value faster; they also make the development of data apps and the combination of OLTP/OLAP applications much less costly and more achievable for “all” types of companies.  They could remove the massive friction that exists with needing high-end full stack development.  Streamlit really is attempting to remove the Front-End and Middle Tier complexity from developing data applications.  (Aren’t most applications data applications, though?)  It’s really another low-code data development environment.

*Snowpipe streaming announcement.  This was super interesting to me since I had worked with Issaic from Snowflake back before the 2019 Summit using the original Kafka to Snowflake Connector.  I also did a presentation on it at Snowflake Summit 2019.  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark announcements in general.  This is really, really new tech and the verdict is still out, but this is a major attempt by Snowflake to provide ML Pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and Machine Learning processes all within Snowflake.

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  If you read my initial Data to Value article, then these Snowflake items around Data to Value are the same as in the first article.  Do you have any others that were announced at Snowflake Summit 2022?  I hope you found this 2nd article around Data to Value useful for thinking about your data initiatives.  Again, focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good Luck!

Continue onto Data to Value Part 3

or go back to Part 1 – Data to Value

Data to Value

Data to Value – Part 1.  I spend a ton of time reviewing and evaluating all the ideas, concepts, and tools around data, data, and data.  The “data concept” space has been exploding with an increase of many different concepts and ideas.  There are so many new data “this” and data “that” tools as well so I wanted to bring data professionals and business leaders back to the core concept that matters around the creation, collection, and usage of data.  Data to Value.

The main concept is that we need to remember that the entire point of collecting and using data is to create business, organizational, or individual value.  All the other technical details and jargon between the creation and collection of the data and the value realization are important, but for many users they have become overly complex, especially many of the “latest concepts”.

For a short moment, let’s let go of all the consulting and technical data terms that are often becoming overused and often mis-used like Data Warehouse, Data Lake, Data Mesh, Data Observability, Data THIS and Data THAT.  Currently I’m even seeing that data experts and practitioners will have different views around the latest concepts depending on where their data education began and with the types of technologies they used.

Data to Value is what really matters

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that Snowflake and other “Modern Data Stack” tools/clouds provide.  Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  The intersection of cloud, data, automation, and AI/ML is having massive impacts on our society.

Data to Value Trends

Back in 2018, I had the opportunity to consult on some very advanced and mature data engineering solutions.  A few of them were actively moving with Kafka/Confluent towards true “event-driven data processing”.  It was a massive shift from the traditional batch processing used throughout 98% of the implementations I had worked on previously.  I thought the concept of using non-stop streams of data from different parts of the organization, delivered through Kafka topics, was pretty awesome.  At the same time, these were some pretty advanced concepts and paradigm shifts for all but very advanced data engineering teams.  Here are the Data to Value Trends that I think you need to be aware of:

 

Trend #1 – Non-stop push for faster speed of Data to Value.  Within our non-stop, dominantly capitalist world, faster is better and often provides advantages to organizations, especially around improved value chains and concepts such as supply chains.  Businesses and organizations continuously look for any advantage they can get.  I kind of hate linking to McKinsey for backup but here goes.  Their characteristic #2 for the data-driven enterprise of 2025:  “Data is processed and delivered in real time”

 

Trend #2 – Data Sharing.  Coming next week – Part 2.

Trend #3 – Coming next week – Part 2.

Trend #4 – Coming next week – Part 2.

Trend #5 – Full Automated Data Copying Tools.  The growth of Fivetran and Stitch (Now Talend) has been amazing.  We now are also seeing huge growth at automated data copy pipelines going the other way like Hightouch.  At IT Strategists, we became a partner with Stitch, Fivetran, and Matillion back in 2018.  Coming in 2 weeks – Part 3

Trend #6 – Coming in 2 weeks – Part 3

Trend #7 – Coming in 2 weeks – Part 3

*What data to value trends am I missing?  I put the top ones I see but hit me up in the comments or directly if you have additional trends.

Snowflake’s Announcements related to Data to Value

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit have Data to Value feature announcements such as:

*Snowflake’s support of Hybrid Tables and announcement of the concept of Unistore – The move into some type of OLTP (Online Transaction Processing).  There is huge interest from customers in a concept like this where that single source of truth thing happens by having web based OLTP type apps operating on Snowflake with Hybrid tables.

*Snowflake’s Native Apps announcements.  If Snowflake can get this right it’s a game changer for Data to Value and for decreasing the cost of deploying Data Applications.

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note, these 2 items above do not only mean that data “can” go to value faster; they also make the development of data apps and the combination of OLTP/OLAP applications much less costly and more achievable for “all” types of companies.  They could remove the massive friction that exists with needing high-end full stack development.  Streamlit really is attempting to remove the Front-End and Middle Tier complexity from developing data applications.  (Aren’t most applications data applications, though?)  It’s really another low-code data development environment.

*Snowpipe streaming announcement.  This was super interesting to me since I had worked with Issaic from Snowflake back before the 2019 Summit using the original Kafka to Snowflake Connector.  I also did a presentation on it at Snowflake Summit 2019.  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark announcements in general.  This is really, really new tech and the verdict is still out, but this is a major attempt by Snowflake to provide ML Pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and Machine Learning processes all within Snowflake.

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  I truly believe these forces have been changing our world everywhere and will continue to do so for many years.  Data to Value for me is a really key concept that helps me prioritize what provides value from our data related investments and work.

Continue to part 2 and 3 of this series:

Data to Value – Part 2

Data to Value – Part 3

I hope you found this useful for thinking about your data initiatives.   Focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good Luck!

What is a Snowflake Data Superhero?

What is a Snowflake Data Superhero?  Currently a Snowflake Data Superhero (abbreviated as DSH) is a Snowflake product expert who is actively involved in the Snowflake community and is helping others learn more about Snowflake through blogs, videos, podcasts, articles, books, etc. etc.  Finally, Snowflake states it chooses DSHs based on their positive influence on the overall Snowflake Community.  Snowflake Data Superheroes get some decent DSH benefits as well (see below)

The Snowflake Data Superhero Program (Before Fall 2021)

For those of you new to Snowflake within the last few years, believe it or not, there was a really informal Data Superhero program for many years.  I don’t even think there were exact criteria to be in it.  Since I was a long-time Snowflake Advocate and one of the top Snowflake consulting and migration partners from 2018-2019 with IT Strategists (before we sold the consulting business), I was invited to be part of the informal program back in 2019.

Then those of us who had been involved with this informal program got a mysterious email and calendar invite in July 2021:  “Invitation: Data Superhero Program Restructuring & Feedback @ Mon Jul 26, 2021 8am – 9am”.  Honestly, when I saw this and attended the session, it sounded like it was going to be a pain in the ass having to validate our Snowflake expertise again within this new program, especially for many of us in the Snowflake Advocate Old Guard.  (There are probably around 40 of us, I’d say, who never decided to switch to being employees of Snowflake Corporate to make a serious windfall from the largest software IPO in history, especially the Sloot and Speiser who became billionaires.  Benoit did too but as I’ve stated before, Benoit, Thierry, and Marcin deserve some serious credit for the core Snowflake architecture.  As an engineer you have to give them some respect.)

The Snowflake Data Superhero Program (2022)

This is a combination of my thoughts and the definitions from Snowflake.

Snowflake classifies Snowflake Data Superheroes (DSH) as an elite group of Snowflake experts!  They also think the DSHs should be highly active in the overall Snowflake community. They share feedback with Snowflake product and engineering teams, receive VIP access to events, and their experiences are regularly highlighted on Snowflake Community channels. Most importantly, Data Superheroes are out in the community helping to educate others by sharing knowledge, tips, and best practices, both online and in-person.

How does the Snowflake Corporation choose Snowflake Data Superheroes?

They mention that they look for the following key attributes:

  • You must overall be a Snowflake expert
  • They look for Snowflake experts who create any type of content around the Snowflake Data Cloud (this could be any type of content from videos and podcasts to blogs and other written Snowflake publications.  I think they even took into account for me the Snowflake Essentials book I wrote.)
  • They look for you to be an active member of the Data Hero community which is just the overall online community at snowflake.com.
  • They also want people who support other community members and provide feedback on the Snowflake product.
  • They want overall energetic and positive people

Overall, I would agree that many of the 48 data superheroes for 2022 definitely meet all of the criteria above.  This past year, since the program was new, I also think it came down to the fact that only certain people applied.  (I think next year it will be less exclusive since the number of Snowflake experts is really growing from my view.  Back in 2018, there honestly was a handful of us.  I would say less than 100 worldwide.  Now there are most likely 200++ true Snowflake Data Cloud Experts outside of Snowflake Employees.  That said, the product overall has grown so much that it has become difficult for any normal or even superhero human to cover all parts of Snowflake as an expert.  The only way that I’m doing it (or trying to) is to employ many automated ML flows, and Aflows as I call them, to organize all publicly available Snowflake content into this one knowledge repository of ITS Snowflake Solutions.)  I would also say that it comes down to your overall known presence within the Snowflake Community and finally your geography.  For whatever reason, I think the Snowflake DSHs chosen by Snowflake for 2022 missed some really, really strong Snowflake experts within the United States.

Also, I just want to add that even within the Snowflake Data Superhero 48…. there are a few that just stand out as producing an insane amount of free community content.  I’m going to name them later after I run some analysis but there are about 10-15 people that just pump out the content non-stop!

What benefits do you get when you become a Snowflake Data Superhero?

Snowflake Data Superhero BENEFITS:

In 2022, they also provided all of these benefits:

  • A ticket to the Snowflake Summit – I have to say this was an awesome perk of being part of the program, and while I sometimes disagree with Snowflake corp decisions that are not customer or partner focused, this was Snowflake Corporation actually doing something awesome and really the right thing, considering that of these 48 superheroes, most of us have HEAVILY contributed to Snowflake’s success (no stock, no salary).  While employees and investors reap large financial gains from the Snowflake IPO, many of us basically helped the company grow significantly.
  • Snowflake Swag that is different (well, it was for awhile, now others are buying the “kicks” or sneakers)
  • Early education on new Snowflake Features
  • Early access to new Snowflake Features (Private Preview)
  • Some limited opportunities to speak at events.  (Let’s face it, the bulk of speaking opportunities these days goes in this order:  Snowflake Employees, Snowflake Customers (the bigger the brand [or maybe the spend] the bigger the speaking opportunity), Snowflake Partners who pay significant amounts of money to be involved in any live speaking event, and finally external Snowflake experts, advocates, etc.)
  • VIP access to events (we had our own Data Superhero area within Snowflake Summit)
  • Actual Product Feedback sessions with the Snowflake Product Managers

The only action that I can think of that really has been promised and not done so far in 2022 is providing every DSH with a test Snowflake Account with a certain number of credits.  Also, I do not think many of the DSHs have received their Data Superhero card.  (this was one of those benefits provided to like maybe 10 or more of the DSHs back in 2019 or so.  Basically anyone who was chosen to speak at Snowflake BUILD I believe is where some of it started.  I’m not 100% sure.)

The Snowflake Data Superhero Program (2023)

How do I apply to be a Snowflake Data Superhero?
Here you go:  [even though for me the links are not working]
https://community.snowflake.com/s/dataheroes

Snowflake’s Data Superhero Program Evolution

I will add some more content around this as I review how the 2023 program is going to work.  I will say I have been surprisingly pleased with the DSH Program overall this year in 2022.  It has given those Snowflake Data Superheroes who are more involved with the program a way to stand out within the Snowflake Community.

Snowflake’s Data Superhero Program Internal Team

I also want to give a shout out to the main team at Snowflake who works tirelessly to make an amazing Snowflake Data Superhero program.  These individuals and more have been wonderful to work with this year:

  • Howard Lio
  • Leith Darawsheh
  • Elsa Mayer

There are many others too, from the product managers we meet with to other Snowflake engineers.

Other Snowflake Data Superhero Questions:

Here was the full list from Feb 2021.

Who are the Snowflake Data Superheroes?

https://medium.com/snowflake/introducing-the-2022-data-superheroes-ec78319fd000

Summary

I kept getting all of these questions like, hey – what is a Snowflake Data Hero?  What is a Snowflake Data Superhero?  How do I become a Snowflake Data Superhero?  What are the criteria for becoming one?

So this article is my attempt to answer all of your Snowflake Data Superhero related questions in one place.  (from an actual Snowflake Data Superhero – 3+ years in a row).  Hit me up in the comments or directly if you have any other questions.

Shortest Snowflake Summit 2022 Recap

Shortest Snowflake Summit 2022 Recap from a Snowflake Data Superhero

If you missed the Snowflake Summit or any part of the Snowflake Summit Opening Keynote, here are the key feature announcements and a recap [in “brief” but “useful” detail].

KEY FEATURE ANNOUNCEMENTS — EXECUTIVE SUMMARY. [mostly in a chronological order of when they were announced. My top ~20. The number of announcements this week was overwhelming!]

Cost Governance:

#1. New Resource Groups concept announced where you can combine all sorts of Snowflake data objects to monitor their resource usage. [this is huge since Resource Monitors were pretty primitive]

#2. Concept of Budgets that you can track against. [both Resource Groups and Budgets coming into Private Preview in the next few weeks]

#3. More Usage Metrics are being made available as well, for SnowPros like us or for monitoring tools to use. This is important since many enterprise businesses were looking for this.

Replication Improvements on SnowGrid:

#4. Account Level Object Replication (Previously, Snowflake allowed data replication but not replication of other account-level objects. Now objects beyond just data, e.g. users, can supposedly be replicated as well.)

#5. Pipeline Replication and Pipeline Failover. Stages and Pipes now can be replicated as well. [Kleinerman stated this is coming soon to Preview. I’m assuming Private Preview?] — DR people will love this!

Data Management and Governance Improvements:

#6. The combination of tags and policies. You can now do  — [Private Preview now and will go into public preview very soon]

Expanding External Table Support and Native ICEBERG Tables:

#7. External Table Support for Apache Iceberg is coming shortly. Remember though that External tables are read-only and have other limitations, so see what Snowflake did in #9 below. [pretty amazing]

#8. EXPANDING Snowflake to handle on-premises data with Storage Vendor Partners, so far Dell Technologies and Pure Storage. [their integration will be in private preview in the next few weeks]

#9. Supporting ICEBERG TABLES with FULL STANDARD TABLE support in Snowflake so these tables will support replication, time-travel, etc. etc. [very huge]. This enables so much more ease of use within a Data Lake conceptual deployment. EXPERT IN THIS AREA: Polita Paulus

Improved Streaming Data Pipeline Support:

#10. New Streaming Data Pipelines. Main innovation is the capability to create a concept of MATERIALIZED TABLES. Now you can ingest streaming data as row sets. [very huge]. EXPERT IN THIS AREA: Tyler Akidau

  • Funny — I did a presentation in Snowflake Summit 2019 on Snowflake’s Kafka connector. Now that is like ancient history. 

Application Development Disruption with Streamlit and Native Apps:

#11. Low code data application development via Streamlit. The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

#12. Native Application Framework. I have been working with this for about 3 months and I think it’s a game-changer. It allows all of us data people to create Data Apps, share them on a marketplace, and monetize them as well. It really starts to position Snowflake and its new name (UGH! 3rd name change: 2019=Data Exchange, 2020=Data Marketplace, 2022=

Expanded SnowPark and Python Support:

#13. Python Support in the Snowflake Data Cloud. More importantly, this is a MAJOR MOVE to make it much easier for all “data constituents” to be able to work seamlessly within Snowflake for ALL workloads including Machine Learning. This has been an ongoing move by Snowflake to make it much much easier to run data scientist type workloads within Snowflake itself.

#14. Snowflake Python Worksheets. This is really combined with the above announcement and enables data scientists who are used to Jupyter notebooks to more easily work in a fully integrated environment in Snowflake.

New Workloads. Cybersecurity and OLTP! boom!

#15. CYBERSECURITY. This was announced a while back but I wanted to include it here to be complete since it was emphasized again.

#16. UNISTORE. OLTP type support based on Snowflake’s Hybrid Table features. This was one of the biggest announcements by far. Snowflake is now entering a much much larger part of data and application workloads by extending its capabilities BEYOND OLAP [big data, online analytical processing] into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a massive move and positions Snowflake as a single integrated data cloud for all data and all workloads.

Additional Improvements:

#17. Snowflake Overall Data Cloud Performance Improvements. This is cool but given all the other “more transformative” announcements I’m just bundling this together. Performance improvements included improvements on AWS related to new AWS capabilities as well as more power per credit with internal optimizations. [since Snowflake is a closed system though, I think it’s hard for customers to see and verify this]

#19. Large Memory Instances. [not much more to say. they did this to handle more data science workloads but it shows Snowflake’s continued focus around customers when they need something else.]

#20. ̶D̶a̶t̶a̶ Marketplace Improvements. The Marketplace is one of my favorite things about Snowflake. They mostly announced incremental changes.

Final Note: I hope you find this article useful and please let me know in the comments if you feel I missed anything really important.

I attempted to make it as short as possible while still providing enough detail so that you could understand that Snowflake Summit 2022 contained many significant announcements and moves forward by the company.

Quick “Top 3” Takeaways for me from Snowflake Summit 2022:

  1. Snowflake is positioning itself now way way beyond a cloud database or data warehouse. It is now defining itself as a full stack business solution environment capable of creating business applications.
  2. Snowflake is emphasizing it is not just data but that it can handle “ALL WORKLOADS” – Machine Learning, Traditional Data Workloads, Data Warehouse, Data Lake, Data Applications – and it now has a Native App and Streamlit Development toolset.
  3. Snowflake is expanding wherever it needs to be in order to be a full data anywhere anytime data cloud. The push into better streaming data pipelines from Kafka, etc. and the new on-prem connectors allow Snowflake to take over more and more customer data cloud needs.

Snowflake at a very high level wants to:

  1. Disrupt Data Analytics
  2. Disrupt Data Collaboration
  3. Disrupt Data Application Development

Want more recap beyond JUST THE FEATURES?

Here is a more in-depth take on the Keynote 7 Pillars that were mentioned:

Frank Slootman Recap: 

MINUTE: ~2 to ~15 in the video

Snowflake related Growth Stats Summary:

*Employee Growth: 

2019:  938 Employees

2022 at Summit:  3992 Employees

*Customer Growth:

2019:  948 Customers

2022 at Summit:  5944 Customers

*Total Revenue Growth:

2019:  $96M

2022 at Summit:  $1.2B
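
To put those numbers in perspective, here is a quick back-of-the-envelope sketch of the growth multiples. It only uses the figures quoted above; the `stats` dictionary and the `growth_multiple` helper are my own illustrative names, and I am assuming the revenue figures are in US dollars (96M and 1.2B expressed in millions).

```python
# Figures as quoted above from the keynote: (2019 value, 2022-at-Summit value).
# Revenue is expressed in millions and assumed to be US dollars.
stats = {
    "employees": (938, 3992),
    "customers": (948, 5944),
    "revenue_millions": (96, 1200),
}


def growth_multiple(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


for name, (y2019, y2022) in stats.items():
    print(f"{name}: {growth_multiple(y2019, y2022):.1f}x")
```

By this rough math, headcount grew a bit over 4x, customers a bit over 6x, and revenue about 12.5x over the same period, so revenue grew much faster than either employees or customers.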

 

Large emphasis on MISSION PLAN and INDUSTRY/VERTICAL Alignment.

 

MINUTE: ~15 to ~53 – Frank Slootman and Benoit

53 to 57:45 – Christian Intros.

Frank introduces the pillars of Snowflake INNOVATION and then Benoit and Christian delve into these 7 Pillars in more depth.

Let’s go through the 7 PILLARS OF SNOWFLAKE INNOVATIONS!

  1. ALL DATA – Snowflake is emphasizing they can handle not only Structured Data and Semi-Structured but also Unstructured Data of ANY SCALE.  Benoit even said companies can scale out to 100s of Petabytes.
  2. ALL WORKLOADS – There is a massive push by Snowflake to provide an integrated “all workload” platform. They define this as all types of data, all types of workloads now (emphasizing now it can handle all ML/AI type workloads via SnowPark and most ). [My take:  one of Snowflake’s original architecture decisions, the separation of compute and storage, is still what makes it so so powerful.]
  3. GLOBAL – An emphasis that Snowflake, based on SnowGrid, is a fully Global Data Cloud Platform. As of today, Snowflake is deployed in over 30 cloud regions on the three main cloud providers. Snowflake works to deliver a unified global experience with full replication and failover to multiple regions based on its unique architecture of SnowGrid.
  4. SELF-MANAGED – Snowflake is still focusing a TON on continuing to make Snowflake SIMPLE and easy to use.
  5. PROGRAMMABLE – Snowflake now can be programmed not only with SQL, JavaScript, Java, Scala but also Python and preferred libraries. This is where STREAMLIT fits in.
  6. MARKETPLACE – Snowflake emphasizes its continued focus on building more and more functionality on the Snowflake Marketplace (rebranded now since it will contain both native apps and data shares).  Snowflake continues to make it as easy as possible to share data and data applications through the integrated marketplace.
  7. GOVERNED – Frank’s story from the 2019 keynote…someone grabbed him and said…You didn’t talk about GOVERNANCE [so Frank and everyone talked a ton about it this time!] – Snowflake and Frank state that there is a continuous heavy focus on Data Security and Governance.

OTHER KEY PARTS OF THE KEYNOTE VIDEO:

[ fyi – if you didn’t access it already the FULL Snowflake Summit 2022 Opening Keynote is here:

https://www.snowflake.com/summit/agenda?agendaPath=session/849836 ]

MINUTE: ~57:45 to 67 (1:07) – Linda Appsley – GEICO testimonial on Snowflake.

MINUTE: Goldman Executive presentation.