Snowflake Snowday — Data to Value and Superhero Summary

Snowflake Snowday is Snowflake’s semi-annual product announcement event. This year it was held on November 7, 2022, the same day as the end of the Snowflake Data Cloud World Tour (DCWT), a live event in San Francisco.

I was able to attend 5 of the DCWT events around the world this year. It was very interesting to see how much Snowflake has grown on this world tour compared to the one back in 2019.  A ton of improvements and new features within the Snowflake Data Cloud are happening.  It is hard to keep up!  Many of these announcements also directly improve Data to Value for businesses.

Let’s get to the Snowday Summary and the plethora of Snowflake feature announcements.  The Data to Value improvements I’m most excited about are:

  • Snowpark for Python in GA
  • Private Data Listings – Massive improvement in the speed of data collaboration.
  • Snowpipe Streaming, the Snowflake Kafka Connector, and Dynamic Tables.
  • Streamlit integration.

All of these features add significant Data to Value improvements for organizations.

Snowflake Snowday Summary

TOP announcement! – whoop whoop – SNOWPARK FOR PYTHON! (General Availability – GA)
I think this was the announcement all the Python data people were looking forward to (including me). Snowpark for Python enables every Snowflake customer to build and deploy Python-based applications, pipelines, and machine-learning models directly in Snowflake.  (A minimal Snowpark sketch follows the list below.)  In addition to Snowpark for Python being Generally Available to all Snowflake editions, these other Python-related announcements were made:
  • Snowpark Python UDFs for unstructured data (PRIVATE PREVIEW)
  • Python Worksheets – The improved Snowsight worksheet now has support for Python, so you do not need an additional development environment. This makes it easier to get started with Snowpark for Python development. (PRIVATE PREVIEW)
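
Here is a minimal sketch of what that looks like in practice. Everything below (connection parameters, the ORDERS table, the add_one UDF) is an illustrative placeholder I made up, not something from the announcement:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import IntegerType

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# DataFrame-style pipeline that is pushed down and executed inside Snowflake.
orders = session.table("ORDERS")
big_orders = orders.filter(col("AMOUNT") > 1000).group_by("REGION").count()
big_orders.write.save_as_table("BIG_ORDERS_BY_REGION", mode="overwrite")

# A simple Python UDF registered in Snowflake and callable from SQL.
@udf(name="add_one", replace=True, input_types=[IntegerType()], return_type=IntegerType())
def add_one(x: int) -> int:
    return x + 1
```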

ONE PRODUCT. ONE PLATFORM.

This is Snowflake’s major push to make it easier and easier for customers to use Snowflake’s platform for all or most of their Data Cloud needs.  This is why they now have taken on Hybrid Tables – Unistore (OLTP Workloads) as well as Snowpark.  They are growing the core Snowflake platform to handle AI/ML workloads as well as Online Transaction Processing (OLTP) workloads.  This massively increases Snowflake’s Total Addressable Market (TAM).

***This is also the main reason they purchased Streamlit earlier this year. They are moving to integrate Streamlit as the front end and back end for data applications and to take on the Data Applications use cases. Snowflake is investing a ton to go from primarily a Data Store to a Data Platform where you can create front-end and back-end data applications (as well as web/data applications that need OLTP millisecond inserts or AI/ML workloads).

Also, Snowflake just keeps improving the core Snowgrid Platform as follows:

Cross-Cloud Snowgrid

Replication Improvements and Snowgrid Updates:

These are overall amazing Cross-Cloud Snowgrid improvements and features around the platform, performance, and replication.  If you are new to Snowflake, we answer “What is Snowgrid?” here.

  • Cross-Cloud Business Continuity – Streams & Tasks Replication (PUBLIC PREVIEW) – This is very cool as well.  I still need to test it, but in theory it provides seamless pipeline failover, which is awesome.  This takes replication beyond just accounts, databases, policies, and metadata.
  • Cross-Cloud Business Continuity – Replication GUI.  (PRIVATE PREVIEW).  Now you will be able to more easily manage replication and failover from a single user interface for global replication.  It looks very cool.  You can easily set up, manage, and failover an account.
  • Cross-Cloud Collaboration – Listing Discovery Controls (PUBLIC PREVIEW).
  • Cross-Cloud Collaboration – Cross-Cloud Auto-Fulfillment (PUBLIC PREVIEW).
  • Cross-Cloud Collaboration – Provider Analytics (PUBLIC PREVIEW)
  • Cross-Cloud Governance – Tag-Based Masking (GA)
  • Cross-Cloud Governance – Masking and Row-Access Policies in Search Optimization (PRIVATE PREVIEW).
  • Replication Groups – Looking forward to the latest on this as well.  These can be used for sharing and simple database replication in all editions.

***All the above is available on all editions EXCEPT: 

  • YOU NEED ENTERPRISE OR HIGHER for Failover/Failback (including Failover Groups – a short sketch follows below)
  • YOU NEED BUSINESS CRITICAL OR HIGHER for Client Redirect functionality
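
To make the replication/failover piece concrete, here is a hedged sketch of creating a failover group using Snowpark’s SQL passthrough. The group name, database, target account, and schedule are placeholders, and (per the note above) failover groups require Enterprise edition or higher:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Replicate a database plus roles to a secondary account in another region/cloud.
session.sql("""
    CREATE FAILOVER GROUP sales_fg
      OBJECT_TYPES = DATABASES, ROLES
      ALLOWED_DATABASES = sales_db
      ALLOWED_ACCOUNTS = myorg.secondary_account
      REPLICATION_SCHEDULE = '10 MINUTE'
""").collect()
```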

Snowflake Performance Improvement Updates:

New performance improvements and performance-transparency features were announced, related to:

  • Query Acceleration (public preview).
  • Search Optimization Enhancements (public preview).
  • Join eliminations (GA).
  • Top results queries (GA).
  • Cost Optimizations: Account usage details (private preview).
  • History views (in development).
  • Programmatic query metrics (public preview).

***Available on all editions EXCEPT:  YOU NEED ENTERPRISE OR HIGHER for both Search Optimization and Query Acceleration (a short sketch of enabling both follows below).
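
As a hedged illustration of two of the Enterprise-edition features above, here is how Query Acceleration and Search Optimization can be enabled from Snowpark; the warehouse and table names are placeholders:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Let eligible parts of large scans spill onto shared acceleration compute.
session.sql("""
    ALTER WAREHOUSE analytics_wh SET
      ENABLE_QUERY_ACCELERATION = TRUE
      QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8
""").collect()

# Add search optimization to speed up highly selective point-lookup queries.
session.sql("ALTER TABLE events ADD SEARCH OPTIMIZATION").collect()
```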

Data Listings and Cross-Cloud Updates

I’m super excited about the Private Listings announcement.  Many of you know that one of my favorite features of Snowflake is Data Sharing, which I have been writing about for over 4 years.  [My latest take is the Future of Data Collaboration]  This is such a huge game-changer for Data Professionals.  The announcement is that customers can now more easily use listings for PRIVATE DATA SHARING scenarios.  It also makes fulfillment across regions much easier (even 1-2 years ago we had to write replication commands ourselves).  I’ll write up more details about how this makes Data Sharing and Collaboration even easier.   I was delighted to see the presenters using the Data to Value concepts when presenting this.

I loved the way Snowflake used some of my Data to Value concepts around this Announcement including the benefit of:  “Time to value is significantly reduced for the consuming party”.  Even better, this functionality is available now for ALL SNOWFLAKE EDITIONS.

Private Listings

More and More Announcements on Snowday.

Snowflake has tons AND tons of improvements happening.  Other significant announcements on Snowday were:

Snowflake Data Governance IMPROVEMENTS

All of these features allow you to better protect and govern your data natively within Snowflake.
  • Tag-based Masking (GA) – This allows you to automatically assign a designated masking policy to sensitive columns using tags. Pretty nice.  (See the sketch below the edition note.)
  • Search Optimization will now have support for Tables with Masking and Row Access Policies (PRIVATE PREVIEW)
  • FedRAMP High for AWS Government (authorization in process)
***Available ONLY on ENTERPRISE EDITION OR HIGHER
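
Here is a hedged sketch of how tag-based masking fits together; the policy, tag, table, and role names are illustrative placeholders:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Mask a string column for everyone except a privileged role.
session.sql("""
    CREATE MASKING POLICY pii_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END
""").collect()

# Attach the policy to a tag once; every column that carries the tag gets masked.
session.sql("CREATE TAG pii").collect()
session.sql("ALTER TAG pii SET MASKING POLICY pii_mask").collect()
session.sql("ALTER TABLE customers MODIFY COLUMN email SET TAG pii = 'email'").collect()
```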

Building ON Snowflake

New announcements related to:
  • Streamlit integration (PRIVATE PREVIEW in January 2023 – supposedly already oversubscribed?) – This is exciting to see. I cannot wait until the Private Preview.
  • Snowpark-optimized Warehouses (PUBLIC PREVIEW). This was a great move on Snowflake’s part to support what AI/ML Snowpark customers needed. Great to see it get rolled out. This gives customers access to HIGHER MEMORY warehouses that can better handle ML/AI training at scale. Snowpark code can be executed on both warehouse types. (See the sketch below.)

***Available for all Snowflake Editions
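
A minimal sketch of creating a Snowpark-optimized warehouse for memory-heavy ML training; the warehouse name and size are placeholders:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# The WAREHOUSE_TYPE option requests the higher-memory Snowpark-optimized class.
session.sql("""
    CREATE WAREHOUSE snowpark_ml_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
""").collect()
```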

Finally – Streaming and Dynamic Tables ANNOUNCEMENTS:

  • Snowpipe Streaming – (PUBLIC PREVIEW SOON)
  • Snowflake Kafka Connector – (PUBLIC PREVIEW SOON)
  • Snowflake Dynamic Tables – formerly Materialized Tables (PRIVATE PREVIEW) – Check out my fellow data superhero – Dan Galvin’s coverage here:  https://medium.com/snowflake/%EF%B8%8F-snowflake-in-a-nutshell-the-snowpipe-streaming-api-dynamic-tables-ae33567b42e8
***Available for all Snowflake Editions
Overall I’m pretty excited about where this is going. These enhancements significantly improve streaming data integration, especially with Kafka. Snowflake customers can now ingest real-time data streams and transform data with low latency.  When fully implemented, this will enable more cost-effective and better-performing solutions around Data Lakes.  (A dynamic table sketch follows below.)
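
Here is a hedged sketch of what a dynamic table declaration looks like; the table, source, lag target, and warehouse are illustrative placeholders:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Snowflake keeps the query result refreshed within the declared lag, so the
# transformation is defined declaratively instead of with hand-built streams and tasks.
session.sql("""
    CREATE OR REPLACE DYNAMIC TABLE orders_by_region
      TARGET_LAG = '1 minute'
      WAREHOUSE = transform_wh
      AS SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
         FROM raw_orders
         GROUP BY region
""").collect()
```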

If you didn’t get enough Snowday and want to watch the recording then here is the link below:
https://www.snowflake.com/snowday/agenda/

We will be covering more of these updates from Snowday and the Snowflake BUILD event this week in more depth with the Snowflake Solutions Community.  Let us know if we missed anything or what you are excited about from Snowday in the comments!

Data to Value – Part 3

Data to Value helps prioritize data-related investments

This is the 3rd article in my 3-part series around Data to Value.  The key takeaway from this series is that we always need to understand the value of our data.  We also need to measure how fast we can go from data to business value.  C-Level execs and others focused on strategic data initiatives need to utilize Data to Value metrics.  Then we can understand the true value derived from our data creation, collection, extraction, transformation, loading, and analytics, which allows us to invest better in data initiatives for our organizations and ourselves.  Finally, data can only produce true value if it is accurate and of known quality.

I summarize all 7 of my Data to Value trends in more detail below. Trends 1 to 4 are also covered in the Data to Value – Part 1 and Data to Value – Part 2 articles.

Here are the Data to Value Trends that I think you need to be aware of (there are a few others though as well!):

Trend #1 – Non-stop push for faster speed of Data to Value. 

Within our non-stop, dominantly capitalist world, faster is better! Data to Value speed advantages, especially around improved value chains, can create massive business advantages for organizations.

Trend #2 – Data Sharing.  See Part 2

Trend #3 – Creating Data with the End in Mind.  See Part 2

Trend #4 – Automated Data Applications.  See Part 2

Trend #5 – Fully Automated Data Direct Copy Tools. 

Trend #6 – Full Automation of Data Pipelines and more integrated ML and Data Pipelines. 

With the introduction of a fully automated data object and pipeline service from Coalesce, we saw for the first time how data professionals can improve Data to Value through fully automated data objects and pipelines. Some of our customers are referring to parts of Coalesce as a Terraform-like product for data engineering. What I see is a massive removal of data engineering friction, similar to what Fivetran and Hightouch did but at a different part of the data processing stack.  We became an early partner with Coalesce because we think it is similar to how we viewed Snowflake at the beginning of 2018. We view Coalesce as making Snowflake even more amazing to use.

Trend #7 – The Data Mesh Concept(s), Data Observability, etc. concepts. 

Love these concepts or hate them, they are taking hold within the overall data professionals’ brain trust. Zhamak Dehghani (previously at Thoughtworks) and Thoughtworks have, from 2019 until now, succeeded in communicating the concept of a Data Mesh to the market.  Meanwhile, Barr Moses from Monte Carlo has been beating the drum very hard on the concept of Data Observability.   I’m highlighting these data concepts as trends that are aligned with improving Data to Value speed, quality, and accessibility.  There are many more data concepts besides these two.  Time will reveal which of these will gain mind and market share and which will go by the wayside.

Some other things that we should keep in mind are:

– Growth of Fivetran and now Hightouch.

The growth of Fivetran and Stitch (Now Talend) has been amazing.  We now are also seeing huge growth with automated data copy pipelines going the other way; they are focusing on the Reverse ETL (Reverse Extraction Transformation and Load) like our partner Hightouch.  At our IT Strategists consulting firm, we became a partner with Stitch, Fivetran, and Matillion back in 2018.  At Snowflake’s Partner Summit back in 2018 I sat next to Jake Stein – one of the founders of Stitch on the bus from San Francisco to the event in Sonoma and we quickly became good friends. (Jake is an excellent entrepreneur and is now focused on a new startup Common Paper – a structured contracts platform – after selling Stitch to Talend)  Then I also met George Frazier from Fivetran at the event and mentioned how he was killing it with his post comparing all the cloud databases back in 2018 [there was no other content like that back then].

– Resistance to “ease of use” and “cost reductions” is futile.

Part of me as a consultant at the time wanted to resist these “Automated EL Tools” – EL (Extract and Load) vs. ETL (Extract, Transform, and Load) or ELT (Extract, Load, and then Transform within the database).  As I tested out Stitch and Fivetran though, I knew that resistance was futile.  The ease of use of these tools and the reduction of development and maintenance costs cannot be overlooked.  There was no way to stop the data market from embracing these easier-to-use data pipeline automation tools.  What was even more compelling is that you can set up automated extract and load jobs within minutes or hours most of the time. This is UNLIKE the previous ETL tools we had been using for decades, which were mostly software installations.  Those installations took capacity planning, procurement, and all sorts of organizational business friction to EVEN get started at all.  With Fivetran and Hightouch, there is no engineering or developer expertise needed for almost all of the work.  [In certain situations, it helps to have data engineers and architects involved.]   Overall though, it is just a simple concept of connecting DESTINATIONS and CONNECTORS: in Fivetran, DESTINATIONS are databases or data stores, and CONNECTORS are sources of data (Zendesk, Salesforce, or one of the hundreds of other connectors in Fivetran).  Fivetran and Hightouch are excellent examples of data service/tool trends that truly improve the speed of Data to Value.

Also, a MAJOR MAJOR trend that has been happening for quite a while in “trying” to push the needle forward with Data to Value has been the growth of automated, integrated Machine Learning pipelines with data.  This is what DataRobot, Dataiku, H2O, SageMaker, and tons and tons of others are attempting to do.  It still seems very early stage, and no single vendor has large mindshare or adoption yet. Overall the space is fragmented right now, and it’s hard to tell which of these tools and vendors will thrive and survive.

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide. Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  I truly believe these forces have been changing our world everywhere and will continue to do so for many years. 

Data to Value is a key concept that helps us prioritize how to invest in our data-related initiatives.

I hope you found this useful for thinking about how you should decide on data-related investments and initiatives.  Focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization! Did I miss any Data to Value trends?  Hit me up in the comments or directly if you have additional trends.

Good Luck to you all!

Data to Value – Part 2

Welcome to our Part 2 Data to Value Trends article (Next week we will release the final 3 trends we are highlighting).

Welcome to our Snowflake Solutions Community readers who have read Part 1 of this Data to Value 3-part series. For those of you who have not read Part 1 and want to fast forward… we are making a fundamental point that data professionals and data users of all types need to focus NOT just on creating, collecting, and transforming data.  We need to make a conscious effort to focus on and measure WHAT true value each set of data creates.  We also need to measure how fast we can get to that value if it provides any real business advantages.  There is also an argument for discounting the value of time-dependent data, since it often loses value as it ages.

Here are the trends we are seeing related to the improvement of Data to Value.

Trend #1 – Data to Value – Non-stop push for faster speed.  (This was covered in the previous article)

Trend #2 – Data Sharing.  More and more Snowflake customers are realizing the massive advantage of data sharing, which allows them to share “no-copy,” in-place data in near real-time.  Data Sharing is a massive competitive advantage if set up and used appropriately.  You can securely provide or receive access to data sets and streams from your entire business or organization value chain that is also on Snowflake. This allows access to data sets at reduced cost and risk, thanks to micro-partitioned, zero-copy, securely governed data access.
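
As a hedged illustration of the provider side of data sharing, here is the classic share setup from Snowpark; the share, database, table, and consumer account names are placeholders:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Grant a consumer account live, read-only access to a table -- no copies made.
session.sql("CREATE SHARE sales_share").collect()
session.sql("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share").collect()
session.sql("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share").collect()
session.sql("GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share").collect()
session.sql("ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account").collect()
```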

Trend #3 – Creating Data with the End in Mind.  When you think about using data for value and logically think through the creation and consumption life cycle, you realize there are advantages to capturing data in formats that are ready for immediate processing.  If you design your data creation and capture as logs of data or other outputs that can be easily and immediately consumed, you gain faster Data to Value cycles, creating competitive advantages with certain data streams and sets.

Trend #4 – Automated Data Applications.  I see some really big opportunities with Snowflake’s Native Applications and Streamlit integrated.  Bottom line, there is a need for consolidated “best-of-breed” data applications that can have a low-cost price point due to massive volumes of customers.

 Details for these next 3 are coming next week 🙂

Trend #5 – Fully Automated Data Copying Tools. I have watched the growth of Fivetran and Stitch since 2018, and it has been amazing. Now I see the growth of Hightouch and Census as well, which is also incredibly amazing.

Trend #6 – Coming next week

Trend #7 – Coming next week

 

Snowflake’s Announcements related to Data to Value

These are currently the same as the ones I discussed in last week’s article. I’m waiting to hear from my readers whether there are any other Snowflake Summit announcements I missed that are real Data to Value features as well!

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit have Data to Value feature announcements such as:

*Snowflake’s support of Hybrid Tables and announcement of the Unistore concept – the move into some type of OLTP (Online Transaction Processing).  There is huge interest from customers in a concept like this, where a single source of truth becomes possible by having web-based OLTP-type apps operating on Snowflake with Hybrid Tables.

*Snowflake’s Native Apps announcements.  If Snowflake can get this right it’s a game changer for Data to Value and decreasing costs of deployment of Data Applications. 

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note, the point of these 2 items above is not only that data “can” go to value faster, but also that the development of data apps and combined OLTP/OLAP applications becomes much less costly and more achievable for “all” types of companies.  They could remove the massive friction that comes with needing high-end full-stack development.  Streamlit is attempting to remove the Front-End and Middle-Tier complexity from developing data applications.  (Aren’t most applications really data applications?)  It’s another low-code data development environment.
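
To show why I call Streamlit low-code, here is a hedged sketch of a tiny data app on top of Snowpark; the connection parameters and ORDERS table are placeholders, and a real app would keep credentials in secrets:

```python
import streamlit as st
from snowflake.snowpark import Session

st.title("Orders by Region")

# Placeholder connection parameters -- in practice, load these from st.secrets.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Aggregate inside Snowflake, then render the small result in the browser.
df = session.table("ORDERS").group_by("REGION").count().to_pandas()
st.dataframe(df)
st.bar_chart(df.set_index("REGION"))
```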

*Snowpipe Streaming announcement.  This was super interesting to me since I had worked with Isaac from Snowflake back before the 2019 Summit using the original Kafka to Snowflake Connector.  I also did a presentation on it at Snowflake Summit 2019.  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value, with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark in general announcements. This is new tech and the verdict is still out, but this is a major attempt by Snowflake to provide ML pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and Machine Learning processes all within Snowflake.

Data to Value

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  If you read my initial Data to Value Article then these Snowflake items around Data to Value are the same as the first article.  Do you have any others that were announced at Snowflake Summit 2022?  I hope you found this 2nd article around Data to Value useful for thinking about your data initiatives.   Again, focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good Luck!

Data to Value – Part 1 – Snowflake Solutions

Welcome to our Frank’s Future of Data three-part series. In these articles, we will cover a few tips on how to get value out of your Snowflake data.

I spend a ton of time reviewing and evaluating all the ideas, concepts, and tools around data, data, and data. The “data concept” space has been exploding with an increase in many different concepts and ideas. There are so many new data “this” and data “that” tools as well so I wanted to bring data professionals and business leaders back to the core concept that matters around the creation, collection, and usage of data. Data to Value.

The main concept is that we need to remember that the entire point of collecting and using data is to create business, organizational, or individual value. All the other technical details and jargon between the creation and collection of the data and the realization of its value are important, but for many users they have become overly complex.

For a short moment, let’s let go of all the consulting and technical data terms that are often becoming overused and often misused like Data Warehouse, Data Lake, Data Mesh, Data Observability, etc. Currently, I’m even seeing that data experts and practitioners will have different views around the latest concepts depending on where their data education began and the types of technologies they used.

Therefore, I created these articles to prepare myself for taking advantage of new paradigms that Snowflake and other “Modern Data” Stack tools/clouds provide. 

Data to Value Trends

Back in 2018, I had the opportunity to consult on some very advanced and mature data engineering solutions.  A few of them were actively moving with Kafka/Confluent towards true “event-driven data processing”. It was a massive shift from the traditional batch processing used throughout 98% of the implementations I had worked on previously.  I thought the concept of non-stop streams of data from different parts of the organization, delivered through Kafka topics, was pretty awesome. At the same time, these were pretty advanced concepts and paradigm shifts for all but very advanced data engineering teams.

Here are the Data to Value Trends that I think you need to be aware of:

Trend #1 – Non-stop push for faster speed of Data to Value.  Within our non-stop dominantly capitalist world, faster is better and often provides advantages to organizations, especially around improved value chains and concepts such as supply chains.  Businesses and organizations continuously look for any advantage they can get. I kinda hate linking to McKinsey for backup but here it goes. Their number 2 characteristic for the data-driven enterprise of 2025 is “Data is processed and delivered in real-time”.

Trend #2 – Data Sharing.  Coming next week – Part 2.

Trend #3 – Coming next week – Part 2.

Trend #4 – Coming next week – Part 2.

Trend #5 – Fully Automated Data Copying Tools.  The growth of Fivetran and Stitch (now Talend) has been amazing.  We are now also seeing huge growth in automated data copy pipelines going the other way, like Hightouch.  At IT Strategists, we became a partner with Stitch, Fivetran, and Matillion back in 2018.

Trend #6 – Coming in 2 weeks – Part 3

Trend #7 – Coming in 2 weeks – Part 3

*What data-to-value trends am I missing?  I put the top ones I see but hit me up in the comments or directly if you have additional trends.

Snowflake’s Announcements related to Data to Value

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit have Data to Value feature announcements such as:

*Snowflake’s support of Hybrid Tables and announcement of the Unistore concept – the move into some type of OLTP (Online Transaction Processing).  There is huge interest from customers in a concept like this, where a single source of truth becomes possible by having web-based OLTP-type apps operating on Snowflake with Hybrid Tables.

*Snowflake’s Native Apps announcements.  If Snowflake can get this right it’s a game changer for Data to Value and decreasing costs of deployment of Data Applications. 

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note, the point of these 2 items above is not only that data “can” go to value faster; they also make the development of data apps and combined OLTP/OLAP applications much less costly and more achievable for “all” types of companies.  They could remove the massive friction that comes with needing high-end full-stack development.  Streamlit is attempting to remove the Front-End and Middle-Tier complexity from developing data applications.  (Aren’t most applications really data applications?)  It’s another low-code data development environment.

*Snowpipe Streaming announcement.  This was super interesting to me since I had worked with Isaac from Snowflake back before the 2019 Summit using the original Kafka to Snowflake Connector.  I also did a presentation on it at Snowflake Summit 2019.  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value, with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark in general announcements.  This is new tech and the verdict is still out, but this is a major attempt by Snowflake to provide ML pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and Machine Learning processes all within Snowflake.

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  I truly believe these forces have been changing our world everywhere and will continue to do so for many years.  Data to Value for me is a key concept that helps me prioritize what provides value from our data-related investments and work.

Continue to parts 2 and 3 of this series:

Data to Value – Part 2

Data to Value – Part 3

I hope you found this useful for thinking about your data initiatives.   Focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good Luck!

What is a Snowflake Data Superhero?

What is a Snowflake Data Superhero?  Currently, a Snowflake Data Superhero (abbreviated as DSH) is a Snowflake product expert who is actively involved in the Snowflake community and is helping others learn more about Snowflake through blogs, videos, podcasts, articles, books, etc.  Finally, Snowflake states that it chooses DSHs based on their positive influence on the overall Snowflake Community.  Snowflake Data Superheroes get some decent DSH benefits as well (see below).

The Snowflake Data Superhero Program (Before Fall 2021)

For those of you new to Snowflake within the last few years, believe it or not, there was a really informal Data Superhero program for many years.  I don’t even think there were exact criteria for being in it.  Since I was a long-time Snowflake Advocate and one of the top Snowflake consulting and migration partners from 2018-2019 with IT Strategists (before we sold the consulting business), I was invited to be part of the informal program back in 2019.

Then those of us who had been involved with this informal program got a mysterious email and calendar invite in July 2021:  Invitation: Data Superhero Program Restructuring & Feedback @ Mon Jul 26, 2021 8am – 9am.  Honestly, when I saw this and attended the session, it sounded like it was going to be a pain in the ass having to validate our Snowflake expertise again within this new program, especially for many of us in the Snowflake Advocate Old Guard.  (There are probably around 40 of us, I’d say, who never decided to become Snowflake employees and take a serious windfall from the largest software IPO in history – especially the Sloot and Speiser, who became billionaires.  Benoit did too, but as I’ve stated before, Benoit, Thierry, and Marcin deserve some serious credit for the core Snowflake architecture.  As an engineer, you have to give them some respect.)

The Snowflake Data Superhero Program (2022)

This is a combination of my thoughts and the definitions from Snowflake.

Snowflake classifies Snowflake Data Superheroes (DSH) as an elite group of Snowflake experts!  They also think the DSHs should be highly active in the overall Snowflake community. They share feedback with Snowflake product and engineering teams, receive VIP access to events, and their experiences are regularly highlighted on Snowflake Community channels. Most importantly, Data Superheroes are out in the community helping to educate others by sharing knowledge, tips, and best practices, both online and in-person.

How does the Snowflake Corporation choose Snowflake Data Superheroes?

They mention that they look for the following key attributes:

  • You must overall be a Snowflake expert
  • They look for Snowflake experts who create any type of content around the Snowflake Data Cloud (this could be anything from videos and podcasts to blogs and other written Snowflake publications.  I think they even took into account the Snowflake Essentials book I wrote.)
  • They look for you to be an active member of the Data Hero community which is just the overall online community at snowflake.com.
  • They also want people who support other community members and provide feedback on the Snowflake product.
  • They want overall energetic and positive people

Overall, I would agree many of the 48 data superheroes for 2022 definitely meet all of the criteria above.  This past year, since the program was new, I also think it came down to the fact that only certain people applied.  (I think next year it will be less exclusive, since from my view the number of Snowflake experts is really growing.  Back in 2018, there was honestly only a handful of us – I would say less than 100 worldwide.  Now there are most likely 200++ true Snowflake Data Cloud experts outside of Snowflake employees.  Even so, the product has grown so much that it becomes difficult for any normal or even superhero human to cover all parts of Snowflake as an expert.  The only way I’m doing it (or trying to) is to employ many automated ML flows – “Aflows” I call them – to organize all publicly available Snowflake content into the one knowledge repository of ITS Snowflake Solutions.)  I would also say that it comes down to your overall known presence within the Snowflake Community and, finally, your geography.  For whatever reason, I think the Snowflake DSHs chosen for 2022 missed some really, really strong Snowflake experts within the United States.

Also, I just want to add that even within the Snowflake Data Superhero 48…. there are a few that just stand out as producing an insane amount of free community content.  I’m going to name them later after I run some analysis but there are about 10-15 people that just pump out the content non-stop!

What benefits do you get when you become a Snowflake Data Superhero?

Snowflake Data Superhero BENEFITS:

In 2022, they also provided all of these benefits:

  • A ticket to the Snowflake Summit – I have to say this was an awesome perk of being part of the program, and while I sometimes disagree with Snowflake corp decisions that are not customer or partner focused, this was Snowflake Corporation actually doing something awesome and really the right thing, considering that of these 48 superheroes, most of us have HEAVILY contributed to Snowflake’s success (no stock, no salary).  While employees and investors reaped large financial gains from the Snowflake IPO, many of us basically helped the company grow significantly.
  • Snowflake Swag that is different (well, it was for a while; now others are buying the “kicks” or sneakers)
  • Early education on new Snowflake Features
  • Early access to new Snowflake Features (Private Preview)
  • Some limited opportunities to speak at events.  (Let’s face it, the bulk of speaking opportunities these days goes in this order:  Snowflake Employees, Snowflake Customers (the bigger the brand [or maybe the spend] the bigger the speaking opportunity), Snowflake Partners who pay significant amounts of money to be involved in any live speaking event, and finally external Snowflake experts, advocates, etc.)
  • VIP access to events (we had our own Data Superhero area within Snowflake Summit)
  • Actual Product Feedback sessions with the Snowflake Product Managers

The only promised benefit that I can think of that has not been delivered so far in 2022 is providing every DSH with a test Snowflake account with a certain number of credits.  Also, I do not think many of the DSHs have received their Data Superhero card.  (This was one of those benefits provided to maybe 10 or more of the DSHs back in 2019 or so.  Basically, I believe it started with anyone who was chosen to speak at Snowflake BUILD.  I’m not 100% sure.)

The Snowflake Data Superhero Program (2023)

How do I apply to be a Snowflake Data Superhero?
Here you go:  [even though for me the links are not working]
https://community.snowflake.com/s/dataheroes

Snowflake’s Data Superhero Program Evolution

I will add some more content around this as I review how the 2023 program is going to work.  I will say I have been surprisingly pleased with the DSH Program overall this year in 2022.  It has given those Snowflake Data Superheroes who are more involved with the program a way to stand out within the Snowflake Community.

Snowflake’s Data Superhero Program Internal Team

I also want to give a shout out to the main team at Snowflake who works tirelessly to make an amazing Snowflake Data Superhero program.  These individuals and more have been wonderful to work with this year:

  • Howard Lio
  • Leith Darawsheh
  • Elsa Mayer

There are many others too, from the product managers we meet with to other Snowflake engineers.

Other Snowflake Data Superhero Questions:

Here was the full list from Feb 2021.

Who are the Snowflake Data Superheroes?

https://medium.com/snowflake/introducing-the-2022-data-superheroes-ec78319fd000

Summary

I kept getting all of these questions: hey – what is a Snowflake Data Hero?  What is a Snowflake Data Superhero?  How do I become a Snowflake Data Superhero?  What are the criteria for becoming one?

So this article is my attempt to answer all of your Snowflake Data Superhero related questions in one place.  (from an actual Snowflake Data Superhero – 3+ years in a row).  Hit me up in the comments or directly if you have any other questions.

Shortest Snowflake Summit 2022 Recap

Shortest Snowflake Summit 2022 Recap from a Snowflake Data Superhero

If you missed the Snowflake SUMMIT or any part of the Snowflake Summit Opening Keynote, here are the key feature announcements and a recap [in “brief” but “useful” detail].

KEY FEATURE ANNOUNCEMENTS — EXECUTIVE SUMMARY. [mostly in chronological order of when they were announced. My top ~20. The number of announcements this week was overwhelming!]

Cost Governance:

#1. A new Resource Groups concept was announced where you can combine all sorts of Snowflake data objects to monitor their resource usage. [This is huge since Resource Monitors were pretty primitive]

#2. Concept of Budgets that you can track against. [both Resource Groups and Budgets coming into Private Preview in the next few weeks]

#3. More Usage Metrics are being made available as well for SnowPros like us to use or Monitoring tools. This is important since many enterprise businesses were looking for this.

Replication Improvements on SnowGrid:

#4. Account Level Object Replication (Previously, Snowflake allowed data replication but not replication of other account-level objects. Now, objects beyond just data – such as users – can supposedly be replicated as well.)

#5. Pipeline Replication and Pipeline Failover. Stages and Pipes now can be replicated as well. [Kleinerman stated this is coming soon to Preview. I’m assuming Private Preview?] — DR people will love this!

Data Management and Governance Improvements:

#6. The combination of tags and policies. You can now attach a policy to a tag (tag-based masking) — [Private Preview now and will go into public preview very soon]

Expanding External Table Support and Native ICEBERG Tables:

#7. External Table Support for Apache Iceberg is coming shortly. Remember though that External Tables are read-only and have other limitations, so see what Snowflake did in #9 below. [pretty amazing]

#8. EXPANDING Snowflake to handle on-premise data with Storage Vendor Partners so far of Dell Technologies and Pure Storage [their integration will be in private preview in the next few weeks.]

#9. Supporting ICEBERG TABLES with FULL STANDARD TABLE support in Snowflake so these tables will support replication, time travel, etc., etc. [very huge]. This enables so much more ease of use within a Data Lake conceptual deployment. EXPERT IN THIS AREA: Polita Paulus

Improved Streaming Data Pipeline Support:

#10. New Streaming Data Pipelines. The main innovation is the new concept of MATERIALIZED TABLES. Now you can ingest streaming data as row sets. [very huge]. EXPERT IN THIS AREA: Tyler Akidau

  • Funny — I did a presentation at Snowflake Summit 2019 on Snowflake’s Kafka connector. Now that is like ancient history. 

Application Development Disruption with Streamlit and Native Apps:

#11. Low-code data application development via Streamlit. The combination of this and the Native Application Framework allows Snowflake to disrupt the entire application development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

#12. Native Application Framework. I have been working with this for about 3 months and I think it’s a game-changer. It allows all of us data people to create Data Apps, share them on a marketplace, and monetize them as well. It really starts to position Snowflake and its new marketplace name (UGH! 3rd name change — 2019=Data Exchange, 2020=Data Marketplace, 2022=Snowflake Marketplace).

Expanded SnowPark and Python Support:

#13. Python Support in the Snowflake Data Cloud. More importantly, this is a MAJOR MOVE to make it much easier for all “data constituents” to be able to work seamlessly within Snowflake for ALL workloads including Machine Learning. This has been an ongoing move by Snowflake to make it much much easier to run data scientist type workloads within Snowflake itself.

#14. Snowflake Python Worksheets. This is really combined with the above announcement and enables data scientists who are used to Jupyter notebooks to more easily work in a fully integrated environment in Snowflake.

New Workloads. Cybersecurity and OLTP! boom!

#15. CYBERSECURITY. This was announced a while back, but I wanted to include it here to be complete since it was emphasized again.

#16. UNISTORE. OLTP-type support based on Snowflake’s Hybrid Table features. This was one of the biggest announcements by far. Snowflake is now entering a much, much larger part of data and application workloads by extending its capabilities BEYOND OLAP [big data, online analytical processing] into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a massive move, positioning Snowflake as a single integrated data cloud for all data and all workloads. (A hybrid-table sketch follows.)
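
Here is a hedged sketch of what a Unistore-style hybrid table could look like; the table and columns are illustrative placeholders, and the feature was only in preview when announced:

```python
from snowflake.snowpark import Session

# Placeholder connection parameters -- replace with your own account details.
session = Session.builder.configs({"account": "<account>", "user": "<user>",
                                   "password": "<password>"}).create()

# Hybrid tables require a primary key and target fast single-row reads and writes.
session.sql("""
    CREATE HYBRID TABLE order_status (
      order_id INT PRIMARY KEY,
      status STRING,
      updated_at TIMESTAMP_NTZ
    )
""").collect()

# OLTP-style point write and point read against a table Snowflake can also analyze.
session.sql("INSERT INTO order_status VALUES (1, 'SHIPPED', CURRENT_TIMESTAMP())").collect()
session.sql("SELECT status FROM order_status WHERE order_id = 1").collect()
```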

Additional Improvements:

#17. Snowflake Overall Data Cloud Performance Improvements. This is cool but given all the other “more transformative” announcements I’m just bundling this together. Performance improvements included improvements on AWS related to new AWS capabilities as well as more power per credit with internal optimizations. [since Snowflake is a closed system though I think its hard for customers to see and verify this]

#18. Large Memory Instances. [not much more to say. they did this to handle more data science workloads but it shows Snowflake’s continued focus around customers when they need something else.]

#19. ̶D̶a̶t̶a̶ Marketplace Improvements. The Marketplace is one of my favorite things about Snowflake. They mostly announced incremental changes.

Final Note: I hope you find this article useful and please let me know in the comments if you feel I missed anything really important.

I attempted to make it as short as possible while still providing enough detail so that you could understand that Snowflake Summit 2022 contained many significant announcements and moves forward by the company.

Quick “Top 3” Takeaways for me from Snowflake Summit 2022:

  1. Snowflake is positioning itself now way, way beyond a cloud database or data warehouse. It now defines itself as a full-stack business solution environment capable of creating business applications.
  2. Snowflake is emphasizing it is not just data but that it can handle “ALL WORKLOADS” – Machine Learning, Traditional Data Workloads, Data Warehouse, Data Lake, Data Applications and it now has a Native App and Streamlit Development toolset.
  3. Snowflake is expanding wherever it needs to be in order to be a full data-anywhere, anytime data cloud. The push into better streaming data pipelines from Kafka, etc. and the new on-prem connectors allow Snowflake to take over more and more of customers’ data cloud needs.

Snowflake at a very high level wants to:

  1. Disrupt Data Analytics
  2. Disrupt Data Collaboration
  3. Disrupt Data Application Development

Want more recap beyond JUST THE FEATURES?

Here is a more in-depth take on the Keynote 7 Pillars that were mentioned:

Frank Slootman Recap: 

MINUTE: ~2 to ~15 in the video

Snowflake related Growth Stats Summary:

*Employee Growth: 

2019:  938 Employees

2022 at Summit:  3992 Employees

*Customer Growth:

2019:  948 Customers

2022 at Summit:  5944 Customers

*Total Revenue Growth:

2019:  96M

2022 at Summit:  1.2B

 

Large emphasis on MISSION PLAN and INDUSTRY/VERTICAL Alignment.

 

MINUTE: ~15 to ~53 – Frank Slootman and Benoit

MINUTE: ~53 to ~57:45 – Christian Intros.

Frank introduces the pillars of Snowflake INNOVATION  and then Benoit and Christian delve into these 7 Pillars in more depth.

Let’s go through the 7 PILLARS OF SNOWFLAKE INNOVATIONS!

  1. ALL DATA – Snowflake is emphasizing they can handle not only Structured and Semi-Structured Data but also Unstructured Data at ANY SCALE.  Benoit even said companies can scale out to 100s of Petabytes.
  2. ALL WORKLOADS – There is a massive push by Snowflake to provide an integrated “all workload” platform. They define this as all types of data and all types of workloads (emphasizing it can now handle ML/AI-type workloads via Snowpark). [My take:  Snowflake’s original architectural separation of compute and storage is still what makes it so powerful.]
  3. GLOBAL – An emphasis that Snowflake, based on Snowgrid, is a fully global Data Cloud platform. As of today, Snowflake is deployed over 30 cloud regions on the three main cloud providers. Snowflake works to deliver a unified global experience with full replication and failover to multiple regions based on its unique Snowgrid architecture.
  4. SELF-MANAGED – Snowflake is still focusing a TON on continuing to make Snowflake SIMPLE and easy to use.
  5. PROGRAMMABLE – Snowflake can now be programmed not only with SQL, JavaScript, Java, and Scala but also Python and preferred libraries. This is where STREAMLIT fits in.
  6. MARKETPLACE – Snowflake emphasizes its continued focus on building more and more functionality into the Snowflake Marketplace (rebranded now since it will contain both native apps as well as data shares).  Snowflake continues to make the integrated marketplace as easy as possible for sharing data and data applications.
  7. GOVERNED – Frank’s story from the 2019 keynote…someone grabbed him and said…You didn’t talk about GOVERNANCE [so Frank and everyone talked a ton about it this time!] – Snowflake and Frank state that there is a continuous heavy focus on Data Security and Governance.

OTHER KEY PARTS OF THE KEYNOTE VIDEO:

[ fyi – if you didn’t access it already the FULL Snowflake Summit 2022 Opening Keynote is here:

https://www.snowflake.com/summit/agenda?agendaPath=session/849836 ]

MINUTE: ~57:45 to 67 (1:07) – Linda Appsley – GEICO testimonial on Snowflake.

MINUTE: Goldman Executive presentation.

 

Automated Modern Data Stack

The Automated Modern Data Stack.  We believe it was our partner Fivetran that used a TON of “marketing dollars” to come up with this entire “data concept” of the “Modern Data Stack”.  [If anyone feels differently, then comment or reach out to me directly.]  Once this term/concept took off in the data mindsphere, it seemed like every data vendor under the sun with any type of solution tried to market that they are part of this concept.

Since 2018, when we came across this pay-as-you-go, separated-compute-from-storage, simple cloud database called Snowflake, we immediately recognized it was a huge step forward for data professionals and for what we have termed “Data to Value”.  Snowflake’s architecture COMPLETELY leapfrogged all the competitors at that time.  As we became one of the top consulting partners (we sold that line of business back in 2018 – we FOCUS now only on Snowflake and Automated Modern Data Stack thought leadership, education, and SaaS-based services like Snoptimizer™), we became more and more cognizant of what was truly differentiating Snowflake across the various OLAP data processing use cases.  We view it as the massive removal of friction around the workloads.  This is where we see all of our OTHER main data-related partners doing the same thing as well.  That is how they became part of our Automated Modern Data Stack.

Automation within the “Automated” Modern Data Stack is what really matters!

[Okay.  Automation and the continuous data security, data governance, and checking/assurance of DATA QUALITY.]   While we take strong positions on the automation of data, the reality is data is USELESS and even detrimental without some form of guaranteed or “checked” quality.  Without further ado, here is our 2022 ITS Automated Modern Data Stack.

ITS Snowflake Solutions – Automated Modern Data Stack

Base Layer 0 – Snowflake

Hey, the website does have the domain Snowflake Solutions, and it is our ITS Snowflake Solutions Community, where we try to answer any question related to Snowflake in a completely transparent and validated approach.  Unlike tons of other Q&A areas around Snowflake, we work to have Snowflake Experts and Snowflake Data Superheroes validate ALL answers here.  I see so many outdated answers in other locations, and it annoyed the hell out of all of us here.  🙂

Layer 1– Extract – Load.  Fivetran

ITS has been a long-time partner with Fivetran, Matillion, and Stitch (now Talend).  We realized though, for most of our work, which ALWAYS involves the top tools related to AUTOMATION, we still see Fivetran as the clear winner here within our data stack.  Love or hate Fivetran and their consumption-based pricing (:)).  They are growing fast, and from our view they dominate the “AUTOMATED” EL space.

Layer 2– Transform – Document – Be Column Aware.  Coalesce.

When we came across the full demonstration of Coalesce, we were blown away.  We view the Coalesce automation tool as one of the biggest game changers of 2022.  We have not seen another transformation product that is this easy to use and that avoids the complexity that makes all the other ETL, ELT, and T tools much more difficult to work with.  Our only reservation around Coalesce is that it ONLY works on Snowflake, and the reality is that many customers are more diversified in their Layer 0 cloud database or data cloud.  If you are Snowflake-centric, then get a demo of Coalesce from someone as soon as possible.  Sign up here if you want to get on our calendar for a Data Superhero overview.

Layer 3– Reverse Extract-Load (or I guess some people call it Reverse-ETL as well).  Hightouch.  

We have reviewed this Reverse EL/ELT space for quite some time.  It has really grown a ton over the last couple of years.  We found Hightouch to be the easiest to use and partner with.

Layer 4– Data Analytics WITHIN THE DATA CLOUD.  Sigma Computing

We have followed Sigma Computing’s rise in the analytics space for the last few years.  We originally met with Rob Woolen back in 2018, I believe.  At first the tool was pretty limited, but it is coming on strong, and the reason why we see it as a MASSIVE game changer eventually is because it goes where Tableau, MicroStrategy, Domo, blah blah, blah, etc. were never designed from the ground up to go.  It operates as a calculation engine on top of Snowflake, and the BEST PART, if done correctly, is that Sigma NEVER … EVER EVER creates extracts.  While extracts were necessary in pre-cloud times, they are one of our MAJOR MAJOR data analytics anti-patterns.  Copying the same data even once can create havoc throughout your organization.   Listen, I’ve been there and I’ve done that, but we shouldn’t be doing it anymore.

Other layers we do not have partners with:

Layer x – Data Governance.

At this time we have not chosen a vendor for this part.  There is tons of fragmentation among competitors.  In some of our work we definitely see Alation and Collibra as the two dominant players, but we have not partnered with either yet.  There are free

Layer y – Data Quality.  Same here.

Layer z – Data Platform Optimization.

We are biased – We recommend our Saas Service Snoptimizer™ for this when using Snowflake as Layer 0.

Are we missing any of the most necessary automated data layers?  Is there any data automation technology you think we need to add?  We purposely have kept this list very short but definitely comment below or reach out directly to us.

***Also, we do use sqlDBM as one of our key partners as well.  We just do not have this heavily automated.

Summary

This page is the current main AUTOMATED MODERN DATA STACK we use 95% of the time and recommend to our customers, who are predominantly moving to Snowflake.  Also, one thing that DEFINITELY concerns us is how Snowflake has really become the “GREAT” co-opetitor.  We see it moving every year to take on more and more of the workloads adjacent to its HUGE Snowflake Data Cloud.  What assurances will Fivetran, Hightouch, Sigma, etc. ever have that Snowflake will not use its HUGE war chest to enter their areas more and more?   [okay.  with the big man Michael 

I hope you found this useful for thinking about what your truly Automated Modern Data Stack should be.

Snowflake’s Financial Services Data Summit Recap

Snowflake hosted an excellent virtual event this week focused on Snowflake’s Data Cloud solutions for Financial Services.  What we enjoyed about this Industry Vertical Snowflake “Summit” was the combination of business and technology content.  When we launched our ITS solutions business long ago, our mission was Business/Technology Focused Solutions.  We believed heavily in bringing the best of a combination of business and technology solutions where business teams and technology teams worked together.  We wanted to avoid the pitfalls that I had seen with solutions just business-focused without technology considerations or collaboration.  We also wanted to avoid what still happens too often,  technology solutions with no or limited business value.   [The entire reason that led to the product/market fit concepts]

If you were too busy and missed it then we have a list of the Financial Services Data Summit session recordings below.  Check them out and let us know your thoughts in the comments below.  We hope this is useful and enjoy!

Financial Services Data Summit Highlights and Take Aways

[Please continue to give us feedback on this Financial Services Summit and what were your favorite sessions.  We learn so much from each other]

  • Major emphasis on the Snowflake Financial Services Data Cloud and its partners such as BlackRock, Invesco, State Street, Fiserv, etc.
  • Financial Services Data Provider Presentations.  We were excited about the Data Provider presentations from Acxiom, Entelligent & FactSet, and S&P Global.
  • It is all about the customer.  This was a common theme from Snowflake customers/partners like BlackRock, State Street, and especially the Data Providers.  They emphasized the partnership with Snowflake and how it is enabling new data collaborations that were not possible before.

Financial Services Data Summit By the Numbers:

Hey, we love data and stats so we like to break down the sessions and presenters’ data.  Here it is!

*Sessions – 17
*Tracks – 4 [ Customer Centricity – Risk and Data Governance – Digitalize Operations – Platform Monetization]
*Speakers – 44
*Speakers By Type

| Presenter Company Type | Count | Breakdown % |
| --- | --- | --- |
| Snowflake | 16 | 37.21% |
| Customers | 9 | 20.93% |
| Partners – Consulting | 6 | 13.95% |
| Partners – Data Providers | 7 | 16.28% |
| Partners – Products | 5 | 11.63% |

Data Workload Financial Services Recap

Data Warehousing, Engineering, Data Lakes

We still think Data Warehousing is the workload that works best with Snowflake and what it was originally designed for.  We see many businesses within Financial Services moving to the Snowflake Data Cloud for their Data Warehouse workloads.  Many of the Financial Services companies who presented at the summit are also moving to a combination of Data Lakes and Data Warehousing.  The presentations overall were focused on a mixture of financial services processes combined with data technologies to improve the Financial Services business.  There is a huge transformation happening here, and Snowflake’s powerful data cloud is accelerating how Financial Services companies transform.  There was an interesting video as well that Capital One did that isn’t in the sessions.  They emphasized how they were the first to jump into delivering Financial Services on the Cloud.

Data Science and Data Applications

Many of the presentations at the Financial Data Summit were related to building data applications on top of Snowflake.  This was one of our favorites:
Building on Snowflake: Driving Platform Monetization with the Data Cloud.
Recording Link

Data Marketplace and Data Exchanges

From our perspective, this Financial Services Summit was by far about the Financial Services Data Cloud.  The highlight of BlackRock’s Aladdin Data Cloud, and the Q&A around it, was one of the main presentations.  There was also a very large focus on the Financial Services Data Providers – Acxiom, Entelligent, FactSet, etc. – and on the Data Cloud data and services they provide.

Check out our in-depth coverage of all the Financial Services Data Shares [coming soon – this week]

Conclusion:

The Snowflake Financial Services Data Summit was an excellent first Vertical Industry Summit with major Financial Services Customers and Partners such as BlackRock, State Street, Invesco, Fiserv, Square, NYSE, Western Union, Acxiom, etc.  Our favorites [from a practical learning perspective] were:

1. Fiserv CTO Marc Rind did a great job in this presentation demonstrating Fiserv applications and tools on top of Snowflake.
Recording Link

2.  Building on Snowflake: Driving Platform Monetization with the Data Cloud.
Recording Link

There were many other great presentations as well, from Data Providers and Snowflake partners like Alation, AWS, etc.

Snowflake Data Provider Presentations:

Acxiom – Recording Link

Entelligent & FactSet – Recording Link

S&P Global – Recording Link

Full List – Snowflake Financial Services Sessions Recording Links:

Session Name | Speakers | Speaker Company Name | Video Recording
Welcome and Executive Panel | Frank Slootman, Shelly Swanback, Stacey Cunningham, Rick Underwood, Elise Bergeron | Snowflake, Western Union, NYSE, Snowflake, Snowflake | Link
The Financial Services Data Cloud, Revealed | Austin Burkett, Joanna Johnston, Matthew J. Glickman, Randy Wigginton, Rinesh Patel | Refinitiv, Snowflake, Snowflake, Square, Snowflake | Link
Snowflake Customer Q&A | Christian Kleinerman, Elise Bergeron, Sudhir Nair, Matthew J. Glickman | Snowflake, Snowflake, BlackRock, Snowflake | Link
Building on Snowflake: Driving Platform Monetization with the Data Cloud | Bob Morrison, Marius Ndini, Matt Hill | Snowflake, Snowflake, Snowflake | Link
Leverage the Insurance Third Party Data Exchange | Naresh Shetty, Sampath Parthasarathy | Cognizant, Cognizant | Link
Manage Risk & Data Governance with the Snowflake Platform | Alex Gutow, Brad Romano, Rich Murnane | Snowflake, Snowflake, Snowflake | Link
Delivering Personalized Experiences with Customer 360-Degree Views | Andy Sanderson, Marcin Kulakowski, Shireen Irani | Snowflake, Snowflake, Snowflake | Link
Building Scalable, Data-Driven Engagement for Financial Services Companies | Olaf Tennhardt, Tom Boisvert | Deloitte, Deloitte | Link
Building a Connected Data Ecosystem for Accelerated Time-to-Insight with State Street Alpha Data Platform | Paul Zajac, Spiros Giannaros | Invesco, State Street | Link
Driving Operational Efficiencies and Innovation with S&P Global Data and Snowflake | John Schirripa, Young Cha, Justine Iverson | S&P Global, Blackstone, S&P Global | Link
Speeding Data Cloud Adoption and Business Transformation at Blackstone | Matt Turner, Jai Subrahmanyam | Alation, The Blackstone Group | Link
Get in the Flow of Actionable Data | Marc Rind | Fiserv | Link
Financial Services Deep Dive with Acxiom Data Experts | Patrick Duggan, Doug Hurst | Acxiom, Acxiom | Link
Making Financial Data More Accessible in the Cloud | Julie Hutchinson, Roland Anderson, Vincent Saulys | AWS, TP ICAP Data & Analytics, Amazon Web Services (AWS) | Link
The Transformative Power of Snowflake for Financial Regulatory Reporting and ESG Investment Management | Michael Barnes, Brandon Sutcliffe | EY, EY | Link
Accelerating Scale & Distribution with the FactSet-Snowflake Partnership | Peter Dorsey, Jake Hawkesworth | FactSet, Entelligent | Link
Becoming an Intelligent Organization Through Data Science | Greg Willis, Suresh Vadakath | Dataiku, Dataiku | Link

Financial Services Data Cloud Announced

Besides the summit event, Snowflake also announced the Snowflake Financial Services Data Cloud.  We view this as really just a subset of the overall Snowflake Data Cloud vision and definition.  We assume Snowflake will continue to roll out industry-vertical Data Cloud concepts.  This is super interesting and transformative at many levels.  It is a massive movement toward more centralized and shared data versus the historical data silos that have developed within companies.

This is the statement from the press release:
“Financial Services Data Cloud, which unites Snowflake’s industry-tailored platform governance capabilities, Snowflake- and partner-delivered solutions, and industry-critical datasets, to help Financial Services organizations revolutionize how they use data to drive business growth and deliver better customer experiences.”

At a high level, this is pretty awesome “theoretically” and aligns with a lot of the thought leadership work I’m doing around moving from a paradigm of on-premise closed data systems and silos to an ever-evolving worldwide concept of integrated data.

What is the Snowflake Data Cloud?

What is the Snowflake Data Cloud is sort of a loaded question for long-time Snowflake professionals like a large chunk of our Snowflake Solutions Community.  From a Snowflake veteran standpoint [let’s assume that is anyone deeply working on Snowflake since the Fall of 2019 or before], this is a massive re-branding.  Many of us view the Snowflake Data Cloud as a way to strategically evolve Snowflake to take on more database processing workloads.  Before this, most Snowflake customers thought of Snowflake mainly as an analytical RDBMS for the Data Warehousing use case.

My take is that Frank and many other strategic thinkers needed to evolve Snowflake into a larger strategic analytical data solution before Snowflake was ready for its IPO in the fall of 2020.  Therefore, while many of the pieces were already there, the terminology of the Snowflake Data Cloud was born on June 2, 2020, at the virtual summit, with this announcement:  https://www.snowflake.com/news/snowflake-unveils-the-data-cloud-so-organizations-can-connect-collaborate-and-deliver-value-with-data/

Let’s step forward to 2021.

What the Snowflake Data Cloud is now in 2021.

The Snowflake Data Cloud overall is a vision of Snowflake taking on six major data workloads and use cases, which we cover below.

This is the statement they made in the press release:
“unveiled the Data Cloud – an ecosystem where thousands of Snowflake customers, partners, data providers, and data service providers can break down data silos, derive insights, and deliver value from rapidly growing data sets in secure, governed, compliant, and seamless ways. Snowflake also announced new product features for the Snowflake Cloud Data platform – the technology that unites this data ecosystem and powers the Data Cloud.”

At a high level, this is pretty awesome “theoretically” and aligns with a lot of the thought leadership work I’m doing around moving from a paradigm of on-premise closed data systems and silos to an ever-evolving worldwide concept of integrated data.

Let’s keep this article focused on the tangible aspects of the Snowflake Cloud Data Platform (or the Snowflake Data Cloud Platform; I am using the terms interchangeably).  Let’s cover the six major data use cases that the Snowflake Data Cloud Platform is positioning itself to handle.

The Snowflake Data Cloud Six Major Workloads:

Data Workload 1 – Data Warehousing

I think Benoit and Thierry initially focused on this workload first and foremost.  I still think this is Snowflake’s best use case, and in my opinion it is still the best cloud data warehouse in 2021 [and, believe it or not, I am trying to be unbiased].  As someone who has built many data warehouses hands-on across many platforms, I can say Snowflake does this well.

Data Workload 2 – Data Engineering

Data Engineering is a good workload for Snowflake.

Data Workload 3 – Data Lake

I’m going to hold off on our analysis of Snowflake as a Data Lake.  I have been supporting what I dubbed “SnowLake” for a long time, probably back to the end of 2018 or so.  That being said, we are in the midst of a head-to-head study of the possible data lake solutions out there.

Data Workload 4 – Data Science

Data Science is a new workload area for Snowflake.  They have released Snowpark as their go-to-market solution for Data Science workloads, but it is still brand new and does not yet fully support the Python language, so it is still early days.

Data Workload 5 – Data Applications

This is an interesting workload for Snowflake.  They have been doing parts of this workload for a long time.  Data Applications, though, is such a wide field that I still think a lot of its subsets would need to be done on an OLTP-type database.

Data Workload 6 – Data Exchange

If you couldn’t tell before, this is probably my favorite upcoming workload and use case.  This is another workload where I think Snowflake shines and currently holds a significant lead over all of its competitors.

At Snowflake Solutions, we have been involved in collaborating with Snowflake on Data Sharing, Data Exchange, and Data Marketplace solutions since the beginning of 2018.   We have been tracking the growth of the Snowflake Marketplace itself since July 2020, the month after it launched.

Besides an overall Snowflake Data Cloud Platform, I think Snowflake customers often equate the Data Cloud concept with the overall Data Sharing, Data Exchanges, and Data Marketplace put together.

Conclusion:

The Snowflake Data Cloud is gaining traction in the marketplace.  We have seen first-hand the rapid growth of the Data Marketplace and Data Sharing aspects of the cloud.  It still has a long way to go to own and dominate these six major data workloads.  Overall, the data cloud itself is an interesting strategy and has some compelling value if you are part of the Snowflake camp and partnerships.  Probably the most significant market confirmation of this vision is the announcement of the Aladdin Data Cloud.  [FYI: I’m the small guy that spent money and a lot of effort and time on meetings, education, etc., helping on this from 2018-2021, but saw no upside from our original Snowflake education and evangelism.  Oh well, you win some and you get hosed by your partner on some.  That is life.]

We hope you enjoyed the article and it helped you understand our perspective on the Snowflake Data Cloud and what it is.

What is Snoptimizer?

Hmmm.  What is Snoptimizer™?

Snoptimizer™ is the first automated Cost, Performance, and Security Optimizer for Snowflake Accounts.  It is by far the easiest and fastest way to optimize your Snowflake Account.  This unique service can optimize your Snowflake Account within minutes.  It is a no-brainer to test out.  Sign up here for a fast and easy 10-day free trial.

From a high level, Snoptimizer™ can be set up by your ACCOUNTADMIN role on Snowflake within minutes.  What have you got to lose?

Okay.  Snoptimizer sounds cool.  What does Snoptimizer really do?
Also, why did we build Snoptimizer?  We built Snoptimizer™ because we saw a tremendous need in the marketplace.  All too often we were brought in by existing Snowflake customers for Snowflake health checks, and 98% of the time the customer’s Account was not optimized as much as we could optimize it.  Unfortunately, most of the time it was actually “highly” unoptimized, and the customer was not using Snowflake as effectively as they could in one or many areas.  To solve this need, we built Snoptimizer™.

Snoptimizer was built by longtime Snowflake Data Heroes [only 50 in the world], consultants, and Snowflake Powered-By product builders.  Our Snoptimizer team is composed of some of the deepest experts in Snowflake optimization and migration.  We have lived, breathed, and eaten Snowflake for several years and studied every area of Snowflake in depth in order to provide you with an unparalleled optimization service.

Problem Statement that Snoptimizer Solves:  Snowflake is an amazingly scalable and easy-to-use, consumption-based Data-as-a-Service.  Simply put, it is an amazing Cloud RDBMS that scales like nothing I had seen before.  That being said, the Snowflake Data Cloud offering is constantly changing and growing with new functionality and cost-related services.  Also, the Snowflake database and the overall Snowflake Data Cloud concept are relatively new to many Snowflake administrators and users.  While the RDBMS basics are relatively easy to use compared to past solutions and other analytical data solutions, it is definitely NOT easy to fully optimize a Snowflake Account to get the most efficiency and cost savings possible.  That requires an in-depth understanding of hundreds of different views and objects.  You need to deeply understand how warehouses, resource monitors, query_history, materialized views, search optimization, Snowpipe, load_history, etc. operate at their core.  So quickly and easily Snoptimize your Snowflake Account now!
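
To make that concrete, here is a minimal sketch (ours, not an official Snowflake sample) of the kind of metadata digging involved, using the snowflake-connector-python package to ask one simple question: where are our credits going?  The connection parameters are placeholders, the $3/credit price is an assumption, and you need a role that can read SNOWFLAKE.ACCOUNT_USAGE.

```python
# Minimal sketch: where are our credits going? (assumes snowflake-connector-python)
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",    # placeholder - your account identifier
    user="your_user",          # placeholder
    password="your_password",  # placeholder - use key-pair auth in practice
    role="ACCOUNTADMIN",       # needs access to SNOWFLAKE.ACCOUNT_USAGE
)

CREDITS_BY_WAREHOUSE = """
    SELECT warehouse_name,
           SUM(credits_used)        AS credits_30d,
           SUM(credits_used) * 3.0  AS approx_dollars_30d  -- assumes ~$3/credit
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits_30d DESC
"""

for name, credits, dollars in conn.cursor().execute(CREDITS_BY_WAREHOUSE):
    print(f"{name}: {credits:.1f} credits (~${dollars:,.0f}) in the last 30 days")

conn.close()
```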

A few Customer Optimization Problems we have seen. 

*Poorly configured consumption.  Too often we see unneeded consumption credits wasted by incorrectly set up warehouses.  Remember, consumption-based services are AWESOME until they are used incorrectly.  We covered long ago how Snowflake costs, and the cost risks you are exposed to, can add up fast in this Cost Risk article.  Even when an account was reasonably well optimized for cost, we would often still see performance or security issues with the account.  Therefore we scoured every area of Snowflake’s metadata views and created the most advanced optimizations around cost, performance, and security, beyond anything documented or available anywhere else.

*Incorrect storage-based settings and/or architectures.  Again, we come into Snowflake health checks and find some interesting settings, with 10-, 20-, 30-, or 90-day Time Travel settings on objects that do not require that level of time travel.  Or we see lift-and-shift migrations that retain drop-and-recreate architectures, which make no sense in a Snowflake Account.  (See the health-check sketch at the end of this section.)

*Warehouses with inefficient settings.  This is one of the most common issues we fix immediately, and it often saves our customers hundreds of dollars daily.

*Accounts with HUGE Cost Risk Exposure.  As we stated in previous blog posts here at ITS Snowflake Solutions, Snowflake brings awesome scale, but if it is not correctly “Snoptimized” it also carries extensive cost risk by default with its consumption-based pricing, especially around compute and larger warehouses.  These are the Snowflake Cost Risks we wrote about previously.  Consider the extreme risk of a 6XL on AWS:

Snowflake Cost Risk Use Case 6XL – 1 cluster:
*Cost per hour @ $3/credit = 3 * 512 credits = $1,536
*Okay – right now 5XL and 6XL are in private preview ONLY on AWS, so then…
What’s your cost exposure with a 4XL on ONLY 1 cluster?
Snowflake Cost Risk Use Case 4XL – 1 cluster:
*Cost EXPOSURE per hour @ $3/credit = 3 * 128 credits = $384 PER HOUR
*Within 7 hours you go through $2,688
*Within 14 hours your ACCOUNT spend is:  $5,376
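
A quick back-of-the-envelope sketch of that exposure, using Snowflake’s published credits-per-hour by warehouse size (1 for XS, doubling each size up to 512 for 6XL) and an assumed $3/credit price; your contract rate will differ:

```python
# Back-of-the-envelope cost exposure for a single-cluster warehouse left running.
# Credits/hour per t-shirt size follow Snowflake's published table; $/credit is an assumption.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16,
                    "2XL": 32, "3XL": 64, "4XL": 128, "5XL": 256, "6XL": 512}
PRICE_PER_CREDIT = 3.00  # assumed; varies by edition and contract

def exposure(size: str, hours: float) -> float:
    """Dollar cost of one cluster of `size` running non-stop for `hours`."""
    return CREDITS_PER_HOUR[size] * PRICE_PER_CREDIT * hours

for size in ("4XL", "6XL"):
    print(f"{size}: ${exposure(size, 1):,.0f}/hour, "
          f"${exposure(size, 14):,.0f} if left running for 14 hours")
```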

Again, with great power and ease of use also comes great responsibility.  While we have loved Snowflake and its ease of use, we have also done many Snowflake health checks where Snoptimizer™ picks up and fixes account problems where customers re-sized warehouses EASILY [yes, you can do it via the interface or the command line in seconds; it is BOTH awesome and dangerous for untrained users WHO HAVE the create/alter warehouse privileges granted to their roles] but unnecessarily.  Unfortunately, even for Snowflake data professionals and administrative experts, it is ALL too easy to change warehouse settings for a simple test, or for a QA team, and then forget to reset or auto-suspend them.  A good analogy we use is that UNOPTIMIZED Snowflake ACCOUNTS often let users drive right past non-existent or POORLY set up Snowflake cost guardrails.  The sketch below shows the kind of warehouse guardrail check this implies.
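
To give a flavor of that guardrail check, here is a hedged sketch (not the actual Snoptimizer code) that scans warehouse settings for two common problems: large sizes and slow or disabled auto-suspend.  It assumes an open connection like the one in the earlier sketch and the documented SHOW WAREHOUSES columns name, size, and auto_suspend.

```python
# Hedged sketch: flag warehouses that are oversized or slow to auto-suspend.
# Assumes `conn` is an open snowflake.connector connection with rights to see warehouses.
from snowflake.connector import DictCursor

LARGE_SIZES = {"X-Large", "2X-Large", "3X-Large", "4X-Large", "5X-Large", "6X-Large"}
MAX_AUTO_SUSPEND_SECONDS = 300  # our own guardrail threshold, not a Snowflake default

def warehouse_guardrail_report(conn):
    findings = []
    cur = conn.cursor(DictCursor)
    for wh in cur.execute("SHOW WAREHOUSES"):
        name, size = wh["name"], wh["size"]
        auto_suspend = wh.get("auto_suspend")  # seconds; 0/None means never suspend
        if size in LARGE_SIZES:
            findings.append(f"{name}: size {size} - confirm this is intentional")
        if not auto_suspend or int(auto_suspend) > MAX_AUTO_SUSPEND_SECONDS:
            findings.append(f"{name}: auto_suspend={auto_suspend} - consider "
                            f"ALTER WAREHOUSE {name} SET AUTO_SUSPEND = 60")
    cur.close()
    return findings
```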

What does Snoptimizer™ do?

Snoptimizer runs regularly and continuously scours your Snowflake Account’s operational metadata (over 40 views), looking for Snowflake storage and compute anti-patterns and inefficiencies related to cost, performance, and security.  It is the only continuous service watching out for you and your Snowflake Account to keep it in tip-top shape!

Let’s dive deeper:

Snoptimizer™ – Snowflake Cost Optimization:

The Snoptimizer Cost Optimization service regularly evaluates ongoing queries and the settings related to Warehouses and Resource Monitors.  It immediately flags and corrects incorrectly configured account specifications.

The Snoptimizer service works for you repeatedly, regularly reviewing your Snowflake Account(s) and finding any areas that can be optimized for cost.  The service can be set to apply optimizations automatically or to require approval before changes are made.  Snoptimizer is your best friend for optimizing costs on your Snowflake Account.

Again, the Snowflake RDBMS and its DDL/DML basics are easy to use, BUT misconfiguring warehouses and compute optimization services is ALSO easy to do.  Snoptimizer removes this unnecessary, inefficient, and unoptimized use of Snowflake compute and storage.

Snoptimizer™ – Snowflake Performance Optimization:

The Snoptimizer Team not only scours cost optimizations but also looks at all of your query_history and related views, so we can pick up warehouses that are over-provisioned as well as under-provisioned.  We are the only service we know of that automates this for you and provides suggestions on warehouse changes (see the sketch after this list), or even recommends other Snowflake cost-based services that provide awesome benefits, such as:

  • Auto Clustering
  • Materialized Views
  • Search Optimization
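
As a rough illustration of what we look for (again, a sketch, not the actual Snoptimizer logic), the query below pulls provisioning signals from ACCOUNT_USAGE.QUERY_HISTORY: heavy queueing suggests a warehouse needs more clusters, while heavy spilling suggests it is undersized.  The thresholds are illustrative only.

```python
# Hedged sketch: spot over/under-provisioned warehouses from query history.
# Assumes `conn` is an open snowflake.connector connection that can read ACCOUNT_USAGE.
PROVISIONING_SIGNALS = """
    SELECT warehouse_name,
           COUNT(*)                                     AS queries_7d,
           AVG(queued_overload_time) / 1000             AS avg_queued_seconds,
           SUM(bytes_spilled_to_local_storage
               + bytes_spilled_to_remote_storage) / 1e9 AS gb_spilled
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
      AND warehouse_name IS NOT NULL
    GROUP BY warehouse_name
    ORDER BY avg_queued_seconds DESC
"""

for wh, n, queued_s, gb_spilled in conn.cursor().execute(PROVISIONING_SIGNALS):
    if queued_s and queued_s > 5:
        print(f"{wh}: queries queue ~{queued_s:.0f}s on average - consider more clusters")
    if gb_spilled and gb_spilled > 10:
        print(f"{wh}: {gb_spilled:.0f} GB spilled this week - consider a larger size")
```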

Snoptimizer™ – Snowflake Security Optimization:

Again, Snoptimizer is one of your best automated Snowflake administrative friends for security issues.  It repeatedly checks for security issues related to your Snowflake Account that put your account and data at risk.  Since security is often specialized at a company-culture level, we provide optimizations and best practices that you can implement to avoid account and data breaches.  Snoptimizer Security Optimization itself runs many security checks repeatedly, looking for incorrect security configurations or exposures.
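
To give a feel for these checks, here is a hedged sketch (not Snoptimizer’s actual implementation) built on ACCOUNT_USAGE.LOGIN_HISTORY: it flags users with repeated failed logins and users still signing in with a password and no second factor.  Verify the column names against your own account.

```python
# Hedged sketch: two basic security checks from login history.
# Assumes `conn` is an open snowflake.connector connection that can read ACCOUNT_USAGE.
FAILED_LOGINS = """
    SELECT user_name, COUNT(*) AS failures_7d
    FROM snowflake.account_usage.login_history
    WHERE event_timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
      AND is_success = 'NO'
    GROUP BY user_name
    HAVING COUNT(*) >= 10
    ORDER BY failures_7d DESC
"""

PASSWORD_ONLY_LOGINS = """
    SELECT DISTINCT user_name
    FROM snowflake.account_usage.login_history
    WHERE event_timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
      AND first_authentication_factor = 'PASSWORD'
      AND second_authentication_factor IS NULL
"""

for user, failures in conn.cursor().execute(FAILED_LOGINS):
    print(f"Review {user}: {failures} failed logins in the last 7 days")

for (user,) in conn.cursor().execute(PASSWORD_ONLY_LOGINS):
    print(f"{user} signs in with a password and no second factor - consider enforcing MFA")
```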

If all of that wasn’t enough for you… let’s highlight some Snoptimizer Core Features:

Snoptimizer Core Features:

  • Analyzes Snowflake Compute Warehouses for inefficient settings
  • Limits Compute Resources “Risk Cost Exposure” Immediately
  • Reviews Previous Queries and Consumption for performance and efficiency
  • Provides regular reporting on Snowflake Usage
  • Creates effective Resource Monitors per warehouse (see the sketch after this list)
  • Provides Optimization Recommendations and Automations depending on your setup
  • Incorporates every single documented and “some undocumented” Snowflake Cost Optimization Best Practice and more
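
For example, the resource monitor guardrail above can be expressed in just a few statements.  Here is a hedged sketch; the warehouse name, quota, and thresholds are placeholders you would size to your own budget.

```python
# Hedged sketch: put a monthly credit quota and auto-suspend guardrail on one warehouse.
# Assumes `conn` is an open snowflake.connector connection with ACCOUNTADMIN rights.
GUARDRAIL_STATEMENTS = [
    # Placeholder names and quota - size these to your own workload and budget.
    """CREATE OR REPLACE RESOURCE MONITOR reporting_wh_monitor
         WITH CREDIT_QUOTA = 100
              FREQUENCY = MONTHLY
              START_TIMESTAMP = IMMEDIATELY
         TRIGGERS ON 80 PERCENT DO NOTIFY
                  ON 100 PERCENT DO SUSPEND
                  ON 110 PERCENT DO SUSPEND_IMMEDIATE""",
    "ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = reporting_wh_monitor",
    "ALTER WAREHOUSE reporting_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE",
]

cur = conn.cursor()
for stmt in GUARDRAIL_STATEMENTS:
    cur.execute(stmt)
cur.close()
```
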
How does Snoptimizer Help You?

It quickly and automatically runs Snowflake security, cost, and performance optimization checks and best practices against our Snoptimizer customers’ accounts.  It removes the headaches and worries of security and cost exposure across your entire Snowflake Account, and it keeps you from falling into the Snowflake cost anti-patterns by mistake.

At a high level, it simply makes your Snowflake cost, performance, and security administration easier and more automated.  No-hassle optimization in a few hours.  Get Snoptimized today!

Conclusion:

The Snowflake Data Cloud continues to grow, and while it’s easy to use, it’s much harder to optimize for cost, security, and performance.  Snoptimizer makes cost, performance, and security optimization easy for you at a low cost and saves you the headaches of cost overruns or security exposures.

Snowflake’s Differentiating Features

What are the features of Snowflake that differentiate it from all its competitors?  I started this list in 2018 and it continues to evolve.  Sure, I am a Snowflake Data Superhero and longtime Snowflake advocate, but I do try to be objective.  Also, I have had a long career of partnering with new technologies during my 19 years of running a successful consulting firm, and I have to state that most vendors and technologies do NOT impress me at all.  While I partnered with Microsoft (we were a Gold Partner for many years) and many others, the reality is that most of their technology was not a game-changer like the internet or Netscape (the first mainstream browser).  They were typically solid technology solutions that helped our clients.  When I discovered Snowflake at the beginning of 2018, while looking to build a custom CDP for a Fortune 50 company, I realized this technology and this architecture were going to be a game changer for the data processing industry, especially within BIG DATA and ANALYTICS.

Snowflake’s Differentiating Features (2018 or before)

  1. Concurrency and Workload Separation [enabled by the Separation of Compute from Storage.]  [huge! for the first time, you could completely separate workloads and not have the traditional concurrency challenges of table locking or ETL jobs COMPETING with Reporting or Data Science jobs.]
  2. Pay-as-you-go pricing (also named consumption-based pricing) – For the very first time, startups and medium-sized businesses could get true cloud BIG DATA scale at an amazingly affordable price.  That had never been possible before.
  3. Time-Travel.  (Based on write-ahead Micro-partitions.)
  4. Zero-Copy Cloning. (See the sketch after this list for a quick Time Travel and cloning example.)
  5. True Cloud Scale.  DYNAMIC (the way it should be!) In and Out Scaling with Clusters.
  6. True Cloud Scale.  Up and Down.  [code or manual still at this point.  Switching between XS to 4XL warehouse t-shirt sizes]
  7. Data Sharing (this may be my favorite feature.  Data Sharing is transforming industries)
  8. Snowpipe.  The ability to handle the ingestion of streaming data in near real-time.
  9. Data Security.  Encrypted data from end to end.  While some other vendors had parts of this, Snowflake made security a first-class concern in the cloud.
  10. Semi-Structured Data ease of use.  Snowflake has been the easiest way we have found to store and query JSON and other semi-structured formats directly with SQL.
  11. Lower Database Administration.  Amazingly, no database vendor had automated the collection of database/query statistics and automated indexing/pruning before.  A huge STEP forward.  [I do NOT agree with “Near-Zero Administration” – this is not true, especially as Snowflake transformed into a data cloud and added tons of additional features, which bring some additional administration requirements.]
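
To make items 3 and 4 above concrete, here is a hedged sketch of what Time Travel and Zero-Copy Cloning look like in practice.  The object names are placeholders, and it assumes an open connection as in the earlier sketches.

```python
# Hedged sketch: Time Travel and Zero-Copy Cloning in a few statements.
# Assumes `conn` is an open snowflake.connector connection; object names are placeholders.
EXAMPLES = [
    # Query a table as it looked one hour ago (Time Travel).
    "SELECT COUNT(*) FROM sales.public.orders AT(OFFSET => -3600)",
    # Recover a table dropped by mistake (only works if it was recently dropped).
    "UNDROP TABLE sales.public.orders_backup",
    # Create a full, writable copy of a database without duplicating storage.
    "CREATE DATABASE sales_dev CLONE sales",
    # Clone a table as of a point in the past - Time Travel plus cloning together.
    "CREATE TABLE sales.public.orders_before_load CLONE sales.public.orders AT(OFFSET => -3600)",
]

cur = conn.cursor()
for stmt in EXAMPLES:
    cur.execute(stmt)
cur.close()
```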

Snowflake’s Differentiating Features (2019-2021)

  1. Data Exchange and then Data Marketplace.
  2. Cloud Provider Agnostic. Move to support Azure as well as GCP in addition to AWS.
  3. Data Clean Room V1. Capability to use Secure User Defined Functions within Data Shares.
  4. Data Governance Capabilities.
  5. Integrated Data Science with Snowpark. [still needs work!]
  6. Unstructured data. Amazingly, Snowflake can now store and process unstructured files as well.

Snowflake’s Differentiating Features (2022)

*I’m going to wait until December 2022 to finalize this list.  There were some amazing announcements.

One item that I’m finding awesome is access to the SNOWFLAKE.ORGANIZATION_USAGE schema (I think it’s still in preview), because it makes organization-level reporting so much easier.  Previously we built tools that would log into each account, query the SNOWFLAKE.ACCOUNT_USAGE schema views within each account, and pull the results back to a centralized location.  Sure, it worked, but it was a pain.
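
Here is a hedged sketch of that simpler, centralized reporting using ORGANIZATION_USAGE.  It assumes access to the SNOWFLAKE.ORGANIZATION_USAGE share in your organization’s reporting account; since the schema was in preview, verify the view and column names against the documentation.

```python
# Hedged sketch: organization-wide credit usage by account, without hopping accounts.
# Assumes `conn` is an open snowflake.connector connection in the account with
# access to the SNOWFLAKE.ORGANIZATION_USAGE schema.
ORG_CREDITS_BY_ACCOUNT = """
    SELECT account_name,
           SUM(credits_used) AS credits_30d
    FROM snowflake.organization_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY account_name
    ORDER BY credits_30d DESC
"""

for account, credits in conn.cursor().execute(ORG_CREDITS_BY_ACCOUNT):
    print(f"{account}: {credits:.1f} credits in the last 30 days")
```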

To be fair, and to not be a complete Snowflake advocate, Snowflake needs a reality check right now.  Snowflake Summit 2022 brought an amazing number of announcements.  (Even though a focused business person could argue: what is Snowflake now?  A Data Cloud?  A data application development environment?  A Data Science and ML tool?  My heart goes out to the Account Executives.  When they do capacity deals, they have to focus first on the true value of what Snowflake provides today!)  Also, the reality is that many of the significant announcements remind me of my Microsoft Gold Partner days: lots of “coming soon,” but not that soon.  Many of these features will not be truly available until 2023.

Snowflake’s Differentiating Features (2023)

Coming next year :) – you just have to wait!

Summary

Since 2018, I have been getting questions from colleagues and customers about why Snowflake is better than the on-prem databases they were using, or about how Snowflake differs from Redshift, BigQuery, or Synapse.

So this article is my attempt to explain to both business users of data and data professionals (from architects to analysts) why Snowflake is different from any other technology.

GigaOm

GigaOm

With superior performance and the most hands-off model of ownership, Snowflake is the epitome of a data warehouse as a service. The model, cost, features, and scalability have already caused some to postpone Hadoop adoption. In its multicluster, scale-out approach, Snowflake separates compute from storage. It is fundamental to the architecture where multiple, independent compute clusters can access a shared pool of data without resource contention. Customers pay for what is used without a stairstep approach to resources and pricing.

  • The cost model is simple at terabytes per year or computing hours.
  • For primary storage, Snowflake uses Amazon’s Simple Storage Service (S3). Snowflake also uses an SSD layer for caching and temp space.
  • Snowflake customers deploy a wide array of modern BI and visualization tools, some utilizing the ODBC and JDBC connectivity.
  • Snowflake also offers a web interface.
  • Snowflake SQL includes support of objects in JSON, XML, Avro, and Parquet using a special data type that can handle flexible-schema, nested, hierarchical data in table form.
  • There are no indexes either, as zone maps are used for an abstract understanding of data in the database. SQL extensions include UNDROP and CLONE. Features include result set persistence and automatic encryption.
  • No downtime is required for anything including upgrades or cluster expansion.
  • Concurrency, a clear challenge in database scale-out, is a focus at Snowflake.  Their automatic concurrency scaling is a single logical virtual warehouse composed of multiple compute clusters split across availability zones.
  • Finally, Snowflake has a native connector for Spark built on the Spark Data Sources API.

Snowflake has jumped constraints found in databases from earlier development and honed a very promising cloud analytic database. Eminently elastic on a foundation of separation of computing and storage, Snowflake offers as close to a hands-off approach as we found. Snowflake is market-leading in what you would want for a multi-purpose cloud data warehouse analytical database.

Source – GigaOm Sector Roadmap: Cloud Analytic Databases 2017

ETL vs ELT: Data Warehouses Evolved

ETL vs ELT: Data Warehouses Evolved

For years now, the process of migrating data into a data warehouse, whether it be an ongoing, repeated analytics pipeline, a one-time move into a new platform, or both, has consisted of a series of three steps, namely: Continue reading

Semi-Structured Data Loading & Querying in Snowflake

Semi-Structured Data Loading & Querying in Snowflake

Semi-structured data has become a tremendously popular format today (be it JSON, Avro, Parquet, or XML) for a multitude of things such as IoT, mobile, and apps. Until now though, there’s been no fast and easy way to store and analyze this kind of data, but Continue reading

Query Caching in Snowflake

Query Caching in Snowflake

Have you ever experienced slow query response times while waiting for a report that’s being viewed by multiple users and/or teams within your organization simultaneously?

This is a common issue in today’s data driven world; it’s called concurrency and it’s frustrating, usually delaying productivity just when the data being requested is needed the most. Well, here’s an incredible time saver you may not have yet heard about: Continue reading

Querying Multiple Databases in Snowflake

Continue reading

Snowflake Vertical: The Software Industry

How Snowflake has helped customers in the Software Industry

In this new series of articles, we’re going to be approaching things from a vertical perspective. It sometimes helps to narrow things down by industry, especially since, as a business leader, your time is always of the essence. Let’s start with a couple of case studies, beginning with the software industry: Continue reading

Snowflake Vertical: The Service Industry

How Snowflake has helped customers in the Service Industry

This, our second entry in the series on approaching things from a vertical perspective, covers an obviously gigantic industry, so let’s just look at a couple of very different use case histories from two totally different service industry enterprises: Continue reading

Snowflake Vertical: The Product Industry

How Snowflake has helped customers in the Product Industry

Nike ~ Sportswear

Nike is a sportswear company that probably needs no introduction. Continue reading

The Data Warehouse Evolution

Data Warehousing has been around for quite a few years, and over the last 20 years it has made its greatest impact on businesses. With the recent exponential growth in customer and sensor data, traditional data warehouses and appliances have struggled because they simply were not designed for that amount of data. The cloud data warehouse has evolved to solve that and to enable businesses to be more data-driven.

Our big data practice has always sought to find the best solutions for our customers, and we have had to evolve as data collection and the data warehouse have evolved over the last 20 years. We have gone from RDBMS (Oracle, SQL Server, MySQL) solutions to appliances (Teradata, Netezza, Vertica) to Hadoop (HDFS/Hive, etc.) to cloud (Redshift, Azure, BigQuery) to, finally, a fully elastic cloud data warehouse designed from the ground up to leverage the cloud’s scalability (Snowflake). Continue reading