Shortest Snowflake Summit 2023 Recap

Introduction:

Similar to last year, I wanted to create a “shortest” recap of Snowflake Summit 2023, covering the key feature announcements and innovations. It is now exactly two weeks since Snowflake Summit ended, and I have digested the major changes. Throughout July and August we will follow up with our view of the massive Data to Value improvements and capabilities being made.

 

Snowflake Summit 2023 Recap from a Snowflake Data Superhero:

If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.

 

Top Announcements:

 

1. Native Applications goes to Public Preview:

I am slightly biased here because my teams have been working with Snowflake Native Apps since February/March 2022. We have been on this journey with Snowflake from the earliest Private Preview to now, over the last 16 months or so. We are super excited about the possibilities and potential of where this will go.

 

2. Nvidia/Snowflake Partnership, Support of LLMs, and Snowpark Container Services (Private Preview):   

Nvidia and Snowflake are teaming up (because as Frank S. says… some people are trying to kill Snowflake Corp) and they will integrate Nvidia’s LLM framework into Snowflake. I’m also really looking forward to seeing how these Snowpark Container Services work.

 

3. Dynamic Tables (Public Preview):  

Many Snowflake customers, myself included, are really excited about this. Dynamic Tables add key new capabilities beyond a similar concept, Materialized Views: declarative data pipelines, full SQL support, user-defined low-latency freshness (a target lag), automated incremental refreshes, and snapshot isolation.
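To make the declarative pipeline idea concrete, here is a minimal sketch of a dynamic table; the table, source, and warehouse names are placeholders, and TARGET_LAG is where the user-defined freshness comes in:

    -- Hypothetical example: keep an aggregate fresh to within about 5 minutes.
    CREATE OR REPLACE DYNAMIC TABLE daily_order_totals
      TARGET_LAG = '5 minutes'     -- user-defined freshness
      WAREHOUSE  = transform_wh    -- compute used for automated incremental refreshes
    AS
      SELECT order_date, SUM(amount) AS total_amount
      FROM raw_orders
      GROUP BY order_date;

Snowflake works out the refresh plan from the query itself, which is what makes the pipeline declarative rather than procedural.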

 

4. Managed Iceberg Tables (Private Preview): 

“Managed Iceberg Tables” allow Snowflake compute resources to manage Iceberg-format data. This makes Iceberg data much easier to manage and helps Snowflake compete for data lake and very large data file workloads. Customers can keep their data lake catalog in Iceberg BUT still get huge value from Snowflake’s query engine reading the metadata that Iceberg provides. In some ways this is a large-file data to value play: it keeps what blob storage (S3, Azure, etc.) does best, BUT using Snowflake’s compute means less data transformation and faster value from the data, including standard data modifications like updates, deletes, and inserts.
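As a rough, hedged illustration (the volume, table, and location names below are placeholders, and exact parameters may differ while the feature is in preview), a Snowflake-managed Iceberg table points at object storage you own while Snowflake manages the Iceberg metadata:

    CREATE ICEBERG TABLE lake_events (
      event_id NUMBER,
      event_ts TIMESTAMP_NTZ,
      payload  VARCHAR
    )
      CATALOG         = 'SNOWFLAKE'     -- Snowflake manages the Iceberg catalog/metadata
      EXTERNAL_VOLUME = 'my_s3_volume'  -- customer-owned S3/Azure/GCS storage (placeholder)
      BASE_LOCATION   = 'events/';      -- placeholder path within the external volume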

 

5. Snowpipe Streaming API (Public Preview): 

As someone who worked with and presented on the Kafka Streaming Connector back at Summit 2019, it is really great to see this advancement. Back then the connector was “OK” and could handle certain levels of streaming workloads. Four years later, this streaming workload processing has gotten much, much better.

 

Top Cost Governance and Control Changes:

As anyone who has read my blog over the past few years knows, I am a huge advocate of Snowflake’s pay-for-what-you-use model. It is AWESOME, but ONLY when tools like our Snoptimizer® optimization service are used or you really do set up all the cost guardrails correctly. 98% of the accounts we help with Snoptimizer do not have all the optimizations set correctly, and without continuous monitoring of costs (and, for that matter, performance and security, which we also offer, unlike a lot of the copycats) it is easy to overspend.

1. Budgets (Public Preview): 

This “Budgets” cost control feature was actually announced back in June 2022, so we have been waiting for it for some time, and it is good to see Snowflake finally delivering this functionality. Since we started as one of the top Snowflake systems integrators back in 2018, Resource Monitors have been the ONLY guardrail-style control available, which has been a huge pain point for many customers for many years. Now, with the Budgets feature, users can actually specify a budget and get much more granular detail about their spending limits.
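For contrast with the new Budgets feature, here is what the long-standing Resource Monitor guardrail looks like; the monitor name, credit quota, and warehouse name are placeholders:

    CREATE RESOURCE MONITOR monthly_guardrail
      WITH CREDIT_QUOTA = 500            -- monthly credit cap (placeholder value)
           FREQUENCY = MONTHLY
           START_TIMESTAMP = IMMEDIATELY
           TRIGGERS ON 80 PERCENT DO NOTIFY
                    ON 100 PERCENT DO SUSPEND;

    ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_guardrail;

As described above, Budgets add more granular spend tracking on top of this kind of credit-based guardrail.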

2. Warehouse Utilization (Private Preview): 

This is another great step forward for Snowflake customers looking to optimize their warehouse utilization. We already leverage the available metadata statistics to do this within Snoptimizer®, but we are limited by the level of detail we can gather. This will allow us to optimize workloads across warehouses much better and achieve even higher Snowflake cost optimization for our customers.

 

My takeaways from Snowflake Summit 2023:

  • If you would like more content and my short summaries are not detailed enough, you are in luck. Here are more details from my team on our top findings from Snowflake Summit 2023.
  • Snowpark Container Services allows Snowflake customers to run any job, function, or service in their own accounts: from third-party LLMs, to a Hex notebook, to a C++ application, to even a full database like Pinecone. It also supports GPUs.
  • Streamlit is getting a new, faster, and easier user interface for developing apps. It is an open-source, Python-based framework compatible with major libraries like scikit-learn, PyTorch, and Pandas, and it has Git integration for branching, merging, and version control.
  • Snowflake is leveraging two of its recent acquisitions, Applica and Neeva, to provide a new Generative AI experience. The Applica acquisition has led to Document AI, an LLM that extracts contextual entities from unstructured data and lets you query unstructured data using natural language. The resulting structured data is persisted in Snowflake and vectorized. Not only can this data be queried in natural language, it can also be used to retrain the LLM on private enterprise data. While most vendors are pursuing prompt engineering, Snowflake is following the retraining path.
  • Snowflake now provides full MLOps capabilities, including a Model Registry where models can be stored, version controlled, and deployed. They are also adding a feature store compatible with the open-source Feast, and they are building a LangChain integration.
  • Last year, Snowflake added support for Iceberg Tables. This year it brings those tables under its security, governance, and query optimizer umbrella. Iceberg table query performance now matches that of tables in Snowflake’s native format.
  • Snowflake is addressing the criticism of its high cost through several initiatives designed to make costs predictable and transparent. One is the Snowflake Performance Index (SPI): using ML functions, it analyzes query durations for stable workloads and automatically optimizes them, which has led to a roughly 15% improvement in customers’ usage costs.
  • Snowflake has invested heavily in building native data quality capabilities into its platform. Users can define quality-check metrics to profile data and gather statistics on column value distributions, null values, etc. These metrics are written to time-series tables, which helps build thresholds and detect anomalies from regular patterns.
  • Snowflake announced two new APIs to support the ML lifecycle:
  • ML Modeling API: The ML Modeling API includes interfaces for preprocessing data and training models. It is built on top of popular libraries like scikit-learn and XGBoost, but seamlessly parallelizes data operations to run in a distributed manner on Snowpark. This means data scientists can scale their modeling efforts beyond what they could fit in memory on a conventional compute instance.
  • MLOps API: The MLOps API is built to help streamline model deployments. The first release of the MLOps API includes a Model Registry to help track and version models as they are developed and promoted to production.
  • Improved Apache Iceberg integrations
  • Git Integration: Native Git integration to view, run, edit, and collaborate on Snowflake code that lives in Git repos. It delivers seamless version control, CI/CD workflows, and better testing controls for pipelines, ML models, and applications.
  • Top-K Pruning Queries: Let you retrieve only the highest-ranked rows from a large result set (e.g., SELECT ... FROM my_table ORDER BY some_column LIMIT 10). The additional pruning reduces the need to scan entire data sets, enabling faster queries; see the sketch after this list.
  • Warehouse Utilization: A single metric that gives customers visibility into actual warehouse utilization and can show idle capacity. This will help you better estimate the capacity and size of warehouses.
  • Geospatial Features: a Geometry data type, switching spatial reference systems using ST_TRANSFORM, invalid shape detection, and many new functions for Geometry and Geography.
  • Dynamic Tables
  • Amazon S3-compatible Storage
  • Passing References for Tables, Views, Functions, and Queries to a Stored Procedure — Preview
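Referring back to the Top-K pruning item above, a minimal sketch (table and column names are placeholders):

    -- Only the 10 highest-ranked rows are needed, so Snowflake can prune
    -- micro-partitions that cannot contain top-10 values instead of scanning everything.
    SELECT customer_id, lifetime_value
    FROM customer_scores
    ORDER BY lifetime_value DESC
    LIMIT 10;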

 

Marketplace Capacity Drawdown Program

Anomaly Detection: Flags metric values that differ from typical expectations.

Contribution Explorer: Helps you find dimensions and values that affect the metric in surprising ways.

 

What did happen to Unistore? 

 

UNISTORE, OLTP-type support based on Snowflake’s Hybrid Table feature: this was one of the biggest announcements by far. Snowflake is entering a much larger part of data and application workloads by extending its capabilities beyond OLAP (online analytical processing, i.e., big data) into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.

This was from last year too; it’s great to see it move forward (even though Streamlit speed is still a work in progress).
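As a minimal sketch of what a Hybrid Table looks like (all names are placeholders; Hybrid Tables require a primary key, which backs the row store that makes single-row OLTP operations fast):

    CREATE HYBRID TABLE orders (
      order_id    NUMBER NOT NULL PRIMARY KEY,  -- required for hybrid tables
      customer_id NUMBER,
      status      VARCHAR,
      updated_at  TIMESTAMP_NTZ
    );

    -- Fast, single-row OLTP-style operations alongside analytical queries:
    UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1001;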

 

Application Development Disruption with Streamlit and Native Apps:

 

Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.

Snowflake at a very high level (still) wants to:

Disrupt Data Analytics

Disrupt Data Collaboration

Disrupt Data Application Development

Data to Value – Part 2

Introduction:

Welcome to our part 2 Data to Value series. If you’ve read Part 1 of the Data to Value Series, you’ve learned about some of the trends happening within the data space industry as a whole.

In Part 2 of the Data to Value series, we’ll explore additional trends to consider, as well as some of Snowflake’s announcements in relation to Data to Value.

As a refresher on this series, we are making a fundamental point: data professionals and data users of all types need to focus not just on creating, collecting, and transforming data. We need to make a conscious effort to focus on and measure the true value that each set of data creates. We also need to measure how fast we can get to that value, if speed provides any real business advantage. There is also an argument for discounting the value of time-dependent data, since it often loses value as it ages.

 

Data to Value Trends – Part 2:

 

8) – Growth of Fivetran and now Hightouch.

The growth and success of Fivetran and Stitch (now Talend) have been remarkable. There is now a significant surge in the popularity of automated data copy pipelines that work in the reverse direction, with a focus on Reverse ETL (Extract, Transform, and Load in reverse), much like our trusted partner, Hightouch. Our IT Strategists consulting firm became partners with Stitch, Fivetran, and Matillion in 2018.

At the Snowflake Partner Summit of the same year, I had the pleasure of sitting next to Jake Stein, one of the founders of Stitch, on the bus from San Francisco to Sonoma. We quickly became friends, and I was impressed by his entrepreneurial spirit. Jake has since moved on to a new startup, Common Paper, a structured contracts platform, after selling Stitch to Talend. At the same event, I also had the opportunity to meet George Frazier from Fivetran, who impressed me with his post comparing all the cloud databases back in 2018. At that time, such content was scarce.

 

9) – Resistance to “ease of use” and “cost reductions” is futile.

Part of me, as a consultant at the time, wanted to resist these automated EL (Extract and Load) tools, as opposed to ETL (Extract, Transform, and Load) or ELT (Extract, Load, and then Transform within the database). As I tested out Stitch and Fivetran, though, I knew that resistance was futile. The ease of use of these tools and the reduction in development and maintenance costs cannot be overlooked. There was no way to stop the data market from embracing these easier-to-use data pipeline automation tools.

What is even more compelling is that you can usually set up automated extract-and-load jobs within minutes or hours. This is unlike the ETL tools we had been using for decades, which were mostly software installations that required capacity planning, procurement, and all sorts of organizational friction just to get started. With Fivetran and Hightouch, no engineering or developer expertise is needed for most of the work, although in some cases it is still beneficial to involve data engineers and architects.

Overall, the Fivetran concept is simple: you connect connectors to destinations. Destinations are databases or data stores; connectors are sources of data, such as Zendesk, Salesforce, or one of the many other sources Fivetran supports. Fivetran and Hightouch are great examples of the trend toward data services and tools that really speed up the process of getting value from your data.

 

10) – Growth of Automated and Integrated Machine Learning Pipelines with Data.

Many companies, including DataRobot, Dataiku, H2O, and SageMaker, are working to achieve this goal. However, this field appears to be in its early stages, with no single vendor having gained widespread adoption or mindshare. Currently, the market is fragmented, and it is difficult to predict which of these tools and vendors will succeed in the long run.

 

Snowflake’s Announcements related to Data to Value

Snowflake is making significant investments and progress in the field of data analysis, with a focus on delivering value to its clients. Their recent announcements at the Snowflake Summit this year, as detailed in this source, highlight new features that are designed to enhance the Data to Value experience.

 

Snowflake recently announced its support of Hybrid Tables and the concept of Unistore.

This move is aimed at providing Online Transaction Processing (OLTP) to its customers. There has been great interest from customers in this concept, which allows for a single source of truth through web-based OLTP-type applications operating on Snowflake with Hybrid tables.

 

Announcements about Snowflake’s Native Apps:

 

  • Integrating Streamlit into Snowflake.

If done correctly, this could be yet another game-changer in turning data into value.
Please note that these two items not only enable data to be processed more quickly, but also significantly reduce the cost and complexity of developing data apps and combining OLTP/OLAP applications. This removes many of the barriers that come with requiring expensive, full-stack development. Streamlit aims to simplify the development of data applications by removing the complexity of the front-end and middle-tier components. (After all, aren’t most applications data-driven?) It is yet another low-code data development environment.

  • Announcement of Snowpipe Streaming.

I found this particularly fascinating, as I had collaborated with Isaac from Snowflake before the 2019 Summit using the original Kafka-to-Snowflake connector, and I presented on the topic at Snowflake Summit 2019. It was truly great to witness Snowflake refactor the old Kafka connector, resulting in significant speed improvements and lower latency. This is yet another major victory for data to value, with an anticipated 10 times lower latency. The public preview is slated for later in 2022.

  • Announcement: Snowpark for Python and Snowpark in General

Snowflake has recently introduced a new technology called Snowpark. While the verdict is still out on it, Snowpark represents a major attempt by Snowflake to speed up ML pipelines over data. Snowflake is looking to integrate full data event processing and machine learning processes within Snowflake itself.

 

If Snowflake can execute this correctly, it will revolutionize how we approach data value. Additionally, it reduces the costs associated with deploying data applications.

 

Conclusion:

In Part 2 of the “Data to Value” series, we explored additional trends in the data industry, including the growth of automated data copy pipelines and integrated machine learning pipelines. We also discussed Snowflake’s recent announcements related to data analysis and delivering value to clients, including support for Hybrid Tables and Native Apps. The key takeaway is the importance of understanding the value of data and measuring the speed of going from data to business value.

Executives and others who prioritize strategic data initiatives should make use of Data to Value metrics. This helps us comprehend the actual value that stems from our data creation, collection, extraction, transformation, loading, and analytics. By doing so, we can make better investments in data initiatives for our organizations and ourselves. Ultimately, data can only generate genuine value if it is reliable and of confirmed quality.

Snowflake Snowday – Data to Value Superhero Summary

Snowflake Snowday  —  Summary

Snowflake’s semiannual product announcement, Snowflake Snowday, took place on November 7, 2022, the same day as the end of Snowflake’s Data Cloud World Tour (DCWT).

I attended 5 DCWT events across the globe in 2022. It was fascinating to see how much Snowflake has grown since the 2019 tour. Many improvements and new features are being added to the Snowflake Data Cloud. It’s hard to keep up! These announcements should further improve Snowflake’s ability to turn data into value.

Let’s summarize the exciting Snowflake announcements from Snowday. The features we’re most enthusiastic about that improve Data to Value are:

  • Snowflake’s Python SDK (Snowpark) is now generally available.
  • Private data sharing significantly accelerates collaborative data work.
  • The Snowflake Kafka connector, dynamic tables, and Snowpipe streaming enable real-time data integration.
  • Streamlit integration simplifies dashboard and app development.

All of these features substantially improve Data to Value for organizations.

Snowflake Snowday Summary – Top Announcements

TOP announcement! – whoop whoop – SNOWPARK FOR PYTHON! (General Availability – GA)

I believe this was the announcement all Python data scientists were anticipating (including myself). Snowpark for Python now enables every Snowflake customer to develop and deploy Python-based apps, pipelines, and machine-learning models directly in Snowflake. In addition to Snowpark for Python being Generally Available to all Snowflake editions, these other Python-related announcements were made:

  • Snowpark Python UDFs for unstructured data (Private Preview)
  • Python Worksheets – The improved Snowsight worksheet now supports Python so you don’t need an additional development environment. This simplifies getting started with Snowpark for Python development. (Private preview)

One Product. One Platform.

  • Snowflake’s major push is to make its platform increasingly easy to use for most or all of its customers’ data cloud needs.
  • Snowflake now offers Hybrid Tables for OLTP workloads and Snowpark. Snowflake is expanding its core platform to handle AI/ML and online transaction processing (OLTP) workloads. This significantly increases Snowflake’s total addressable market.
  • Snowflake acquired Streamlit earlier this year for one main reason: to integrate Streamlit’s data application frontend with Snowflake’s backend and to handle data application use cases.
  • Snowflake is investing heavily to evolve from primarily a data store to a data platform for building frontend and backend data applications. This includes web/data apps needing millisecond OLTP inserts or AI/ML workloads.

Additionally, Snowflake continually improves the core Snowflake Platform in the following ways:

The Cross-Cloud Snowgrid:

https://snowflakesolutions.net/wp-content/uploads/Snowday-Cross-Cloud-Snowgrid-1024x762.png

Replication Improvements and Snowgrid Updates:

These improvements and enhancements to Snowflake, the cross-cloud data platform, significantly boost performance and replication. If you’re unfamiliar with Snowgrid, we explain what it is here.

  • Cross-Cloud Business Continuity – Stream & Task Replication (PUBLIC PREVIEW) – This enables seamless pipeline failover, which is fantastic. It takes replication beyond just accounts, databases, policies, and metadata (a failover-group sketch follows this list).
  • Cross-Cloud Business Continuity – Replication GUI (PRIVATE PREVIEW). You can now more easily manage replication and failover from a single interface for global replication. It enables easy setup, management, and failover of an account.
  • Cross-Cloud Collaboration – Discovery Controls (PUBLIC PREVIEW)
  • Cross-Cloud Collaboration – Cross-Cloud Auto-Fulfillment (PUBLIC PREVIEW)
  • Cross-Cloud Collaboration – Provider Analytics (PUBLIC PREVIEW)
  • Cross-Cloud Governance – Tag-Based Masking (GA)
  • Cross-Cloud Governance – Masking and Row-Access Policies in Search Optimization (PRIVATE PREVIEW)
  • Replication Groups – Looking forward to updates on this as well. These can enable sharing and simple database replication in all editions.
  • The above are available in all editions EXCEPT:
  • Enterprise or higher needed for Failover/Failback (including Failover Groups)
  • Business Critical or higher needed for Client Redirect functionality
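To make the failover items above concrete, here is a hedged sketch of a failover group; the account, database, and schedule values are placeholders, and the object types you can include depend on edition and preview status:

    CREATE FAILOVER GROUP prod_failover
      OBJECT_TYPES = DATABASES, ROLES, WAREHOUSES
      ALLOWED_DATABASES = sales_db
      ALLOWED_ACCOUNTS = myorg.dr_account        -- the target (secondary) account
      REPLICATION_SCHEDULE = '10 MINUTE';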

Performance Improvements on Snowflake Updates:

New performance improvements and performance transparency were announced, related to:

  • Query Acceleration (public preview): speeds up eligible queries by offloading portions of large scans to shared compute.
  • Search Optimization Enhancements (public preview): broader support for fast point-lookup queries on large tables.
  • Join eliminations (GA): removes unnecessary table joins from query plans.
  • Top results queries (GA): speeds up ORDER BY ... LIMIT queries.
  • Cost Optimizations: Account usage details (private preview): more granular usage detail to help manage costs.
  • History views (in development): provides historical views of usage and costs.
  • Programmatic query metrics (public preview): offers programmatic access to query performance metrics. Available on all editions EXCEPT: ENTERPRISE OR HIGHER REQUIRED for Search Optimization and Query Acceleration.

Data Listings and Cross-Cloud Updates

I’m thrilled about Snowflake’s announcement regarding Private Listings. Many of you know that Data Sharing, which I’ve been writing about for over 4 years, is one of my favorite Snowflake features. My latest article is “The Future of Data Collaboration.” Data Sharing is a game-changer for data professionals.

Snowflake’s announcement makes private data-sharing scenarios much easier to implement. Fulfilling different regional requirements is now simpler too (even 1-2 years ago, we had to write replication commands). I’ll provide more details on how this simplifies data sharing and collaboration. I was happy to see presenters use the Data to Value concepts in their announcement.

I appreciated Snowflake incorporating some of my Data to Value concepts, like “Time to value is significantly reduced for the consuming party.” Even better, this functionality is now available for ALL SNOWFLAKE EDITIONS.

Private Listings

https://snowflakesolutions.net/wp-content/uploads/Snowday-Listings-Cross-Cloud-Improvements-300x190.png

Snowflake Data Governance Improvements

Snowflake continues to add features that enable native data governance and protection.

  • Tag-based Masking automatically applies designated masking policies to sensitive columns via tags (see the sketch after this list).
  • Search Optimization now supports tables with masking and row access policies.
  • FedRAMP High for AWS Government (authorization pending). *Available ONLY on ENTERPRISE+ OR HIGHER
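Referring to the tag-based masking item above, a minimal sketch; the policy, tag, role, table, and column names are placeholders:

    -- Mask a string column unless the querying role is allowed to see it.
    CREATE MASKING POLICY pii_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '*** MASKED ***' END;

    CREATE TAG pii;
    ALTER TAG pii SET MASKING POLICY pii_mask;   -- the policy now rides on the tag

    -- Any column carrying the tag picks up the masking policy automatically.
    ALTER TABLE customers MODIFY COLUMN email SET TAG pii = 'email';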

Building on Snowflake

New announcements related to:

  • Streamlit integration (PRIVATE PREVIEW in January 2023) – This integration will be exciting. The private preview can’t come soon enough.
  • Snowpark Optimization Warehouses (PUBLIC PREVIEW) – This was a smart move by Snowflake to support AI/ML Snowpark customers’ needs. Great to see it rolled out, allowing customers access to higher memory warehouses better suited for ML/AI training scale. Snowpark code can run on both warehouse types.
  • *Available for all Snowflake Editions

Streaming and Dynamic Table Announcements:

Conclusion:

Overall, I’m thrilled with where this is headed. These enhancements greatly improve Snowflake’s streaming data integration, especially with Kafka. Now, Snowflake customers can get real-time data streams and transform data with low latency. When fully implemented, this will enable more cost-effective and high-performance data lake solutions.

If you missed Snowday and want to watch the recording, here’s the link: https://www.snowflake.com/snowday/agenda/

We’ll cover more updates from Snowday and Snowflake BUILD in depth this week in the Snowflake Solutions Community.

Data to Value – Part 1 – Snowflake Solutions

Introduction:

 

Welcome to Frank’s Future of Data four-part series. In these articles, we will cover a few tips on how to get value out of your Snowflake data.

I spend a ton of time reviewing and evaluating all the ideas, concepts, and tools around data, data, and data. The “data concept” space has been exploding with many different concepts and ideas. There are so many new data “this” and data “that” tools as well, so I wanted to bring data professionals and business leaders back to the core concept that matters around the creation, collection, and usage of data: Data to Value.

In layman’s terms, the main concept is that we need to remember that the entire point of collecting and using data is to create business, organizational, and/or individual value. This is the core principle that we should keep in mind when contemplating the value that data provides.

The truth is that while the technical details and jargon involved in creating and collecting data, as well as realizing its value, are important, many users find them overly complex.

For a moment, let’s set aside the technical jargon that can be overused and misused, such as Data Warehouse, Data Lake, Data Mesh, and Data Observability. I’ve noticed that data experts and practitioners often have differing views on the latest concepts. These views can be influenced by their data education background and the types of technologies they were exposed to.

Therefore, I created these articles to prepare myself for taking advantage of new paradigms that Snowflake and other “Modern Data” Stack tools/clouds provide.

In Part 1 of the Data to Value series, we will cover the Data to Value trends you need to be aware of.

 

Data to Value Trends:

 

In 2018, I had the opportunity to consult with some highly advanced and mature data engineering solutions. Some of these solutions were actively adopting Kafka/Confluent to achieve true “event-driven data processing”. This represented a significant departure from the traditional batch processing that had been used in 98% of the implementations I had previously encountered. I found the idea of using continuous streams of data from different parts of the organization, delivered via Kafka topics, to be quite impressive. At the same time, these concepts and paradigm shifts were quite advanced and likely only accessible to very experienced data engineering teams.

1) – Non-stop push for faster speed of Data to Value.

Within our non-stop dominantly capitalist world, faster is better and often provides advantages to organizations, especially around improved value chains and concepts such as supply chains.  Businesses and organizations continuously look for any advantage they can get. I kinda hate linking to McKinsey for backup but here it goes. Their number 2 characteristic for the data-driven enterprise of 2025 is “Data is processed and delivered in real-time”.

 

2) – Data Sharing.

More and more Snowflake customers are realizing the massive advantage of data sharing, which lets them share “no-copy,” in-place data in near real time. Data Sharing is a massive competitive advantage if set up and used appropriately: you can securely provide or receive access to data sets and streams across your entire business or organizational value chain, as long as the other party is also on Snowflake. This gives access to data sets at reduced cost and risk thanks to micro-partitioned, zero-copy, securely governed data access.
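A minimal provider-side sketch of what that looks like in practice; the share, database, schema, table, and consumer account identifiers are all placeholders:

    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.daily_sales TO SHARE sales_share;

    -- The consumer account gets live, no-copy access to the shared objects.
    ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account;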

 

3) – Creating Data with the End in Mind.

When you think about using data for value and logically think through the creation and consumption life cycle, there are clear advantages to capturing data in formats that are ready for immediate processing. If you design your data creation and capture as logs or other outputs that can be consumed easily and immediately, you gain faster data-to-value cycles, creating competitive advantages with certain data streams and sets.

 

4) – Automated Data Applications.

I see some really big opportunities with Snowflake’s Native Applications and Streamlit integrated. Bottom line, there is a need for consolidated, “best-of-breed” data applications that can hit a low price point because of massive customer volumes.

 

5) – Full Automated Data Copying Tools.

The growth of Fivetran and Stitch (now Talend) has been amazing. We are now also seeing huge growth in automated data copy pipelines going the other way, like Hightouch. At IT Strategists, we became a partner with Stitch, Fivetran, and Matillion back in 2018.

 

6) – Full Automation of Data Pipelines and more integrated ML and Data Pipelines.

With Coalesce’s introduction of a fully automated data object and pipeline service, we saw for the first time data professionals improving Data to Value through fully automated data objects and pipelines. Some of our customers refer to parts of Coalesce as a Terraform-like product for data engineering. What I see is a massive removal of data engineering friction, similar to what Fivetran and Hightouch did, but at a different layer of the data processing stack. We became an early Coalesce partner because we view it much the way we viewed Snowflake at the beginning of 2018: Coalesce just makes Snowflake even more amazing to use.

 

7) – The Data Mesh Concept(s) and Data Observability.

Love these concepts or hate them, they are taking hold within the overall data professionals’ brain trust. Zhamak Dehghani (previously at ThoughtWorks) and ThoughtWorks have, from 2019 until now, succeeded in communicating the concept of a Data Mesh to the market, while Barr Moses of Monte Carlo has been beating the drum very hard on the concept of Data Observability. I highlight these data concepts as trends aligned with improving Data to Value speed, quality, and accessibility. There are many more data concepts besides these two, and time will reveal which gain mind and market share and which go by the wayside.

 

Conclusion:

That is it for Part 1 of Frank’s Future of Data series. In Part 2, we will continue exploring more trends to keep in mind, as well as Snowflake’s announcements related to Data to Value.

What is a Snowflake Data Superhero?

What is a Snowflake Data Superhero? 

 

Currently, a Snowflake Data Superhero (abbreviated as DSH) is a Snowflake product expert who is actively involved in the Snowflake community and is helping others learn more about Snowflake through blogs, videos, podcasts, articles, books, etc.

Snowflake states it chooses DSHs based on their positive influence on the overall Snowflake Community. Snowflake Data Superheroes get some decent benefits as well; keep reading to learn more.

I’m Frank Bell, the founder of IT Strategists and Snowflake Solutions, and I’m also a Snowflake Data Superhero. In this article, I’d like to give you an overview of what a Snowflake Data Superhero is, what the program entails, and what are some of the benefits of being chosen as a DSH.

 

The Snowflake Data Superhero Program (Before Fall 2021)

 

For those of you new to Snowflake within the last few years, believe it or not, there was a really informal Data Superhero program for many years. I don’t even think there was an exact list of criteria to be in it. Since I was a long-time Snowflake advocate and one of the top Snowflake consulting and migration partners from 2018-2019 with IT Strategists (before we sold the consulting business), I was invited to be part of the informal program back in 2019.

Then, those of us who had been involved with the informal program got a mysterious email and calendar invite in July 2021: “Invitation: Data Superhero Program Restructuring & Feedback @ Mon Jul 26, 2021, 8am – 9am.” Honestly, when I saw this and attended the session, it sounded like it was going to be a pain in the ass to have to validate our Snowflake expertise again within this new program, especially for many of us in the Snowflake Advocate old guard. (There are probably around 40 of us who never decided to become Snowflake Corporate employees and make a serious windfall from the largest software IPO in history, especially the Sloot and Speiser, who became billionaires. Benoit did too, but as I’ve stated before, Benoit, Thierry, and Marcin deserve some serious credit for the core Snowflake architecture; as an engineer you have to give them some respect.)

 

The Snowflake Data Superhero Program (2022)

 

This is a combination of my thoughts and the definitions from Snowflake.

Snowflake classifies Snowflake Data Superheroes (DSH) as an elite group of Snowflake experts! They also think the DSHs should be highly active in the overall Snowflake community. They share feedback with Snowflake product and engineering teams, receive VIP access to events, and their experiences are regularly highlighted on Snowflake Community channels. Most importantly, Data Superheroes are out in the community helping to educate others by sharing knowledge, tips, and best practices, both online and in person.

How does the Snowflake Corporation choose Snowflake Data Superheroes?

 

They mention that they look for the following key attributes:

 

  • You must overall be a Snowflake expert.
  • They look for Snowflake experts who create any type of content around the Snowflake Data Cloud (this could be anything from videos and podcasts to blogs and other written Snowflake publications).
  • They look for you to be an active member of the Data Hero community, which is just the overall online community at snowflake.com.
  • They also want people who support other community members and provide feedback on the Snowflake product.
  • They want overall energetic and positive people.

 

Overall, I would agree that many of the 48 Data Superheroes for 2022 definitely meet all of the criteria above. Since the program was new this past year, I also think it came down to the fact that only certain people applied. I think next year it will be less exclusive, since from my view the number of Snowflake experts is really growing. Back in 2018 there was honestly a handful of us, probably fewer than 100 worldwide; now there are most likely 200+ true Snowflake Data Cloud experts outside of Snowflake employees. Even so, the product has grown so much that it is difficult for any normal (or even superhero) human to cover all parts of Snowflake as an expert. The only way I’m doing it (or trying to) is by employing many automated ML flows, or “Aflows” as I call them, to organize all publicly available Snowflake content into the one knowledge repository of ITS Snowflake Solutions. I would also say it comes down to your overall presence within the Snowflake Community and, finally, your geography. For whatever reason, I think the DSHs Snowflake chose for 2022 missed some really strong Snowflake experts within the United States.

Also, I just want to add that even within the 48 Snowflake Data Superheroes, there are a few that just stand out as producing an insane amount of free community content.  I’m going to name them later after I run some analysis but there are about 10-15 people that just pump out the content non-stop!

 

What benefits do you get when you become a Snowflake Data Superhero?

 

Snowflake Data Superhero Benefits:

 

In 2022, they also provided all of these benefits:

 

  • A ticket to Snowflake Summit – I have to say this was an awesome perk of being part of the program. While I sometimes disagree with Snowflake Corp decisions that are not customer- or partner-focused, this was Snowflake Corporation actually doing something awesome, and really the right thing, considering that most of these 48 superheroes have HEAVILY contributed to Snowflake’s success (no stock, no salary). While employees and investors reaped large financial gains from the Snowflake IPO, many of us basically helped the company grow significantly.
  • Snowflake Swag that is different (well, it was for a while, now others are buying the “kicks” or sneakers)
  • Early education on new Snowflake Features
  • Early access to new Snowflake Features (Private Preview)
  • Some limited opportunities to speak at events. (Let’s face it, the bulk of speaking opportunities these days goes in this order: Snowflake employees; Snowflake customers (the bigger the brand, or maybe the spend, the bigger the speaking opportunity); Snowflake partners who pay significant amounts of money to be involved in any live speaking event; and finally external Snowflake experts, advocates, etc.)
  • VIP access to events (we had our own Data Superhero area within Snowflake Summit)
  • Actual Product Feedback sessions with the Snowflake Product Managers

 

The only promised benefit I can think of that has not been delivered so far in 2022 is providing every DSH with a test Snowflake account with a certain number of credits. Also, I do not think many of the DSHs have received their Data Superhero card; this was a benefit provided to maybe 10 or so DSHs back in 2019, and I believe it started with those chosen to speak at Snowflake BUILD, though I’m not 100% sure.

 

The Snowflake Data Superhero Program (2023)

 

How do I apply to be a Snowflake Data Superhero?
Here you go:  [even though for me the links are not working]
https://community.snowflake.com/s/dataheroes

 

Snowflake’s Data Superhero Program Evolution

 

I will add some more content around this as I review how the 2023 program is going to work. I will say I have been surprisingly pleased with the DSH Program overall this year in 2022. It has given the Snowflake Data Superheroes who are more involved with the program a way to stand out within the Snowflake Community.

 

Snowflake’s Data Superhero Program Internal Team

 

I also want to give a shout-out to the main team at Snowflake who works tirelessly to make an amazing Snowflake Data Superhero program. These individuals and more have been wonderful to work with this year:

  • Howard Lio
  • Leith Darawsheh
  • Elsa Mayer

There are many others too, from the product managers we meet with to other Snowflake engineers.

 

Other Snowflake Data Superhero Questions:

 

Here was the full list from Feb 2021.

Who are the Snowflake Data Superheroes?

https://medium.com/snowflake/introducing-the-2022-data-superheroes-ec78319fd000

 

Summary

 

I kept getting all of these questions about, hey – what is a Snowflake Data Hero?  What is a Snowflake Data Superhero?  How do I become a Snowflake Data Superhero?  What are the criteria for becoming one?

This article is my attempt to answer all of your Snowflake Data Superhero-related questions in one place, coming from an actual Snowflake Data Superhero; I’ve been one for 3+ years in a row now. Hit me up in the comments or directly if you have any other questions.

Shortest Snowflake Summit 2022 Recap

Introduction:

 

Today’s article provides a recap of Snowflake Summit 2022, including the key feature announcements and innovations. We highlight the major takeaways from the event and outline Snowflake’s position as a full-stack business solution environment capable of creating business applications.

We also include a more in-depth discussion of Snowflake’s seven pillars of innovation, which include all data, all workloads, global, self-managed, programmable, marketplace, and governed.

 

Snowflake Summit 2022 Recap from a Snowflake Data Superhero:

 

If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.

Here are my top 20 announcements, mostly in chronological order of when they were announced. It was overwhelming to keep up with the number of announcements this week!

 

Cost Governance:

 

1. The concept of New Resource Groups has been announced. It allows you to combine all kinds of Snowflake data objects to monitor their resource usage. This is a huge improvement since Resource Monitors were previously quite primitive.

2. The concept of Budgets that you can track against. Resource Groups and Budgets are coming into Private Preview in the next few weeks.

3. More usage metrics are being made available for SnowPros like us, and for monitoring tools, to use. This is important since many enterprise businesses have been asking for it.

 

Replication Improvements on SnowGrid:

 

4. Account Level Object Replication: Snowflake previously allowed only data replication and not other account-type objects. However, now all objects that are not just data can supposedly be replicated as well.

5. Pipeline Replication and Pipeline Failover: Now, stages and pipes can be replicated. According to Kleinerman, this feature will be available soon in Preview.

 

Data Management and Governance Improvements:

 

6. The combination of tags and policies: you can now apply governance policies through tags (tag-based policies). This is in Private Preview now and will go into Public Preview very soon.

 

Expanding External Table Support and Native Iceberg Tables:

 

7. We will soon have support for external tables in Apache Iceberg. Keep in mind, however, that external tables are read-only and have certain limitations. Take a look at what Snowflake did in #9 below.

8. Snowflake is broadening its abilities to manage on-premises data by partnering with storage vendors Dell Technologies and Pure Storage. The integration is anticipated to be available in a private preview in the coming weeks.

9. We are excited to announce that Snowflake now fully supports Iceberg tables, which means these tables can now support replication, time travel, and other standard table features. This enhancement will greatly improve the ease of use within a Data Lake conceptual deployment. For any further inquiries or assistance, our expert in this area is Polita Paulus.

 

Improved Streaming Data Pipeline Support:

 

10. New Streaming Data Pipelines. The main innovation is the capability to create a concept of materialized tables. Now you can ingest streaming data as row sets. Expert in this area: Tyler Akidau

  • Funny—I presented on Snowflake’s Kafka connector at Snowflake Summit 2019. Now it feels like ancient history.

 

Application Development Disruption with Streamlit and Native Apps:

 

11. Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

12. Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.

 

Expanded SnowPark and Python Support:

 

13. Python Support in the Snowflake Data Cloud. More importantly, this is a major move to make it much easier for all “data constituents” to be able to work seamlessly within Snowflake for all workloads including Machine Learning. Snowflake has been making efforts to simplify the process of running data scientist workloads within its platform. This is an ongoing endeavor that aims to provide a more seamless experience.

14. Snowflake Python Worksheets. This statement is related to the previous announcement. It enables data scientists, who are used to Jupyter notebooks, to more easily work in a fully integrated environment within Snowflake.

 

New Workloads. Cybersecurity and OLTP! boom!

 

15. CYBERSECURITY. This was announced a while back, but it is being emphasized again to ensure completeness.

16. UNISTORE: OLTP-type support based on Snowflake’s Hybrid Table features. This was one of the biggest announcements by far. Snowflake is now entering a much larger part of data and application workloads by extending its capabilities beyond OLAP (online analytical processing) into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.

 

Additional Improvements:

 

17. Snowflake Overall Data Cloud Performance Improvements. This is great, but with all the other “more transformative” announcements, I’ll group this together. The performance improvements include enhancements to AWS capabilities, as well as increased power per credit through internal optimizations.

18. Large Memory Instances. They did this to handle more data science workloads, demonstrating Snowflake’s ongoing commitment to meeting customers’ changing needs.

19. Data Marketplace Improvements. The Marketplace is one of my favorite things about Snowflake. They mostly announced incremental changes.

 

Quick “Top 3” Takeaways for me from Snowflake Summit 2022:

 

  1. Snowflake is positioning itself now way beyond a cloud database or data warehouse. It now is defining itself as a full-stack business solution environment capable of creating business applications.
  2. Snowflake is emphasizing it is not just data but that it can handle “all workloads” – Machine Learning, Traditional Data Workloads, Data Warehouse, Data Lake, and Data Applications and it now has a Native App and Streamlit Development toolset.
  3. Snowflake is expanding wherever it needs to be in order to be a full data anywhere anytime data cloud. The push into better streams of data pipelines from Kafka, etc., and the new on-prem connectors allow Snowflake to take over more and more customer data cloud needs.

 

Snowflake at a very high level wants to:

 

  1. Disrupt Data Analytics
  2. Disrupt Data Collaboration
  3. Disrupt Data Application Development

 

Want more recap beyond just the features?

 

Here is a more in-depth take on the Keynote 7 Pillars that were mentioned:

Snowflake-related Growth Stats Summary:

  • Employee Growth: 938 employees in 2019 to 3,992 at Summit 2022
  • Customer Growth: 948 customers in 2019 to 5,944 at Summit 2022
  • Total Revenue Growth: $96M in 2019 to $1.2B at Summit 2022

 

Snowflake’s 7 Pillars of Innovations:

 

Let’s go through the 7 pillars of Snowflake innovation:

  1. All Workloads – Snowflake is heavily focusing on creating an integrated platform that can handle all types of data and workloads, including ML/AI workloads through SnowPark. Their original architecture’s separation of computing and storage is still a key factor in the platform’s power. This all-inclusive approach to workloads is a defining characteristic of Snowflake’s current direction.
  2. Global – Snowflake, which is based on SnowGrid, is a fully global data cloud platform. Currently, Snowflake is deployed in over 30 cloud regions across the three main cloud providers. Snowflake aims to provide a unified global experience with full replication and failover to multiple regions, thanks to its unique architecture of SnowGrid.
  3. Self-managed – Snowflake states it is committed to ensuring the platform remains user-friendly and straightforward to use, and that this continues to be a priority.
  4. Programmable – Snowflake can now be programmed using not only SQL, Javascript, Java, and Scala, but also Python and its preferred libraries. This is where Streamlit comes in.
  5. Marketplace – Snowflake emphasizes its continued focus on building more and more functionality on the Snowflake Marketplace (rebranded now since it will contain both native apps as well as data shares). Snowflake continues to make the integrated marketplace as easy as possible to share data and data applications.
  6. Governed – Snowflake stated that they have a continuous heavy focus on data security and governance.
  7. All Data – Snowflake emphasizes that it can handle not only structured and semi-structured data, but also unstructured data of any scale.

 

Conclusion:

 

We hope you found this article useful!

Today’s article recapped Snowflake Summit 2022, highlighting feature announcements and innovations. Snowflake is a full-stack business solution environment with seven pillars of innovation: all data, all workloads, global, self-managed, programmable, marketplace, and governed. We covered various topics such as cost governance, data management, external table support, and cybersecurity.

If you want more news regarding Snowflake and how to optimize your Snowflake accounts, be sure to check out our blog.

Automated Modern Data Stack

Welcome! This article describes what a modern data stack is and how companies can leverage it to gain business insights. Building a proper modern data stack has become essential for data-driven companies seeking to thrive in today’s fast-paced, data-driven digital world.

 

What is a Modern Data Stack?

The term ‘modern data stack’ was first coined in the mid-2010s to refer to a collection of cloud-based tools for managing and analyzing data.

A modern data stack essentially allows companies to gain valuable business insights by efficiently storing, integrating, and analyzing huge volumes of data from diverse sources. As data volumes grew exponentially, traditional data warehouses and business intelligence tools were no longer sufficient. The modern data stack emerged as a new approach to data management that could handle large, diverse datasets and support data-driven decision-making at scale.

 

What does a Modern Data Stack include?

A modern data stack includes many different components because each component serves a specific purpose in enabling companies to manage and gain insights from their data. The components work together in an integrated fashion to provide a full solution for data management and analytics. However, the components often vary based on a company’s specific needs and priorities.

 

Some of the most common components include:

  • Cloud data warehouses for scalable storage and computing.
  • Data integration platforms to ingest data from various sources.
  • Data transformation tools to prepare and model the data.
  • Business intelligence tools for analysis and visualization.
  • Data quality and governance tools to ensure data accuracy, security, and compliance.

Now that we’ve gone through a high-level overview of what a modern data stack is and what are some of the most common components, we are proud to present our robust modern data stack solution.

After thoroughly analyzing the leading options, we have assembled a set of technologies that we believe deliver an unparalleled experience.

 

Our ITS Automated Modern Data Stack:

 

The benefit of having your modern data stack automated is that it reduces the need for manual data engineering and integration. Automated tools handle the heavy lifting of data ingestion, transformation, and integration so that your data analysts and scientists can focus on deriving insights and business value from the data. Automation also increases speed and scalability while reducing costs.

The layers of our automated modern data stack consist of companies with whom we have official partnerships (Snowflake, Fivetran, Coalesce, Hightouch, and Sigma).

 

Base Layer 0 – Snowflake

 

Firstly, a modern data stack needs a base layer like Snowflake because it provides the foundational data storage and computing infrastructure upon which the rest of the stack is built. We chose Snowflake’s cloud data warehouse because it can efficiently store huge volumes of data from diverse sources and run complex queries across all of it.

Our website name, ‘Snowflake Solutions,’ reflects our sole focus on Snowflake as our foundational technology. We have unparalleled expertise and a proven track record of delivering cutting-edge, customized Snowflake solutions for all of our clients. Our Founder, Frank Bell, is considered the top Snowflake optimization expert in the world and has been a leading pioneer of Snowflake’s infrastructure and optimization solutions.

 

What does Snowflake provide?

  • A scalable cloud data warehouse.
  • Separation of computing and storage.
  • A multi-cluster, shared data architecture.
  • Automated data loading and unloading.
  • Time travel for data correction (see the sketch after this list).
  • Data sharing across accounts and organizations.
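As referenced in the Time Travel bullet above, a minimal sketch of using it for data correction; the table name and query ID are placeholders:

    -- Query the table as it looked 30 minutes ago.
    SELECT * FROM orders AT (OFFSET => -60*30);

    -- Recover a pre-mistake copy by cloning the table as of just before a bad statement.
    CREATE TABLE orders_restored CLONE orders
      BEFORE (STATEMENT => '<query_id_of_bad_update>');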

 

Base Layer 1 – Fivetran

 

Fivetran is a cloud-based data integration platform that helps organizations centralize data from various sources into a unified view. It automates the process of data integration, making it easier for businesses to access and analyze their data in real-time.

 

What does Fivetran provide?

 

  • It uses an ELT approach, quickly loading your data into your warehouse prior to transforming it.
  • Fivetran’s normalized schemas replicate the data from your sources into the familiar relational database format, so analysts can immediately run queries on it.
  • Fivetran offers 300+ pre-configured connectors for various data sources, and they take only about five minutes to set up.
  • Automated schema drift handling, updates, data normalization, and more.
  • Built-in automated governance and security features.
  • Real-time data movement with low impact on the source system.
  • Automated data entry and extraction across systems.
  • It avoids the high engineering costs usually associated with data integration.

 

Base Layer 2 – Coalesce

 

When we came across the full demonstration of Coalesce, we were blown away. In our view, they are one of the largest game-changers in recent years.

Coalesce is a data transformation tool built specifically for Snowflake that leverages a column-aware architecture. It provides a code-first, GUI-driven experience for building and managing transformations. Coalesce is the first automated transformation data pipeline tool we have tested that scales with Snowflake and makes transformation pipelines more automated.

 

What does Coalesce provide?

  • It’s easy to use when creating patterned transformations.
  • Extreme transformation flexibility at both object and pipeline levels. Combined code and GUI editing.
  • Automation templates that be shared with your data engineering team.
  • Coalesce separates the build and the deployment of data pipelines. Providing flexibility in testing your data pipeline.
  • You are able to use column-aware metadata for the automated creation of database objects including dimensions.
  • You can easily build data pipelines with Snowflake Streams and Tasks via Coalesce
  • You can quickly implement patterned transformations like Deferred Merge across hundreds or thousands of tables.
  • Built for true cloud scale as a cloud-first tool to operate on top of Snowflake.
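
The Streams and Tasks pipelines that Coalesce generates can also be expressed directly in Snowflake SQL. The sketch below is a simplified, hypothetical version of that pattern; the table, stream, task, and warehouse names are made up, and the source and target tables are assumed to already exist.

    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="PIPELINE_WH",  # hypothetical warehouse
        database="DEMO_DB",       # hypothetical database
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # A stream records the rows that change on the source table.
    cur.execute("CREATE OR REPLACE STREAM orders_stream ON TABLE orders")

    # A task wakes up on a schedule but only runs when the stream has new data,
    # then merges the changes into the downstream table.
    cur.execute("""
        CREATE OR REPLACE TASK load_orders_curated
          WAREHOUSE = PIPELINE_WH
          SCHEDULE = '5 MINUTE'
          WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
        AS
          MERGE INTO orders_curated tgt
          USING orders_stream src
            ON tgt.order_id = src.order_id
          WHEN MATCHED THEN UPDATE SET tgt.status = src.status
          WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (src.order_id, src.status)
    """)

    # Tasks are created suspended; resume it to start the pipeline.
    cur.execute("ALTER TASK load_orders_curated RESUME")

    cur.close()
    conn.close()

Coalesce's value is that it generates, versions, and deploys this kind of boilerplate for you across hundreds of tables.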

 

Base Layer 3 – Hightouch

 

Hightouch can be a really powerful tool. It is a leading Data Activation platform that syncs data from your warehouse into 125+ SaaS destinations with no engineering effort needed.

 

What does Hightouch provide?

 

  • Ease of use in getting data in and extracting value from it.
  • Loading flexibility that lets you sync your warehouse data to any of the 100+ SaaS destinations it integrates with.
  • Automation that does not require custom code or CSV uploads.
  • Security: Hightouch never stores your data and holds several certifications that ensure security compliance and data governance (SOC 2 Type 2, GDPR, HIPAA, and CCPA).
  • Easy control over who has access & authorization to make changes.

 

Base Layer 4 – Sigma Computing

Sigma Computing is a cloud-based Business Intelligence (BI) platform used for data exploration and visualization. It increases speed to insight by pairing Snowflake's lightning-fast computing power with the familiarity of spreadsheets. It operates as a calculation engine on top of Snowflake, and the best part is that Sigma (when set up correctly) never creates extracts.

 

What does Sigma provide?

  • Offers code-free and code-friendly data manipulation & visualization.
  • Ease of use with a familiar interface that is designed to look like a spreadsheet.
  • Drag & drop functionalities that improve user interactions and do not require any additional technical know-how or skill.
  • A BI tool built specifically for the Snowflake Data Cloud, so queries run directly against Snowflake with low latency.

 

Conclusion

Are you unsure about the best automated modern data stack for your business? Given how important choosing the right solution is, schedule a free call with us. We can walk you through the options to find what suits your needs best.

The powerful automated modern technology stack we outlined in this article is the one we employ for the vast majority of our projects. We wholeheartedly endorse all these partners and their solutions to our clients who are primarily transitioning to Snowflake.

Do you have any data automation needs we have not already addressed?

We hope this article proved useful in considering what a truly automated, modern data infrastructure should look like.

Be sure to check out our blog for more information regarding Snoptimizer® or Snowflake.

Snowflake’s Financial Services Data Summit Recap

Snowflake Financial Services Data Summit:

 

Snowflake held an excellent virtual event this week, focusing on Snowflake’s Data Cloud solutions for Financial Services. We appreciated the combination of business and technology content during this Industry Vertical Snowflake “Summit.” Our mission, when launching our ITS solutions business, was to provide Business/Technology Focused Solutions.

We firmly believe in bringing the best of both worlds, where business teams and technology teams work together. We wanted to avoid the pitfalls of solutions that were solely business-focused, without considering technology or collaboration. We also sought to avoid technology solutions that lacked business value or had limited value, which led to the product/market fit concepts.

 

Financial Services Data Summit Highlights and Takeaways

 

  • Major emphasis on the Snowflake Financial Services Data Cloud and its partners such as BlackRock, Invesco, State Street, Fiserv, etc.
  • Financial Services Data Provider Presentations. During the conference, we were excited to attend the Data Provider Presentations from Acxiom. The presentations provided us with valuable insights into the data industry and the latest trends in data collection and analysis. Acxiom’s experts shared their experiences and knowledge about the challenges and opportunities of working with large datasets, as well as the best practices for data management and security.
  • It is all about the customer. This was a recurring topic among Snowflake customers and partners such as BlackRock, State Street, and data providers. They highlighted their strong partnership with Snowflake and how it has facilitated new data collaborations that were previously unattainable.

 

Financial Services Data Summit By the Numbers:

 

  • Sessions – 17
  • Tracks – 4 [Customer Centricity, Risk and Data Governance, Digitalize Operations, Platform Monetization]
  • Speakers – 44
  • Speakers by type (see the breakdown below)

 

Presenter Company Type | Count | Breakdown %
Snowflake | 16 | 37.21%
Customers | 9 | 20.93%
Partners – Consulting | 6 | 13.95%
Partners – Data Providers | 7 | 16.28%
Partners – Products | 5 | 11.63%

 

Data Workload Financial Services Recap:

 

Data Warehousing, Engineering, Data Lakes:

 

We still think Data Warehousing is the workload that works best with Snowflake, and it is what Snowflake was originally designed for.  We see many businesses within Financial Services moving to the Snowflake Data Cloud for their Data Warehouse workloads.  Many of the Financial Services companies who presented at the summit are also moving to a combination of Data Lakes and Data Warehousing. The presentations focused on a mix of financial services processes combined with data technologies to improve the financial services business. Snowflake's data cloud is accelerating how financial services companies transform. Capital One shared an interesting video about being the first to deliver financial services on the cloud, which was not included in the sessions.

 

Data Science and Data Applications:

 

Many of the presentations at the Financial Data Summit were related to building data applications on top of Snowflake.

 

Data Marketplace and Data Exchanges:

From our perspective, the focus of the Financial Services Summit was the Financial Services Data Cloud. One of the main presentations was the highlight on BlackRock's Aladdin Data Cloud and the Q&A around it.  There was also a very large focus on Financial Services Data Providers such as Acxiom, Intelligence, FactSet, etc., and on the data and services they provide through the Data Cloud.

 

Snowflake Data Provider Presentations:

 

Acxiom – Recording Link

Intelligent & FactSet – Recording Link

S&P Global – Recording Link

 

Financial Services Data Cloud Announced:

 

Besides the summit event itself, Snowflake also announced the Snowflake Financial Services Data Cloud. We view this as really just a subset of the overall Snowflake Data Cloud vision and definition. We assume Snowflake will continue to roll out industry-vertical Data Cloud concepts. This is super interesting and transformative at many levels. It is a massive movement toward more centralized and shared data versus the historical data silos that have developed within companies.

This is the statement from the press release: “Financial Services Data Cloud, which unites Snowflake’s industry-tailored platform governance capabilities, Snowflake- and partner-delivered solutions, and industry-critical datasets, to help Financial Services organizations revolutionize how they use data to drive business growth and deliver better customer experiences.”

At a high level, this is pretty awesome "theoretically" and aligns with a lot of the thought leadership work I'm doing around moving from a paradigm of on-premise closed data systems and silos to an ever-evolving worldwide concept of integrated data.

 

Conclusion:

 

The Snowflake Financial Services Data Summit was an excellent first Vertical Industry Summit with major Financial Services customers and partners such as BlackRock, State Street, Invesco, Fiserv, Square, NYSE, Western Union, Acxiom, etc.  Our favorites [from a practical learning perspective] were:

  1. Fiserv CTO Marc did a great job demonstrating Fiserv applications and tools on top of Snowflake. Recording Link

  2. Building on Snowflake: Driving Platform Monetization with the Data Cloud. Recording Link

There were many other great presentations as well from data providers and Snowflake partners like Alation, AWS, etc.

Snowflake’s Differentiating Features

What are the features of Snowflake that differentiate it from all its competitors?  I started this list in 2018 and it continues to evolve. Sure, I am a Snowflake Data Superhero and longtime Snowflake advocate, but I do try to be objective.  I have also had a long career of partnering with new technologies during my 19 years of running a successful consulting firm, and I have to state that most vendors and technologies do NOT impress me.  While I partnered with Microsoft (we were a Gold Partner for many years) and many others, the reality is that most of their technology was not a game-changer like the internet or Netscape (the first mainstream browser). They were typically solid technology solutions that helped our clients.  When I discovered Snowflake at the beginning of 2018, while looking to build a custom CDP for a Fortune 50 company, I realized this technology and this architecture were going to be a game changer for the data processing industry, especially within BIG DATA and ANALYTICS.

Snowflake’s Differentiating Features (2018 or before)

  1. Concurrency and Workload Separation [enabled by the Separation of Compute from Storage.]  [huge! for the first time, you could completely separate workloads and not have the traditional concurrency challenges of table locking or ETL jobs COMPETING with Reporting or Data Science jobs.]
  2. Pay-as-you-go pricing (also named consumption-based pricing). For the very first time, startups and medium-sized businesses could get true cloud BIG DATA scale at an amazingly affordable price.
  3. Time-Travel. (Enabled by Snowflake's immutable micro-partition architecture.)
  4. Zero-Copy Cloning.
  5. True Cloud Scale.  DYNAMIC (the way it should be!) In and Out Scaling with Clusters.
  6. True Cloud Scale.  Up and Down.  [still code or manual at this point: switching between XS and 4XL warehouse t-shirt sizes]
  7. Data Sharing (this may be my favorite feature.  Data Sharing is transforming industries)
  8. Snowpipe.  The ability to handle the ingestion of streaming data in near real-time.
  9. Data Security.  Encrypted data from end to end.  While some other vendors had parts of this, Snowflake made security a first-class priority in the cloud.
  10. Semi-Structured Data ease of use.  Snowflake has been the easiest way we have found to query JSON and other semi-structured data right alongside relational data (see the sketch after this list).
  11. Lower Database Administration.   Amazingly, no database vendor had automated the collection of database/query statistics and automated indexing/pruning before.  A huge STEP forward.   [I DO NOT agree with "Near-Zero Administration" – this is not true, especially as Snowflake transformed into a data cloud and added tons and tons of additional features which have some additional administration requirements]
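
A quick, hypothetical sketch of items 5, 6, and 10 above: resizing a warehouse, enabling multi-cluster in/out scaling (an Enterprise Edition feature), and querying semi-structured JSON stored in a VARIANT column. The warehouse and table names are illustrative only.

    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
    )
    cur = conn.cursor()

    # Scale up/down: change the t-shirt size of a warehouse on demand.
    cur.execute("ALTER WAREHOUSE REPORTING_WH SET WAREHOUSE_SIZE = 'XLARGE'")

    # Scale in/out: let the warehouse add and remove clusters as concurrency changes.
    cur.execute("ALTER WAREHOUSE REPORTING_WH SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4")

    # Semi-structured ease of use: query raw JSON in a VARIANT column with dot/bracket notation.
    cur.execute("""
        SELECT payload:customer.id::STRING  AS customer_id,
               payload:items[0].sku::STRING AS first_sku
        FROM DEMO_DB.PUBLIC.RAW_EVENTS
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)

    cur.close()
    conn.close()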

Snowflake’s Differentiating Features (2019-2021)

  1. Data Exchange and then Data Marketplace.
  2. Cloud Provider Agnostic. Move to support Azure as well as GCP in addition to AWS.
  3. Data Clean Room V1. Capability to use Secure User Defined Functions within Data Shares (see the sketch after this list).
  4. Data Governance Capabilities.
  5. Integrated Data Science with Snowpark. [still needs work!]
  6. Unstructured Data. Amazingly, Snowflake can now store, manage, and share unstructured files alongside structured and semi-structured data.
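
To illustrate item 3 above, a V1 data clean room essentially meant sharing a SECURE function rather than the raw tables, so consumers get aggregated answers without row-level access. This is a simplified, hypothetical sketch; the database, table, function, and share names are made up, and the share is assumed to already exist.

    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        database="CLEANROOM_DB",  # hypothetical database
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # A SECURE function hides its definition and underlying data; consumers can only
    # call it and receive the aggregate result, never the rows in CUSTOMERS.
    cur.execute("""
        CREATE OR REPLACE SECURE FUNCTION matched_audience_count(segment STRING)
        RETURNS NUMBER
        AS
        $$
            SELECT COUNT(*) FROM customers WHERE segment_name = segment
        $$
    """)

    # Expose only the secure function through an existing share (share name is hypothetical).
    cur.execute("GRANT USAGE ON DATABASE CLEANROOM_DB TO SHARE partner_share")
    cur.execute("GRANT USAGE ON SCHEMA CLEANROOM_DB.PUBLIC TO SHARE partner_share")
    cur.execute("GRANT USAGE ON FUNCTION matched_audience_count(STRING) TO SHARE partner_share")

    cur.close()
    conn.close()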

Snowflake’s Differentiating Features (2022)

*I’m going to wait until December 2022 to finalize this list.  There were some amazing announcements.

One item, though, that I'm finding awesome is access to the SNOWFLAKE.ORGANIZATION_USAGE schema (I think it's still in preview), which makes organizational reporting so much easier.  Previously, we built tools that would log into each account, query the SNOWFLAKE.ACCOUNT_USAGE schema views within each account, and pull the results back to a centralized location.  Sure, it worked, but it was a pain.
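
For example, a single query against the organization-level views can replace that whole per-account collection loop. A minimal sketch, assuming ORGANIZATION_USAGE is enabled in your organization and that the WAREHOUSE_METERING_HISTORY view is available there; the warehouse name is hypothetical.

    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ADMIN_WH",  # hypothetical warehouse
    )
    cur = conn.cursor()

    # One query returns credit consumption for every account in the organization,
    # instead of logging into each account and reading ACCOUNT_USAGE separately.
    cur.execute("""
        SELECT account_name,
               DATE_TRUNC('month', start_time) AS usage_month,
               SUM(credits_used)               AS credits_used
        FROM SNOWFLAKE.ORGANIZATION_USAGE.WAREHOUSE_METERING_HISTORY
        WHERE start_time >= DATEADD('month', -3, CURRENT_TIMESTAMP())
        GROUP BY account_name, usage_month
        ORDER BY usage_month, account_name
    """)
    for account, month, credits in cur.fetchall():
        print(account, month, credits)

    cur.close()
    conn.close()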

To be fair, and not to be a complete Snowflake advocate, Snowflake needs a reality check right now.  Snowflake Summit 2022 brought an amazing number of announcements.  (Even though a focused business person could argue... what is Snowflake now?  A Data Cloud?  A data application development environment?  A Data Science and ML tool?  My heart goes out to the Account Executives.  When they do capacity deals, they have to focus first on the true value of what Snowflake provides today!)   Also, the reality is that many of the significant announcements remind me of my Microsoft Gold Partner days... lots of Coming Soon... but not that soon.  Many of these features will not be truly available until 2023.

Snowflake’s Differentiating Features (2023)

Coming next year :) You just have to wait!

Summary

Since 2018, I have been getting questions from so many colleagues and customers about why Snowflake is better than the on-prem databases they were using, or how Snowflake is different from Redshift, BigQuery, or Synapse.

So this article is my attempt to explain to both business users of data and data professionals (from architects to analysts) why Snowflake is different from any other technology.