Shortest Snowflake Summit 2023 Recap

Introduction:

Similar to last year, I wanted to create a “shortest” recap of Snowflake Summit 2023, covering the key feature announcements and innovations.  It is now exactly two weeks since Snowflake Summit ended, and I have had time to digest the major changes.  Throughout July and August we will follow up with our view of the massive Data to Value improvements and capabilities being made.

 

Snowflake Summit 2023 Recap from a Snowflake Data Superhero:

If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.

 

Top Announcements:

 

1. Native Applications Go to Public Preview: 

I am slightly biased here because my teams have been working with Snowflake Native Apps since Feb/March 2022. We have been on the journey with Snowflake from the earliest Private Previews to now, over the last 16 months or so.  We are super excited about the possibilities and potential of where this will go.  

 

2. Nvidia/Snowflake Partnership, Support of LLMs, and Snowpark Container Services (Private Preview):   

Nvidia and Snowflake are teaming up (because as Frank S. says… some people are trying to kill Snowflake Corp) and they will integrate Nvidia’s LLM framework into Snowflake. I’m also really looking forward to seeing how these Snowpark Container Services work.

 

3. Dynamic Tables (Public Preview):  

Many Snowflake customers, including myself, are really excited about this.  Dynamic Tables go beyond a similar concept, Materialized Views, and add key data set capabilities: declarative data pipelines, dynamic SQL support, user-defined low-latency freshness, automated incremental refreshes, and snapshot isolation.
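Here is a minimal sketch of what a Dynamic Table looks like in SQL. The table, warehouse, and source names are placeholders, and the public preview syntax may evolve:

-- Declarative pipeline step: Snowflake keeps this table incrementally
-- refreshed to within the target lag of the source data.
CREATE OR REPLACE DYNAMIC TABLE DAILY_ORDER_TOTALS
  TARGET_LAG = '5 minutes'
  WAREHOUSE = TRANSFORM_WH
AS
  SELECT ORDER_DATE, SUM(ORDER_AMOUNT) AS TOTAL_AMOUNT
  FROM RAW_ORDERS
  GROUP BY ORDER_DATE;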

 

4. Managed Iceberg Tables (Private Preview): 

“Managed Iceberg Tables” allow Snowflake compute resources to manage Iceberg data.  This makes it much easier to manage Iceberg-format data and helps Snowflake compete for data lake and very large data file workloads. Snowflake customers can keep managing their data lake catalog with Iceberg BUT still get huge value from better compute performance, with Snowflake’s query engine reading the metadata that Iceberg provides.  In some ways this is a huge large-file data-to-value play.  It keeps what blob storage (S3, Azure, etc.) does best, BUT using Snowflake’s compute means less data transformation and faster value from the data, including handling standard data modifications like updates, deletes, and inserts.
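For context, here is a hedged sketch of what a Snowflake-managed Iceberg table can look like. The feature was in Private Preview at the time, so the exact syntax may differ; the external volume and object names below are placeholders:

-- The Iceberg data lives in your own cloud storage (via an external volume),
-- while Snowflake manages the table and reads the Iceberg metadata.
CREATE ICEBERG TABLE CUSTOMER_EVENTS (
  EVENT_ID BIGINT,
  EVENT_TS TIMESTAMP_LTZ,
  PAYLOAD VARCHAR
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'ICEBERG_EXT_VOL'
  BASE_LOCATION = 'customer_events/';

-- Standard DML works, including updates and deletes.
DELETE FROM CUSTOMER_EVENTS WHERE EVENT_TS < DATEADD(YEAR, -2, CURRENT_TIMESTAMP());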

 

5. Snowpipe Streaming API (Public Preview): 

As someone who worked with and presented on the Kafka Streaming Connector back at Summit 2019, it is really great to see this advancement. Back then the connector was “ok” and could handle certain levels of streaming workloads. Four years later, streaming workload processing has gotten much, much better.

 

Top Cost Governance and Control Changes:

As anyone who has read my blog over the past few years knows, I’m a huge advocate of Snowflake’s pay-for-what-you-use model. It is AWESOME, but ONLY when a tool like our Snoptimizer® optimization service is used or you really, really set up all the cost guardrails correctly.  98% of the accounts we help with Snoptimizer do not have all the optimizations set correctly, and without continuous monitoring of costs (and, for that matter, performance and security, which we also offer, unlike a lot of the copycats), spend can easily get away from you.

1. Budgets (Public Preview): 

This “Budgets” cost control feature was actually announced back in June 2022, and we have been waiting for it for some time, so it is good to see Snowflake finally delivering this functionality. Since we started as one of the top Snowflake systems integrators back in 2018, Resource Monitors have been the ONLY guardrail-type limit control available, and that has been a huge pain point for many customers for many years.  Now, with this Budgets feature, users can actually specify a budget and get much more granular detail about their spending limits.
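Based on the preview documentation, Budgets are exposed as objects with stored methods. A rough sketch, with placeholder budget and warehouse names, looks something like this (the preview interface may well change):

-- Create a custom budget, set a monthly credit spending limit,
-- and attach a warehouse to it (preview syntax; subject to change).
USE ROLE ACCOUNTADMIN;
CREATE SNOWFLAKE.CORE.BUDGET MARKETING_BUDGET();
CALL MARKETING_BUDGET!SET_SPENDING_LIMIT(500);   -- credits per month
CALL MARKETING_BUDGET!ADD_RESOURCE(
  SELECT SYSTEM$REFERENCE('WAREHOUSE', 'MARKETING_WH', 'SESSION', 'APPLYBUDGET'));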

2. Warehouse Utilization (Private Preview): 

This is another great step forward for Snowflake customers looking to optimize their Snowflake warehouse utilization.  We already leverage the metadata statistics that are available to do this within Snoptimizer®, but we are limited by the level of detail we can gather. This will allow us to optimize workloads much better across warehouses and achieve even higher Snowflake cost optimization for our customers.

 

My takeaways from Snowflake Summit 2023:

  • If you would like more content and my summaries are not detailed enough, then you are in luck. Here are more details from my team on our top findings from Snowflake Summit 2023.
  • Snowpark Container Services allow Snowflake customers to run any job, function, or service (from third-party LLMs, to a Hex notebook, to a C++ application, to even a full database like Pinecone) inside users’ own accounts. It also supports GPUs.
  • Streamlit is getting a new, faster, and easier user interface for developing apps. It is an open-source Python-based framework compatible with major libraries like scikit-learn, PyTorch, and pandas, and it has Git integration for branching, merging, and version control.
  • Snowflake is leveraging two of its recent acquisitions, Applica and Neeva, to provide a new generative AI experience. The former acquisition has led to Document AI, an LLM that extracts contextual entities from unstructured data and lets you query unstructured data using natural language. The resulting structured data is persisted in Snowflake and vectorized. Not only can this data be queried in natural language, but it can also be used to retrain the LLM on private enterprise data. While most vendors are pursuing prompt engineering, Snowflake is following the retraining path.
  • Snowflake now provides full MLOps capabilities, including Model Registry, where models can be stored, version controlled, and deployed. They are also adding a feature store with compatibility with open-source Feast. It is also building LangChain integration.
  • Last year, Snowflake added support for Iceberg Tables. This year it brings those tables under its security, governance, and query optimizer umbrella. Iceberg table performance now matches the query latency of tables in Snowflake’s native format.
  • Snowflake is addressing the criticism of its high cost through several initiatives designed to make costs predictable and transparent. One example is the Snowflake Performance Index (SPI): using ML functions, it analyzes query durations for stable workloads and automatically optimizes them. This has led to a 15% improvement in customers’ usage costs.
  • Snowflake has invested hugely in building native data quality capabilities within its platform. Users can define quality check metrics to profile data and gather statistics on column value distributions, null values, etc. These metrics are written to time-series tables, which helps build thresholds and detect anomalies against regular patterns.
  • Snowflake announced two new APIs to support the ML lifecycle:
  • ML Modeling API: The ML Modeling API includes interfaces for preprocessing data and training models. It is built on top of popular libraries like Scikit Learn and XGBoost, but seamlessly parallelizes data operations to run in a distributed manner on Snowpark. This means that data scientists can scale their modeling efforts beyond what they could fit in memory on a conventional compute instance.
  • MLOps API: The MLOps API is built to help streamline model deployments. The first release of the MLOps API includes a Model Registry to help track and version models as they are developed and promoted to production.
  • Improved Apache Iceberg integrations
  • Git Integration: Native Git integration to view, run, edit, and collaborate within Snowflake on code that lives in Git repos. It delivers seamless version control, CI/CD workflows, and better testing controls for pipelines, ML models, and applications.
  • Top-K Pruning Queries: Enable you to retrieve only the highest-ranked rows from a large result set (e.g., SELECT … FROM some_table ORDER BY some_column LIMIT 10). Additional pruning features help reduce the need to scan entire data sets, thereby enabling faster queries (see the sketch after this list).
  • Warehouse Utilization: A single metric that gives customers visibility into actual warehouse utilization and can show idle capacity. This will help you better estimate the capacity and size of warehouses.
  • Geospatial Features: Geometry Data Type, switch spatial system using ST_Transformation, Invalid shape detection, many new functions for Geometry and Geography
  • Dynamic Tables
  • Amazon S3-compatible Storage
  • Passing References for Tables, Views, Functions, and Queries to a Stored Procedure — Preview
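For example, the Top-K pruning item above optimizes the familiar ORDER BY plus LIMIT pattern. With the new pruning, a query shaped like the sketch below (table and column names are placeholders) no longer needs to scan every micro-partition to find the top rows:

-- Top-K query: ranking plus LIMIT lets the engine prune micro-partitions
-- that cannot contain any of the top 10 order totals.
SELECT CUSTOMER_ID, ORDER_TOTAL
FROM ORDERS
ORDER BY ORDER_TOTAL DESC
LIMIT 10;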

 

  • Marketplace Capacity Drawdown Program

  • Anomaly Detection: Flags metric values that differ from typical expectations.

  • Contribution Explorer: Helps you find dimensions and values that affect the metric in surprising ways.

 

What did happen to Unistore? 

 

UNISTORE. OLTP-type support based on Snowflake’s Hybrid Table features: This was one of the biggest announcements by far. Snowflake is now entering a much larger part of data and application workloads by extending its capabilities beyond OLAP (online analytical processing, i.e., big data) into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.

This was from last year too – it’s great to see this move forward (even though Streamlit speed is still a work in progress)!
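For readers who have not tried Hybrid Tables yet, here is a minimal sketch of the OLTP-style pattern they enable. The names are placeholders, and since the feature was still in preview, syntax and behavior may change:

-- Hybrid Table: row-oriented storage with a required primary key,
-- built for fast single-row reads and writes alongside analytics.
CREATE HYBRID TABLE ORDER_STATUS (
  ORDER_ID INT PRIMARY KEY,
  STATUS VARCHAR(20),
  UPDATED_AT TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- OLTP-style point write and point update.
INSERT INTO ORDER_STATUS (ORDER_ID, STATUS) VALUES (42, 'NEW');
UPDATE ORDER_STATUS SET STATUS = 'SHIPPED', UPDATED_AT = CURRENT_TIMESTAMP()
WHERE ORDER_ID = 42;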

 

Application Development Disruption with Streamlit and Native Apps:

 

Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.

Snowflake at a very high level (still) wants to:

Disrupt Data Analytics

Disrupt Data Collaboration

Disrupt Data Application Development

Data to Value – Part 2

Introduction:

Welcome to our part 2 Data to Value series. If you’ve read Part 1 of the Data to Value Series, you’ve learned about some of the trends happening within the data space industry as a whole.

In Part 2 of the Data to Value series, we’ll explore additional trends to consider, as well as some of Snowflake’s announcements in relation to Data to Value.

As a refresher on this series, we are making a fundamental point: data professionals and data users of all types need to focus not just on creating, collecting, and transforming data. We need to make a conscious effort to focus on and measure the true value that each set of data creates. We also need to measure how fast we can get to that value and whether it provides any real business advantage. There is also an argument for treating the value of data as time-dependent, since data often loses value as it ages.

 

Data to Value Trends – Part 2:

 

8) – Growth of Fivetran and now Hightouch.

The growth and success of Fivetran and Stitch (now Talend) has been remarkable. There is now a significant surge in the popularity of automated data copy pipelines that work in the reverse direction, with a focus on Reverse ETL (reverse Extract, Transform, and Load), much like our trusted partner, Hightouch. Our IT Strategists consulting firm became partners with Stitch, Fivetran, and Matillion in 2018.

At the Snowflake Partner Summit of the same year, I had the pleasure of sitting next to Jake Stein, one of the founders of Stitch, on the bus from San Francisco to Sonoma. We quickly became friends, and I was impressed by his entrepreneurial spirit. Jake has since moved on to a new startup, Common Paper, a structured contracts platform, after selling Stitch to Talend. At the same event, I also had the opportunity to meet George Frazier from Fivetran, who impressed me with his post comparing all the cloud databases back in 2018. At that time, such content was scarce.

 

9) – Resistance to “ease of use” and “cost reductions” is futile.

Part of me, as a consultant at the time, wanted to resist these automated EL (Extract and Load) tools versus traditional ETL (Extract, Transform, and Load) or ELT (Extract, Load, then Transform within the database).  As I tested out Stitch and Fivetran, though, I knew that resistance was futile. The ease of use of these tools and the reduction in development and maintenance costs cannot be overlooked. There was no way to stop the data market from embracing these easier-to-use data pipeline automation tools.

What was even more compelling is that you can set up automated extract and load jobs within minutes or hours most of the time. This is unlike the previous ETL tools we had been using for decades, which were mostly software installations that required capacity planning, procurement, and all sorts of organizational friction just to get started. With Fivetran and Hightouch, there is no engineering or developer expertise needed for almost all of the work, although in some cases it can be beneficial to have data engineers and architects involved.

Overall, the concept is simple: Fivetran connects connectors to destinations. Destinations are databases or data stores; connectors are sources of data, such as Zendesk, Salesforce, or one of the many other connectors Fivetran offers. Fivetran and Hightouch are great examples of the trend toward data services and tools that really speed up the process of getting value from your data.

 

10) – Growth of Automated and Integrated Machine Learning Pipelines with Data.

Many companies and products, including DataRobot, Dataiku, H2O.ai, and Amazon SageMaker, are working to achieve this goal. However, this field appears to be in its early stages, with no single vendor having gained widespread adoption or mindshare. Currently, the market is fragmented, and it is difficult to predict which of these tools and vendors will succeed in the long run.

 

Snowflake’s Announcements related to Data to Value

Snowflake is making significant investments and progress in the field of data analysis, with a focus on delivering value to its clients. Their recent announcements at the Snowflake Summit this year, as detailed in this source, highlight new features that are designed to enhance the Data to Value experience.

 

Snowflake recently announced its support of Hybrid Tables and the concept of Unistore.

This move is aimed at providing Online Transaction Processing (OLTP) to its customers. There has been great interest from customers in this concept, which allows for a single source of truth through web-based OLTP-type applications operating on Snowflake with Hybrid tables.

 

Announcements about Snowflake’s Native Apps:

 

  • Integrating Streamlit into Snowflake.

If done correctly, this could be yet another game-changer in turning data into value.
Please note that these two items not only enable data to be processed more quickly, but also significantly reduce the cost and complexity of developing data apps and combining OLTP/OLAP applications. This removes many of the barriers that come with requiring expensive, full-stack development. Streamlit aims to simplify the development of data applications by removing the complexity of the front-end and middle-tier components (after all, aren’t most applications data-driven?). It is yet another low-code data development environment.

  • Announcement of Snowpipe Streaming.

I found this particularly fascinating, as I had collaborated with Isaac from Snowflake before the 2019 Summit using the original Kafka to Snowflake connector, and I also gave a presentation on the topic at Snowflake Summit 2019. It was truly amazing to witness Snowflake refactor the old Kafka connector, with significant improvements in speed and lower latency. This is yet another major victory for streaming data to improve value, with an anticipated 10 times lower latency. The public preview is slated for later in 2022.

  • Announcement: Snowpark for Python and Snowpark in General

Snowflake has recently introduced a new technology called Snowpark. While the verdict is still out on this new technology, it represents a major effort by Snowflake to speed up ML pipelines over data. Snowflake is looking to integrate full data event processing and machine learning processes within Snowflake itself.

 

If Snowflake can execute this correctly, it will revolutionize how we approach data value. Additionally, it reduces the costs associated with deploying data applications.

 

Conclusion:

In Part 2 of the “Data to Value” series, we explored additional trends in the data industry, including the growth of automated data copy pipelines and integrated machine learning pipelines. We also discussed Snowflake’s recent announcements related to data analysis and delivering value to clients, including support for Hybrid Tables and Native Apps. The key takeaway is the importance of understanding the value of data and measuring the speed of going from data to business value.

Executives and others who prioritize strategic data initiatives should make use of Data to Value metrics. This helps us comprehend the actual value that stems from our data creation, collection, extraction, transformation, loading, and analytics. By doing so, we can make better investments in data initiatives for our organizations and ourselves. Ultimately, data can only generate genuine value if it is reliable and of confirmed quality.

Snowflake Snowday – Data to Value Superhero Summary

Snowflake Snowday  —  Summary

Snowflake’s semiannual product announcement, Snowflake Snowday, took place on November 7, 2022, the same day as the end of Snowflake’s Data Cloud World Tour (DCWT).

I attended 5 DCWT events across the globe in 2022. It was fascinating to see how much Snowflake has grown since the 2019 tour. Many improvements and new features are being added to the Snowflake Data Cloud. It’s hard to keep up! These announcements should further improve Snowflake’s ability to turn data into value.

Let’s summarize the exciting Snowflake announcements from Snowday. The features we’re most enthusiastic about that improve Data to Value are:

  • Snowflake’s Python SDK (Snowpark) is now generally available.
  • Private data sharing significantly accelerates collaborative data work.
  • The Snowflake Kafka connector, dynamic tables, and Snowpipe streaming enable real-time data integration.
  • Streamlit integration simplifies dashboard and app development.

All of these features substantially improve Data to Value for organizations.

Snowflake Snowday Summary – Top Announcements

TOP announcement! – whoop whoop – SNOWPARK FOR PYTHON! (General Availability – GA)

I believe this was the announcement all Python data scientists were anticipating (including myself). Snowpark for Python now enables every Snowflake customer to develop and deploy Python-based apps, pipelines, and machine-learning models directly in Snowflake. In addition to Snowpark for Python being Generally Available to all Snowflake editions, these other Python-related announcements were made:

  • Snowpark Python UDFs for unstructured data (Private Preview)
  • Python Worksheets – The improved Snowsight worksheet now supports Python so you don’t need an additional development environment. This simplifies getting started with Snowpark for Python development. (Private preview)

One Product. One Platform.

  • Snowflake’s major push is to make its platform increasingly easy to use for most or all of its customers’ data cloud needs.
  • Snowflake now offers Hybrid Tables for OLTP workloads and Snowpark. Snowflake is expanding its core platform to handle AI/ML and online transaction processing (OLTP) workloads. This significantly increases Snowflake’s total addressable market.
  • Snowflake acquired Streamlit earlier this year for one main reason: they aim to integrate Streamlit’s data application frontend with Snowflake’s backend and to handle data application use cases.
  • Snowflake is investing heavily to evolve from primarily a data store to a data platform for building frontend and backend data applications. This includes web/data apps needing millisecond OLTP inserts or AI/ML workloads.

Additionally, Snowflake continually improves the core Snowflake Platform in the following ways:

The Cross-Cloud Snowgrid:

https://snowflakesolutions.net/wp-content/uploads/Snowday-Cross-Cloud-Snowgrid-1024x762.png

Replication Improvements and Snowgrid Updates:

These improvements and enhancements to Snowflake, the cross-cloud data platform, significantly boost performance and replication. If you’re unfamiliar with Snowflake, we explain what Snowgrid is here.

  • Cross-Cloud Business Continuity – Stream & Task Replication (PUBLIC PREVIEW) – This enables seamless pipeline failover, which is fantastic. It takes replication beyond just accounts, databases, policies, and metadata.
  • Cross-Cloud Business Continuity – Replication GUI (PRIVATE PREVIEW). You can now more easily manage replication and failover from a single interface for global replication. It enables easy setup, management, and failover of an account.
  • Cross-Cloud Collaboration – Discovery Controls (PUBLIC PREVIEW)
  • Cross-Cloud Collaboration – Cross-Cloud Auto-Fulfillment (PUBLIC PREVIEW)
  • Cross-Cloud Collaboration – Provider Analytics (PUBLIC PREVIEW)
  • Cross-Cloud Governance – Tag-Based Masking (GA)
  • Cross-Cloud Governance – Masking and Row-Access Policies in Search Optimization (PRIVATE PREVIEW)
  • Replication Groups – Looking forward to updates on this as well. These can enable sharing and simple database replication in all editions.
  • The above are available in all editions EXCEPT:
  • Enterprise or higher needed for Failover/Failback, including Failover Groups (see the sketch after this list)
  • Business Critical or higher needed for Client Redirect functionality
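To make the failover group item concrete, here is a hedged sketch of a cross-account replication setup. The organization, account, and database names are placeholders, and Failover Groups require Enterprise edition or higher:

-- On the source account: replicate selected databases, roles, and
-- warehouses to a secondary account on a schedule, with failover enabled.
CREATE FAILOVER GROUP MY_FAILOVER_GROUP
  OBJECT_TYPES = DATABASES, ROLES, WAREHOUSES
  ALLOWED_DATABASES = SALES_DB
  ALLOWED_ACCOUNTS = MYORG.DR_ACCOUNT
  REPLICATION_SCHEDULE = '10 MINUTE';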

Performance Improvements on Snowflake Updates:

New performance improvements and performance transparency features were announced, related to:

  • Query Acceleration (public preview): Offloads portions of eligible queries, such as large scans, to speed them up (see the sketch after this list).
  • Search Optimization Enhancements (public preview): Extends the search optimization service to speed up more kinds of highly selective lookup queries.
  • Join eliminations (GA): Removes unnecessary table joins from query plans.
  • Top results queries (GA): Speeds up ORDER BY … LIMIT queries.
  • Cost Optimizations: Account usage details (private preview): Provides more granular cost and usage detail.
  • History views (in development): Additional history views for usage and cost visibility.
  • Programmatic query metrics (public preview): Exposes query-level metrics programmatically for monitoring. Available on all editions EXCEPT: ENTERPRISE OR HIGHER REQUIRED for Search Optimization and Query Acceleration
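Two of these are simple to try from SQL. Here is a small sketch with placeholder warehouse and table names (both features require Enterprise edition or higher):

-- Query Acceleration Service: allow the warehouse to offload parts of
-- eligible queries (such as large scans) to shared acceleration compute.
ALTER WAREHOUSE ANALYTICS_WH SET ENABLE_QUERY_ACCELERATION = TRUE;
ALTER WAREHOUSE ANALYTICS_WH SET QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

-- Search Optimization: register a table so highly selective point
-- lookups can use the search access path.
ALTER TABLE EVENTS ADD SEARCH OPTIMIZATION;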

Data Listings and Cross-Cloud Updates

I’m thrilled about Snowflake’s announcement regarding Private Listings. Many of you know that Data Sharing, which I’ve been writing about for over 4 years, is one of my favorite Snowflake features. My latest article is “The Future of Data Collaboration.” Data Sharing is a game-changer for data professionals.

Snowflake’s announcement makes private data-sharing scenarios much easier to implement. Fulfilling different regional requirements is now simpler too (even 1-2 years ago, we had to write replication commands). I’ll provide more details on how this simplifies data sharing and collaboration. I was happy to see presenters use the Data to Value concepts in their announcement.

I appreciated Snowflake incorporating some of my Data to Value concepts, like “Time to value is significantly reduced for the consuming party.” Even better, this functionality is now available for ALL SNOWFLAKE EDITIONS.

Private Listings

https://snowflakesolutions.net/wp-content/uploads/Snowday-Listings-Cross-Cloud-Improvements-300x190.png

Snowflake Data Governance Improvements

Snowflake continues to add features that enable native data governance and protection:

  • Tag-based Masking automatically applies designated masking policies to sensitive columns using tags (see the sketch after this list).
  • Search Optimization now supports tables with masking and row access policies.
  • FedRAMP High for AWS Government (authorization pending). *Available ONLY on ENTERPRISE+ OR HIGHER
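As a rough sketch of the tag-based masking flow above, all names are placeholders, and it assumes a masking policy named MASK_PII_STRING for STRING columns already exists:

-- Attach a masking policy to a tag, then tag the sensitive column;
-- the policy is applied automatically wherever the tag is set.
CREATE TAG IF NOT EXISTS PII_TYPE;
ALTER TAG PII_TYPE SET MASKING POLICY MASK_PII_STRING;
ALTER TABLE CUSTOMERS MODIFY COLUMN EMAIL SET TAG PII_TYPE = 'EMAIL';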

Building on Snowflake

New announcements related to:

  • Streamlit integration (PRIVATE PREVIEW in January 2023) – This integration will be exciting. The private preview can’t come soon enough.
  • Snowpark-Optimized Warehouses (PUBLIC PREVIEW) – This was a smart move by Snowflake to support AI/ML Snowpark customers’ needs. Great to see it rolled out, allowing customers access to higher-memory warehouses better suited for ML/AI training at scale. Snowpark code can run on both warehouse types (see the sketch after this list).
  • *Available for all Snowflake Editions
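The sketch below shows how a Snowpark-optimized warehouse is created; the warehouse name and size are placeholders:

-- Snowpark-optimized warehouses provide more memory per node, which suits
-- memory-hungry ML training workloads run through Snowpark.
CREATE WAREHOUSE SNOWPARK_ML_WH
  WITH WAREHOUSE_SIZE = 'MEDIUM'
       WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED';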

Streaming and Dynamic Table Announcements:

These enhancements greatly improve Snowflake’s streaming data integration, especially with Kafka. Now, Snowflake customers can get real-time data streams and transform data with low latency. When fully implemented, this will enable more cost-effective and high-performance data lake solutions.

Conclusion:

Overall, I’m thrilled with where this is headed.

If you missed Snowday and want to watch the recording, here’s the link: https://www.snowflake.com/snowday/agenda/

We’ll cover more updates from Snowday and Snowflake BUILD in depth this week in the Snowflake Solutions Community.

Data to Value – Part 1 – Snowflake Solutions

Introduction:

 

Welcome to Frank’s Future of Data four-part series. In these articles, we will cover a few tips on how to get value out of your Snowflake data.

I spend a ton of time reviewing and evaluating all the ideas, concepts, and tools around data, data, and data. The “data concept” space has been exploding with many different concepts and ideas. There are so many new data “this” and data “that” tools as well, so I wanted to bring data professionals and business leaders back to the core concept that matters around the creation, collection, and usage of data: Data to Value.

In layman’s terms, the main concept is that we need to remember that the entire point of collecting and using data is to create business, organizational, and/or individual value. This is the core principle that we should keep in mind when contemplating the value that data provides.

The truth is that while the technical details and jargon involved in creating and collecting data, as well as realizing its value, are important, many users find them overly complex.

For a moment, let’s set aside the technical jargon that can be overused and misused, such as Data Warehouse, Data Lake, Data Mesh, and Data Observability. I’ve noticed that data experts and practitioners often have differing views on the latest concepts. These views can be influenced by their data education background and the types of technologies they were exposed to.

Therefore, I created these articles to prepare myself for taking advantage of new paradigms that Snowflake and other “Modern Data” Stack tools/clouds provide.

In Part 1 of the Data to Value series, we will cover the Data to Value trends you need to be aware of.

 

Data to Value Trends:

 

In 2018, I had the opportunity to consult with some highly advanced and mature data engineering solutions. Some of these solutions were actively adopting Kafka/Confluent to achieve true “event-driven data processing”. This represented a significant departure from the traditional batch processing that had been used in 98% of the implementations I had previously encountered. I found the idea of using continuous streams of data from different parts of the organization, delivered via Kafka topics, to be quite impressive. At the same time, these concepts and paradigm shifts were quite advanced and likely only accessible to very experienced data engineering teams.

1) – Non-stop push for faster speed of Data to Value.

Within our non-stop dominantly capitalist world, faster is better and often provides advantages to organizations, especially around improved value chains and concepts such as supply chains.  Businesses and organizations continuously look for any advantage they can get. I kinda hate linking to McKinsey for backup but here it goes. Their number 2 characteristic for the data-driven enterprise of 2025 is “Data is processed and delivered in real-time”.

 

2) – Data Sharing.

More and more Snowflake customers are realizing the massive advantage of data sharing, which allows them to share “no-copy,” in-place data in near real time.  Data Sharing is a massive competitive advantage if set up and used appropriately. You can securely provide or receive access to data sets and streams across any part of your business or organizational value chain that is also on Snowflake. This allows access to data sets at reduced cost and risk thanks to micro-partitioned, zero-copy, securely governed data access.
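Mechanically, a share is just a handful of grants. Here is a minimal sketch with placeholder names: the provider grants objects to a share, and the consumer mounts it as a read-only database.

-- Provider account: expose a table through a share (no data is copied).
CREATE SHARE SALES_SHARE;
GRANT USAGE ON DATABASE SALES_DB TO SHARE SALES_SHARE;
GRANT USAGE ON SCHEMA SALES_DB.PUBLIC TO SHARE SALES_SHARE;
GRANT SELECT ON TABLE SALES_DB.PUBLIC.DAILY_REVENUE TO SHARE SALES_SHARE;
ALTER SHARE SALES_SHARE ADD ACCOUNTS = PARTNER_ORG.PARTNER_ACCOUNT;

-- Consumer account: mount the inbound share as a read-only database.
CREATE DATABASE SALES_SHARE_DB FROM SHARE PROVIDER_ORG.SALES_SHARE;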

 

3) – Creating Data with the End in Mind.

When you think about using data for value and logically think through the creation and consumption life cycle, there are clear advantages to capturing data in formats that are ready for immediate processing.  If you design your data creation and capture as logs or other outputs that can be easily and immediately consumed, you gain faster data-to-value cycles, creating competitive advantages with certain data streams and sets.

 

4) – Automated Data Applications.

I see some really big opportunities with Snowflake’s Native Applications and Streamlit integrated. Bottom line: there is a need for consolidated “best-of-breed” data applications that can hit a low price point due to massive volumes of customers.

 

5) – Fully Automated Data Copying Tools.

The growth of Fivetran and Stitch (now Talend) has been amazing.  We are now also seeing huge growth in automated data copy pipelines going the other way, like Hightouch.  At IT Strategists, we became a partner with Stitch, Fivetran, and Matillion back in 2018.

 

6) – Full Automation of Data Pipelines and more integrated ML and Data Pipelines.

With the introduction of Coalesce’s fully automated data object and pipeline service, we saw for the first time how data professionals can improve Data to Value through fully automated data objects and pipelines. Some of our customers refer to parts of Coalesce as a Terraform-like product for data engineering. What I see is a massive removal of data engineering friction, similar to what Fivetran and Hightouch did, but at a different part of the data processing stack. We became an early partner with Coalesce because we think it is similar to how we viewed Snowflake at the beginning of 2018: we see Coalesce as making Snowflake even more amazing to use.

 

7) – The Data Mesh Concept(s) and Data Observability.

Love these concepts or hate them, they are taking hold within the overall data professionals’ brain trust. Zhamak Dehghani (previously at ThoughtWorks) and ThoughtWorks have, from 2019 until now, succeeded in communicating the Data Mesh concept to the market.  Meanwhile, Barr Moses from Monte Carlo has been beating the drum very hard on the concept of Data Observability. I’m highlighting these data concepts as trends that are aligned with improving Data to Value speed, quality, and accessibility.  There are many more data concepts besides these two.  Time will reveal which of them will gain mind and market share and which will go by the wayside.

 

Conclusion:

That is it for Part 1 of Frank’s Future of Data series. In Part 2, we will continue exploring more trends to keep in mind, as well as Snowflake’s announcements related to Data to Value.

What is a Snowflake Data Superhero?

What is a Snowflake Data Superhero? 

 

Currently, a Snowflake Data Superhero (abbreviated as DSH) is a Snowflake product expert who is actively involved in the Snowflake community and is helping others learn more about Snowflake through blogs, videos, podcasts, articles, books, etc.

Snowflake states it chooses DSHs based on their positive influence on the overall Snowflake Community. Snowflake Data Superheroes get some decent benefits as well; keep reading to learn more.

I’m Frank Bell, the founder of IT Strategists and Snowflake Solutions, and I’m also a Snowflake Data Superhero. In this article, I’d like to give you an overview of what a Snowflake Data Superhero is, what the program entails, and what are some of the benefits of being chosen as a DSH.

 

The Snowflake Data Superhero Program (Before Fall 2021)

 

For those of you new to Snowflake within the last few years, believe it or not, there was a really informal Data Superhero program for many years.  I don’t even think there was an exact list of criteria for being in it. Since I was a long-time Snowflake advocate and one of the top Snowflake consulting and migration partners from 2018-2019 with IT Strategists (before we sold the consulting business), I was invited to be part of the informal program back in 2019.

Then those of us who had been involved with this informal program got a mysterious email and calendar invite in July 2021: “Invitation: Data Superhero Program Restructuring & Feedback @ Mon Jul 26, 2021 8am – 9am”. Honestly, when I saw this and attended the session, it sounded like it was going to be a pain in the ass having to validate our Snowflake expertise again within this new program, especially for many of us in the Snowflake Advocate Old Guard. (There are probably around 40 of us, I’d say, who never decided to become Snowflake Corporate employees and make a serious windfall from the largest software IPO in history, especially the Sloot and Speiser, who became billionaires. Benoit did too, but as I’ve stated before, Benoit, Thierry, and Marcin deserve some serious credit for the core Snowflake architecture. As an engineer, you have to give them some respect.)

 

The Snowflake Data Superhero Program (2022)

 

This is a combination of my thoughts and the definitions from Snowflake.

Snowflake classifies Snowflake Data Superheroes (DSH) as an elite group of Snowflake experts! They also think the DSHs should be highly active in the overall Snowflake community. They share feedback with Snowflake product and engineering teams, receive VIP access to events, and their experiences are regularly highlighted on Snowflake Community channels. Most importantly, Data Superheroes are out in the community helping to educate others by sharing knowledge, tips, and best practices, both online and in person.

How does the Snowflake Corporation choose Snowflake Data Superheroes?

 

They mention that they look for the following key attributes:

 

  • You must overall be a Snowflake expert.
  • They look for Snowflake experts who create any type of content around the Snowflake Data Cloud (this could be anything from videos and podcasts to blogs and other written Snowflake publications).
  • They look for you to be an active member of the Data Hero community, which is just the overall online Snowflake community at snowflake.com.
  • They also want people who support other community members and provide feedback on the Snowflake product.
  • They want overall energetic and positive people.

 

Overall, I would agree many of the 48 Data Superheroes for 2022 definitely meet all of the criteria above. This past year, since the program was new, I also think it came down to the fact that only certain people applied. I think next year it will be less exclusive, since from my view the number of Snowflake experts is really growing.  Back in 2018, there was honestly a handful of us, fewer than 100 worldwide I would say. Now there are most likely 200+ true Snowflake Data Cloud experts outside of Snowflake employees. Even so, the product has grown so much that it becomes difficult for any normal or even superhero human to cover all parts of Snowflake as an expert. The only way that I’m doing it (or trying to) is to employ many automated ML flows (I call them “Aflows”) to organize all publicly available Snowflake content into the one knowledge repository of ITS Snowflake Solutions. I would also say that selection comes down to your overall known presence within the Snowflake Community and, finally, your geography. For whatever reason, I think the DSHs chosen by Snowflake for 2022 missed some really, really strong Snowflake experts within the United States.

Also, I just want to add that even within the 48 Snowflake Data Superheroes, there are a few that just stand out as producing an insane amount of free community content.  I’m going to name them later after I run some analysis but there are about 10-15 people that just pump out the content non-stop!

 

What benefits do you get when you become a Snowflake Data Superhero?

 

Snowflake Data Superhero Benefits:

 

In 2022, they also provided all of these benefits:

 

  • A ticket to the Snowflake Summit – I have to say this was an awesome perk of being part of the program and while I disagree sometimes with Snowflake Corp decisions that are not customer or partner-focused, this was Snowflake Corporation actually doing something awesome, and really the right thing considering that of these 48 superheroes, most of us have HEAVILY contributed to Snowflake’s success (no stock, no salary).  While employees and investors reap large financial gains from the Snowflake IPO, many of us basically helped the company grow significantly.
  • Snowflake Swag that is different (well, it was for a while, now others are buying the “kicks” or sneakers)
  • Early education on new Snowflake Features
  • Early access to new Snowflake Features (Private Preview)
  • Some limited opportunities to speak at events. (Let’s face it, the bulk of speaking opportunities these days goes in this order: Snowflake employees, Snowflake customers (the bigger the brand [or maybe the spend], the bigger the speaking opportunity), Snowflake partners who pay significant amounts of money to be involved in any live speaking event, and finally external Snowflake experts, advocates, etc.)
  • VIP access to events (we had our own Data Superhero area within Snowflake Summit)
  • Actual Product Feedback sessions with the Snowflake Product Managers

 

The only promised action that I can think of that has not been done so far in 2022 is providing every DSH with a test Snowflake account with a certain number of credits.  Also, I do not think many of the DSHs have received their Data Superhero card. This was one of those benefits provided to maybe 10 or more of the DSHs back in 2019 or so; I believe it started with some of those chosen to speak at Snowflake BUILD, but I’m not 100% sure.

 

The Snowflake Data Superhero Program (2023)

 

How do I apply to be a Snowflake Data Superhero?
Here you go:  [even though for me the links are not working]
https://community.snowflake.com/s/dataheroes

 

Snowflake’s Data Superhero Program Evolution

 

I will add some more content around this as I review how the 2023 program is going to work.  I will say I have been surprisingly pleased with the DSH Program overall this year in 2022.  It has given those Snowflake Data Superheroes who are more involved with the program a way to stand out within the Snowflake Community.

 

Snowflake’s Data Superhero Program Internal Team

 

I also want to give a shout-out to the main team at Snowflake who works tirelessly to make an amazing Snowflake Data Superhero program. These individuals and more have been wonderful to work with this year:

  • Howard Lio
  • Leith Darawsheh
  • Elsa Mayer

There are many others too, from the product managers we meet with to other Snowflake engineers.

 

Other Snowflake Data Superhero Questions:

 

Who are the Snowflake Data Superheroes?

Here is the full list of 2022 Data Superheroes:

https://medium.com/snowflake/introducing-the-2022-data-superheroes-ec78319fd000

 

Summary

 

I kept getting all of these questions about, hey – what is a Snowflake Data Hero?  What is a Snowflake Data Superhero?  How do I become a Snowflake Data Superhero?  What are the criteria for becoming one?

This article is my attempt to answer all of your Snowflake Data Superhero-related questions in one place, coming from an actual Snowflake Data Superhero (I’ve been one for 3+ years in a row now). Hit me up in the comments or directly if you have any other questions.

Shortest Snowflake Summit 2022 Recap

Introduction:

 

Today’s article provides a recap of Snowflake Summit 2022, including the key feature announcements and innovations. We highlight the major takeaways from the event and outline Snowflake’s position as a full-stack business solution environment capable of creating business applications.

We also include a more in-depth discussion of Snowflake’s seven pillars of innovation, which include all data, all workloads, global, self-managed, programmable, marketplace, and governed.

 

Snowflake Summit 2022 Recap from a Snowflake Data Superhero:

 

If you were unable to attend the Snowflake Summit, or missed any part of the Snowflake Summit Opening Keynote, here is a recap of the most important feature announcements.

Here are my top 20 announcements, mostly in chronological order of when they were announced. It was overwhelming to keep up with the number of announcements this week!

 

Cost Governance:

 

1. The concept of New Resource Groups has been announced. It allows you to combine all kinds of Snowflake data objects to monitor their resource usage. This is a huge improvement since Resource Monitors were previously quite primitive.

2. The concept of Budgets that you can track spending against. Resource Groups and Budgets are coming into Private Preview in the next few weeks.

3. More usage metrics are being made available as well, for SnowPros like us and for monitoring tools to use. This is important since many enterprise businesses have been asking for this.

 

Replication Improvements on SnowGrid:

 

4. Account-Level Object Replication: Snowflake previously allowed only data replication, not replication of other account-level objects. Now, supposedly, all objects, not just data, can be replicated as well.

5. Pipeline Replication and Pipeline Failover: Now, stages and pipes can be replicated. According to Kleinerman, this feature will be available soon in Preview.

 

Data Management and Governance Improvements:

 

6. The combination of tags and policies (tag-based policies). This is in Private Preview now and will go into Public Preview very soon.

 

Expanding External Table Support and Native Iceberg Tables:

 

7. We will soon have support for external tables in Apache Iceberg. Keep in mind, however, that external tables are read-only and have certain limitations. Take a look at what Snowflake did in #9 below.

8. Snowflake is broadening its abilities to manage on-premises data by partnering with storage vendors Dell Technologies and Pure Storage. The integration is anticipated to be available in a private preview in the coming weeks.

9. We are excited to announce that Snowflake now fully supports Iceberg tables, which means these tables can now support replication, time travel, and other standard table features. This enhancement will greatly improve the ease of use within a Data Lake conceptual deployment. For any further inquiries or assistance, our expert in this area is Polita Paulus.

 

Improved Streaming Data Pipeline Support:

 

10. New Streaming Data Pipelines. The main innovation is the capability to create a concept of materialized tables. Now you can ingest streaming data as row sets. Expert in this area: Tyler Akidau

  • Funny—I presented on Snowflake’s Kafka connector at Snowflake Summit 2019. Now it feels like ancient history.

 

Application Development Disruption with Streamlit and Native Apps:

 

11. Low code data application development via Streamlit: The combination of this and the Native Application Framework allows Snowflake to disrupt the entire Application Development environment. I would watch closely for how this evolves. It’s still very early but this is super interesting.

12. Native Application Framework: I’ve been working with this tool for about three months and I find it to be a real game-changer. It empowers data professionals like us to create Data Apps, share them on a marketplace, and even monetize them. This technology is a significant step forward for Snowflake and its new branding.

 

Expanded SnowPark and Python Support:

 

13. Python Support in the Snowflake Data Cloud. More importantly, this is a major move to make it much easier for all “data constituents” to be able to work seamlessly within Snowflake for all workloads including Machine Learning. Snowflake has been making efforts to simplify the process of running data scientist workloads within its platform. This is an ongoing endeavor that aims to provide a more seamless experience.

14. Snowflake Python Worksheets. This statement is related to the previous announcement. It enables data scientists, who are used to Jupyter notebooks, to more easily work in a fully integrated environment within Snowflake.

 

New Workloads. Cybersecurity and OLTP! boom!

 

15. CYBERSECURITY. This was announced a while back, but it is being emphasized again to ensure completeness.

16. UNISTORE: OLTP-type support based on Snowflake’s Hybrid Table features. This was one of the biggest announcements by far. Snowflake is now entering a much larger part of data and application workloads by extending its capabilities beyond OLAP (online analytical processing, i.e., big data) into the OLTP space, which is still dominated by Oracle, SQL Server, MySQL, PostgreSQL, etc. This is a significant step that positions Snowflake as a comprehensive, integrated data cloud solution for all data and workloads.

 

Additional Improvements:

 

17. Snowflake Overall Data Cloud Performance Improvements. This is great, but with all the other “more transformative” announcements, I’ll group this together. The performance improvements include enhancements to AWS capabilities, as well as increased power per credit through internal optimizations.

18. Large Memory Instances. They did this to handle more data science workloads, demonstrating Snowflake’s ongoing commitment to meeting customers’ changing needs.

19. Data Marketplace Improvements. The Marketplace is one of my favorite things about Snowflake. They mostly announced incremental changes.

 

Quick “Top 3” Takeaways for me from Snowflake Summit 2022:

 

  1. Snowflake is positioning itself now way beyond a cloud database or data warehouse. It now is defining itself as a full-stack business solution environment capable of creating business applications.
  2. Snowflake is emphasizing it is not just data but that it can handle “all workloads” – Machine Learning, Traditional Data Workloads, Data Warehouse, Data Lake, and Data Applications and it now has a Native App and Streamlit Development toolset.
  3. Snowflake is expanding wherever it needs to be in order to be a full data anywhere anytime data cloud. The push into better streams of data pipelines from Kafka, etc., and the new on-prem connectors allow Snowflake to take over more and more customer data cloud needs.

 

Snowflake at a very high level wants to:

 

  1. Disrupt Data Analytics
  2. Disrupt Data Collaboration
  3. Disrupt Data Application Development

 

Want more recap beyond just the features?

 

Here is a more in-depth take on the Keynote 7 Pillars that were mentioned:

Snowflake-related Growth Stats Summary:

  • Employee Growth: 938 employees in 2019 to 3,992 at Summit 2022
  • Customer Growth: 948 customers in 2019 to 5,944 at Summit 2022
  • Total Revenue Growth: $96M in 2019 to $1.2B at Summit 2022

 

Snowflake’s 7 Pillars of Innovations:

 

Let’s go through the 7 pillars of snowflake innovations:

  1. All Workloads – Snowflake is heavily focusing on creating an integrated platform that can handle all types of data and workloads, including ML/AI workloads through SnowPark. Their original architecture’s separation of computing and storage is still a key factor in the platform’s power. This all-inclusive approach to workloads is a defining characteristic of Snowflake’s current direction.
  2. Global – Snowflake, which is based on SnowGrid, is a fully global data cloud platform. Currently, Snowflake is deployed in over 30 cloud regions across the three main cloud providers. Snowflake aims to provide a unified global experience with full replication and failover to multiple regions, thanks to its unique architecture of SnowGrid.
  3. Self-managed – Snowflake remains committed to keeping its platform user-friendly and straightforward to use. This continues to be a priority and focus.
  4. Programmable – Snowflake can now be programmed using not only SQL, JavaScript, Java, and Scala, but also Python and its preferred libraries. This is where Streamlit comes in.
  5. Marketplace – Snowflake emphasizes its continued focus on building more and more functionality on the Snowflake Marketplace (rebranded now since it will contain both native apps as well as data shares). Snowflake continues to make the integrated marketplace as easy as possible to share data and data applications.
  6. Governed – Snowflake stated that they have a continuous heavy focus on data security and governance.
  7. All Data – Snowflake emphasizes that it can handle not only structured and semi-structured data, but also unstructured data of any scale.

 

Conclusion:

 

We hope you found this article useful!

Today’s article recapped Snowflake Summit 2022, highlighting feature announcements and innovations. Snowflake is a full-stack business solution environment with seven pillars of innovation: all data, all workloads, global, self-managed, programmable, marketplace, and governed. We covered various topics such as cost governance, data management, external table support, and cybersecurity.

If you want more news regarding Snowflake and how to optimize your Snowflake accounts, be sure to check out our blog.

Snowflake Data Masking

Introduction:

 

Today’s article discusses the Snowflake Data Cloud’s implementation of dynamic data masking, a column-level security feature used to mask data at query runtime. We provide a step-by-step guide on how to create and apply a data masking policy for email addresses. The article also highlights how dynamic data masking policies let you secure and obfuscate PII data from roles that should not have access, while still displaying the data to roles that need it.

Last week, the United States Centers for Disease Control and Prevention (CDC) issued new policies regarding COVID-19 masks. We will focus on how to implement Snowflake Data Cloud’s “Data Masking”. Let’s get started!

 

What is Data Masking?

 

Data Masking is just like it sounds… the hiding or masking of data. It is a practical method to add column-level security. Data masking overall is a simple concept, and it has caught on in our new age of GDPR and PII regulations. What is Snowflake’s version of data masking? Snowflake’s implementation is… Dynamic Data Masking.

Dynamic Data Masking is column-level security that uses masking policies to mask data at query run time. Snowflake’s version of data masking has several characteristics: masking policies are schema-level objects; masking currently applies to columns in tables or views; the policies are applied at query runtime; and they are applied in every location where the column is displayed. Depending on your role, your role hierarchy, your masking policy conditions, and the SQL execution context, you will see fully masked data, partially masked data, or just plain text!

Now that you know what Snowflake Data Cloud Dynamic Data Masking is… how do you use it? Data masking within Snowflake is enabled with Data Definition Language (DDL). The masking policy object uses the basic syntax constructs you would expect: your typical CREATE, ALTER, DROP, SHOW, and DESCRIBE. This consistency is common across most Snowflake objects, and it is one of the reasons why I prefer Snowflake; most of the time it’s reliable, easy to use, and consistent.

So, let’s have some fun and create a data masking policy for email addresses in a simple example. There are 3 main parts for creating and applying a dynamic data mask on Snowflake to a column. Here we go:

 

PART 1 – Enable and Grant Masking Policy

 

To enable masking policy on Snowflake, follow these steps:

  1. Grant create masking policy on schema to a role. For example: GRANT CREATE MASKING POLICY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
  2. Use the account admin role to grant apply masking policy on account to the role. For example: GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE "DATA_MASKING_ADMIN_ROLE";

Replace “DEMO_MASKING_DB.DEMO” with the actual schema name and “DATA_MASKING_ADMIN_ROLE” with the actual role name.

Remember to grant the necessary privileges to the roles that will use the masking policy.

 

PART 2 – Create a Masking Policy

To create a masking policy in Snowflake, follow these steps:

  1. Use a role that has the necessary privileges to create a masking policy.
  2. Use the schema where the table or view that needs the masking policy is located.
  3. Use the CREATE MASKING POLICY statement to create the policy. For example:
CREATE OR REPLACE MASKING POLICY MASK_FOR_EMAIL AS (VAL STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
ELSE '*********'
END;

Replace MASK_FOR_EMAIL with the name of your masking policy. In this example, the policy masks the email column with asterisks for all roles except for the HR_ROLE.

Remember to grant the necessary privileges to the roles that will use the masking policy.
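As a variation on the policy above, here is a hedged sketch of a partial mask that keeps the email domain visible while hiding the local part. The role and policy names are placeholders that mirror the earlier example:

-- Partial mask: HR_ROLE sees the full email; everyone else sees only the domain.
CREATE OR REPLACE MASKING POLICY MASK_FOR_EMAIL_PARTIAL AS (VAL STRING) RETURNS STRING ->
CASE
  WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
  ELSE CONCAT('*****', SUBSTR(VAL, POSITION('@' IN VAL)))
END;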

 

PART 3 – Apply the Masking Policy to a Column in a View or Table

 

To apply the masking policy to a column in a view or table in Snowflake:

  1. Use a role that has the necessary privileges to modify the table or view.
  2. Use the schema where the table or view that needs the masking policy is located.
  3. Use the ALTER TABLE or ALTER VIEW statement to modify the column and apply the masking policy. For example:
ALTER TABLE IF EXISTS EMPLOYEE MODIFY COLUMN EMAIL SET MASKING POLICY MASK_FOR_EMAIL;

Replace EMPLOYEE with the name of your table and EMAIL with the name of the column that needs the masking policy. Replace MASK_FOR_EMAIL with the name of your masking policy.

Remember to grant the necessary privileges to the roles that will use the masking policy.

(Just creating a masking policy is not enough. It’s kind of like wearing a COVID mask below your mouth and nose: even though you have a mask, it’s not really applied, so it’s not working.)

 

 

 

We will show you how to do all of this in detail below.

 

Dynamic Data Masking Example

Let’s say we want to create a data mask for the email addresses in our EMPLOYEE table.

If you have not been using our Snowflake Solutions Demo Database Training Example, then let’s create a database, schema, and table to use.


/* SETUP DEMO DATABASE AND TABLE FOR DATA MASKING DEMO and PROOF OF CONCEPT */
USE ROLE SYSADMIN;  /*use this role or equivalent */
CREATE OR REPLACE DATABASE DEMO_MASKING_DB;
CREATE SCHEMA DEMO;
CREATE OR REPLACE TABLE EMPLOYEE(ID INT, FULLNAME VARCHAR,HOME_ADDRESS VARCHAR,EMAIL VARCHAR);
INSERT INTO EMPLOYEE VALUES(1,'Frank Bell','1000 Snowflake Lane North Pole, Alaska', 'fbell@snowflake.com');
INSERT INTO EMPLOYEE VALUES(2,'Frank S','1000 Snowflake Lane North Pole, Alaska', 'franks@snowflake.com');
INSERT INTO EMPLOYEE VALUES(3,'Craig Stevens','1000 Snowflake Lane North Pole, Alaska', 'craig@snowflake.com');
CREATE WAREHOUSE IF NOT EXISTS MASK_WH WITH WAREHOUSE_SIZE = XSMALL, INITIALLY_SUSPENDED = TRUE, auto_suspend = 60;


/* PART 0 – create and grant roles for DATA MASKING DEMO – REPLACE FREDDY WITH YOUR USERNAME – there is more to do when you use custom roles with no privileges */
USE ROLE SECURITYADMIN;
CREATE ROLE IF NOT EXISTS EMPLOYEE_ROLE;
CREATE ROLE IF NOT EXISTS MANAGER_ROLE;
CREATE ROLE IF NOT EXISTS HR_ROLE;
CREATE ROLE IF NOT EXISTS DATA_MASKING_ADMIN_ROLE;
GRANT USAGE ON DATABASE DEMO_MASKING_DB TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE EMPLOYEE_ROLE;
GRANT SELECT ON TABLE DEMO_MASKING_DB.DEMO.EMPLOYEE TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON DATABASE DEMO_MASKING_DB TO ROLE HR_ROLE;
GRANT USAGE ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE HR_ROLE;
GRANT SELECT ON TABLE DEMO_MASKING_DB.DEMO.EMPLOYEE TO ROLE HR_ROLE;
GRANT USAGE, MODIFY ON DATABASE DEMO_MASKING_DB TO ROLE "DATA_MASKING_ADMIN_ROLE";
GRANT USAGE, MODIFY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
GRANT USAGE ON WAREHOUSE MASK_WH TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON WAREHOUSE MASK_WH TO ROLE HR_ROLE;
GRANT ROLE EMPLOYEE_ROLE TO USER FREDDY;
GRANT ROLE MANAGER_ROLE TO USER FREDDY;
GRANT ROLE HR_ROLE TO USER FREDDY;
GRANT ROLE DATA_MASKING_ADMIN_ROLE TO USER FREDDY;



/* PART 1 – enable masking policy ON ACCOUNT AND GRANT ACCESS TO ROLE */
GRANT CREATE MASKING POLICY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
USE ROLE ACCOUNTADMIN;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE "DATA_MASKING_ADMIN_ROLE";



/* PART 2 – CREATE MASKING POLICY */
USE ROLE DATA_MASKING_ADMIN_ROLE;
USE SCHEMA DEMO_MASKING_DB.DEMO;
CREATE OR REPLACE MASKING POLICY MASK_FOR_EMAIL AS (VAL STRING) RETURNS STRING ->
CASE
  WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
  ELSE '********'
END;


/* PART 3 - APPLY MASKING POLICY TO EMAIL COLUMN IN EMPLOYEE TABLE */
ALTER TABLE IF EXISTS EMPLOYEE MODIFY COLUMN EMAIL SET MASKING POLICY MASK_FOR_EMAIL;



AWESOME – NOW YOU HAVE CREATED AND APPLIED YOUR DATA MASK! Let’s test it out.



/* TEST YOUR DATA MASK !!! --> TEST by QUERYING TABLE WITH DIFFERENT ROLES AND SEE RESULTS */
/* Notice the EMAIL is MASKED with ******* */
USE ROLE EMPLOYEE_ROLE;
SELECT * FROM DEMO_MASKING_DB.DEMO.EMPLOYEE;
/* Notice the EMAIL is NOT MASKED */
USE ROLE HR_ROLE;
SELECT * FROM DEMO_MASKING_DB.DEMO.EMPLOYEE;

ADDITIONAL DETAILS:

  • Masking policies are really just Data Definition Language (DDL) objects in Snowflake. You can always get their DDL by using the standard GET_DDL function or by using DESCRIBE. The examples below show how to review the masking policy; when using SECURITYADMIN or other roles without USAGE, you must use the full DATABASE.SCHEMA.POLICY path.

USE ROLE SECURITYADMIN;
DESCRIBE MASKING POLICY DEMO_MASKING_DB.DEMO.MASK_FOR_EMAIL;

USE ROLE ACCOUNTADMIN; /* using SELECT requires a role with USAGE on the database and schema, which the SECURITYADMIN role does not have by default */

SELECT GET_DDL('POLICY','DEMO_MASKING_DB.DEMO.MASK_FOR_EMAIL');

 

Conclusion:

 

Dynamic Data Masking policies are a great way to secure and obfuscate your PII data from roles that should not have access, while at the same time displaying the PII data to the roles that need access to it. We hope this tutorial has helped you understand Dynamic Data Masking on Snowflake. For further information on Snowflake, check out our blog for more tips and tricks.