Snowflake Data Clean Rooms

In this article I will explain what a Snowflake Data Clean Room really is on the Snowflake Data Cloud.  Once you have a good grasp of what it is, we will cover example Data Clean Room use cases.  We helped Snowflake pioneer this offering a couple of years ago with our client VideoAmp, which we brought over to the Snowflake Data Cloud.  Our original article from July 2020 shows how to analyze PII and PHI data using the earlier Data Clean Room concepts.  Fast forward two years, and Snowflake has dramatically improved the initial version and scope that we put together.

What is a Data Clean Room?

From a technical view, data clean rooms on Snowflake are currently a set of data-related technologies (Data Shares, Row Access Policies, Secure User Defined Functions) that work together to enable double blind joins of data.  Snowflake can power Data Clean Rooms because it has the underlying Data Sharing technology, which is based partially on micro-partitions and provides features like Data Sharing and Data Cloning.  From a business standpoint, once the complexity of operating a data clean room gets easier, this can provide HUGE value to businesses by letting them share data without sharing the PII part of it.

Some of the original concepts of data clean rooms (DCRs) were around data exchanges/areas where the huge internet behemoths like Facebook and Google could share aggregated data.  The concept was to share aggregated (non-PII-discernible) data with their advertisers.  So if you were an advertiser, you could bring in your first-party data and see whether it matched against some non-PII aggregated data.  My view, though, expands the concept of Data Clean Rooms way beyond just Media/Advertising.  There are many other areas that can achieve huge value from being able to perform “controlled” and “governed” double blind joins of data sets.

Another way to describe a data clean room, for better or worse, is as a concept where companies and their partners can share data at an aggregated, double blind join level.  (On Snowflake, it's already extremely easy to share data through secure views and tables with their ground-breaking Data Share technology.  It's one of my all-time favorite features.)  You can perform double blind joins on previously agreed identifiers.
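To make the "double blind join" idea concrete, here is a minimal sketch in Snowflake SQL. The table names, column names, and shared salt are purely illustrative; a real clean room would wrap this in secure functions, row access policies, and data shares rather than a plain query.

/* Count overlapping customers without either side exposing raw email addresses */
SELECT COUNT(*) AS matched_customers
FROM advertiser_first_party a
JOIN publisher_exposure p
  ON SHA2(LOWER(TRIM(a.email)) || 'previously_agreed_salt', 256) =
     SHA2(LOWER(TRIM(p.email)) || 'previously_agreed_salt', 256);

Because both sides hash the previously agreed identifier the same way, the overlap can be measured while the raw PII never leaves either party's account.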

What are Snowflake Data Clean Room Use Cases?

Media/Advertising:

  • Solving our “end of cookies” problem at some level.  Data Clean Rooms on Snowflake allow advertisers to take their first-party data and combine it with their publishers’ viewership/exposure data to measure their marketing dollars.
  • Co-branding/co-marketing promotions.  You can do customer segment overlap analysis to see where your partners’ customer bases have similar customer segments and audiences.
  • Similarly, you can work with partners to do joint loyalty offerings and/or upsells where customer “interests” overlap.

Healthcare and Life Sciences:

  • There are some extremely valuable use cases where we can securely share patient data and patient outcomes across government, healthcare, and life sciences to hopefully make some huge leaps forward in healthcare and life sciences overall.

Those are just a few.  There are many others.

Are you interested in how you can use a Snowflake Data Clean Room for your business?  Contact Us Today.

Want more info from others on Data Clean Rooms:

  • Check out Patrick’s article here: https://www.linkedin.com/pulse/snowflake-data-clean-room-patrick-cuba/
  • Also, my friend and one of the top data clean room experts, Rachel, shares some Q&A on DCRs here: https://www.snowflake.com/blog/data-clean-room-qa/
  • Lastly, there is a great video from a Snowflake Solution Architect who I have met but who never accepts my LinkedIn invitation. ha ha.

Have fun!

Also, here is an interview I provided on my view of the opportunities around Data Clean Rooms.

A Deep Dive into Data Sharing

What is Data Sharing? 

Let’s begin with data. Data can derive from the software that enterprises use within their business. For example, how many people are viewing a website, or what kind of people are most interested in a certain brand. At a basic level, data sharing is simply making data resources available to many users or applications while assuring data fidelity for everyone participating.

Now, how is this relevant today? Data sources are continuous, which means data volumes keep growing across all of them. The main focus of data sharing has become how to move these increasing volumes of data while ensuring the data stays accurate and secure. The cloud comes into play because it expands what data sharing is capable of. With the modern cloud, data sharing lets people share live data inside and outside their business, get rid of data silos, create access to specific data sets, and more. However, this requires a platform that can put data sharing into motion and ensure it works to its potential, and this is where Snowflake comes into the picture.

Snowflake and Data Sharing

Snowflake allows for data collaboration while lowering costs. It gives organizations the ability to securely share data and access live data. Not only do you get secured and governed access to shared data, you can also publish data sets. The possibilities seem endless, but that’s only a brief preview of the capabilities of data sharing within Snowflake, so let’s take a deeper look at the many parts that play a role in data sharing in Snowflake and how they come together.

What are Data Providers and Consumers?

A data provider is a Snowflake account that creates shares and makes them accessible to other Snowflake accounts. Data providers share a database with one or more Snowflake accounts, and for each shared database, Snowflake supports grants that provide granular access control to objects within it. There are no limitations on how many shares can be created or how many accounts can be added to a share.

A data consumer is an account that creates a database from a share that is made accessible by another data provider. When you add a shared database to your account you are able to access and query the objects within it. There are no limitations on how many shares you can consume from the data providers, but you can only make one database for each share. 

What is a Share? 

In Snowflake, shares are named objects that encapsulate all the information needed to share a database. A share contains the grants that give access to the database and schema containing the objects to share, the grants that give access to the specific objects within the database, and the consumer accounts that the database and its objects are shared with.

When a database is created from a share, the shared objects become available to the users within the consumer account. Shares are customizable, secure, and fully controlled from the provider account: objects added to a share become available to consumers in real time, and access to a share or to objects within it can be rescinded at any time.
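For reference, here is a minimal provider-side sketch of that lifecycle. The database, schema, table, share, and consumer account names are illustrative.

CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = xy12345;   /* add a consumer account */

/* Access can be rescinded at any time */
REVOKE SELECT ON TABLE sales_db.public.orders FROM SHARE sales_share;
ALTER SHARE sales_share REMOVE ACCOUNTS = xy12345;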

How does Secure Data Sharing Function in Snowflake? 

In secure data sharing, the data itself is not actually copied or moved between accounts, as one may think. Sharing is done through Snowflake’s services layer and metadata store. This means that shared data does not take up storage within a consumer account and therefore does not add to the consumer’s monthly data storage costs. The only charges are for the compute resources used to query the shared data.

Going back to what was previously mentioned, because the data itself is not actually copied or exchanged it makes secure data sharing an easy and fast setup for providers and it also makes shared data quickly available to consumers. But let’s take a closer look at how data sharing works for both the provider and the consumer: 

Provider: Creates a share of a database within their account and then grants access to specific objects within the database. They can also share data from multiple databases, as long as those databases belong to the same account. Lastly, one or more accounts are added to the share, which can include your own accounts if you have more than one within Snowflake.

Consumer: Creates a read-only database from the share. Access to this database is configurable using the same standard role-based access control that Snowflake provides for all objects.
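On the consumer side, that flow might look like this minimal sketch. The share, provider account, role, and object names are illustrative.

/* Create a read-only database from the share and let a role query it */
CREATE DATABASE shared_sales FROM SHARE provider_account.sales_share;
GRANT IMPORTED PRIVILEGES ON DATABASE shared_sales TO ROLE analyst_role;
SELECT * FROM shared_sales.public.orders LIMIT 10;   /* runs on your own compute */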

 

 

The way Snowflake is structured allows providers to share data with many consumers (even those in their own organization), and consumers can access shared data from many providers.

What Information is Shared with Providers?

Those that are providers in Snowflake are able to view a couple of things about consumers who have access to their data. 

Providers can see the consumer’s Snowflake account name and Snowflake organization name. They can also see statistical data on data consumption, including the date of consumption and the number of queries a consumer account runs against a provider’s share.

Lastly, providers can see any information that a consumer gives (when a data request is submitted) such as the consumer’s business email and company name.

Can I share with Third Parties? 

Data sharing can only occur between Snowflake accounts. However, as a provider within Snowflake, you might want to share data with a consumer outside of Snowflake and there is a way to do this. 

In order to share data with outside consumers, Snowflake provides reader accounts. These accounts allow data to be shared without requiring the consumer to become a Snowflake customer. A reader account belongs to the provider account that created it. The provider account uses shares to share databases with its reader accounts, and a reader account can only consume data from the provider account that created it.

Users in a reader account can query data that has been shared with it; however, they can’t perform the DML tasks (such as inserts or updates) that are possible in a full account.
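Creating a reader account is a single statement from the provider account. A minimal sketch, where the account name and admin credentials are placeholders:

CREATE MANAGED ACCOUNT reader_acct
  ADMIN_NAME = reader_admin,
  ADMIN_PASSWORD = 'Sup3rSecretPwd!',
  TYPE = READER;

/* List the reader accounts owned by this provider account */
SHOW MANAGED ACCOUNTS;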

Now that we have done an overview and introduction of data sharing and how it works within Snowflake – let’s take a look at some other features that come with Snowflake’s data sharing. 

Products that use Secure Data Sharing in Snowflake

Snowflake provides several products that use Secure Data Sharing to connect providers of data with consumers: Direct Share, Snowflake Data Marketplace, and Data Exchange.

Direct Share

Direct Share is one of the easiest ways to share data that allows account-to-account data sharing while using Snowflake’s Secure Data Sharing. As the provider (account on Snowflake) you are able to share data with other companies so that your data is viewable in their Snowflake account without having to move your data or copy your data.  

Snowflake Data Marketplace

All Snowflake accounts can use the Snowflake Data Marketplace, as long as they are in non-VPS regions on supported cloud platforms. The Data Marketplace uses Snowflake’s Secure Data Sharing to connect providers with consumers (just as with Direct Share).

As a consumer, you can discover and access third-party data and have those datasets available directly in your Snowflake account to query without transformation and join with your own data. The Data Marketplace gives you one location from which to get your data, which makes things easier when you are using different sellers for data sourcing.

Lastly, as a provider account you can publish data in the Data Marketplace, which is great for data monetization and also serves as another marketing channel.

Data Exchange 

Data Exchange lets you collaborate securely around data with groups that you invite, helping providers publish data that consumers can discover. You can share data across your entire business (think customers, partners, or even just your own unit) and more. It also gives you control over who participates: who can publish, consume, or simply access the data. Specifically, you invite others and decide what they are allowed to do: provide data or consume data. Data Exchange is supported for any Snowflake accounts hosted in non-VPS regions on all supported cloud platforms.

These three Snowflake products built on Secure Data Sharing are useful to both provider and consumer accounts. Now that we have looked at how data sharing functions and which features use it, let’s take a look at how you actually work with the data that is shared with you and the data you share with others.

Working with Shared Data

After having a foundation of what direct share, Snowflake Marketplace, and data exchange consist of and how they function, we can look into more concepts and tools that are accessible within them. 

On Snowflake, with the ACCOUNTADMIN role you can use the Shared Data page in the new web interface to complete most tasks for creating and managing shares. As we continue, keep in mind that “inbound” refers to data that is shared with you and “outbound” refers to data shared from your account.

Data Shared with You

Provider accounts can share data with your account (inbound shares) through direct share, data exchange, or the Snowflake Marketplace. With inbound shares you can view details such as who provided a share and how the data was shared, and you can create a database from a share.

Within the Snowflake web interface there is a “Share With Me” tab that shows you the inbound shared data for:

  • Direct shares that are shared with you. These are placed into two groups: direct shares that are ready to get, and direct shares that have been imported into a database and can be queried.

  • Listings for data exchanges that you can access. The data is shown under the name of the originating data exchange, and if you have more than one data exchange, each one is shown in a separate section.

  • Listings for Snowflake Marketplace data that have been imported into a database and can be queried. This tab does not show Marketplace shares that are ready to get; you can find those listings in the Marketplace menu.

Data You Shared 

Outbound shares are made within your account in order to share data with consumers. You are able to share data through direct share, data exchange, and the Snowflake Marketplace (as with inbound shares as previously mentioned). 

With outbound shares you can: 

  • See the shares you created or have access to, including the database for the share, the consumer accounts that can access it, the date the share was created, and the objects that are shared.

  • Create and edit shares and their data listings.

  • For individual consumer accounts, you can remove their access to the share.

Back to the web interface, the “Shared by My Account” tab shows the outbound shares that are from Snowflake Marketplace, data exchange, and direct share.

When looking at shares, there are icons next to each that demonstrate their sharing mechanisms such as direct share, data exchange, or Snowflake Marketplace. 

Lastly, you are able to have these filters when viewing your shared data: 

  • Type, shown as the “All Types” drop-down, which can be used to view direct shares versus listings.

  • Consumer, shown as the “Shared With” drop-down, which can be used to choose a specific consumer or data exchange that the data has been shared with.

Data that is Shared

When sharing data, there are many ways you can do this:

1. Use direct share to directly share data with consumers 

2. In the Snowflake Marketplace, post a listing 

3. In data exchange, post a listing 

Furthermore, when you are in the web interface and you want to share data, you will use the “Share Data” drop-down and choose from the list that provides all the platforms where you can share data.

Requesting Data 

Within the web interface, the inbound and outbound requests can be seen in the “Requests” tab. However, this tab does not show the data requests from the Snowflake Data Marketplace. 

Let’s also take a step back and look at what exactly inbound and outbound requests are.

Inbound requests come from consumers who are requesting to have access to your data. You are also able to organize these requests by their status and then review them. Outbound requests come from you when you submit requests for data listings from other providers. Similar to inbound requests, you can sort the requests by status. Keep in mind that requests you make can be rejected, but you also have the ability to resubmit your request.

Managing Exchanges 

With certain roles, such as the Data Exchange Admin role, or with provider-profile-level privileges, you can create and manage provider profiles within the “Manage Exchanges” tab. However, if your organization does not have a data exchange, you will not see the “Manage Exchanges” tab.

Back to the provider profile: if you have one, you can perform the following tasks within a data exchange:

  • Create, update, and delete a profile

  • Update contact email

  • Manage profile editors

Now that we have gotten an overview of data sharing, you should be able to understand all the parts that make up data sharing and the various functions it contains!

Exploring Snowflake’s Search Optimization Service

Snowflake initially made a name for itself as the easiest data warehouse to use back in 2014. Since then it has transformed itself and its core technology into the full Snowflake Data Cloud.  While a Snowflake Data Cloud account comes with many amazing features by default, there are many areas where you can optimize Snowflake for your specific needs and use cases.  As Snowflake has grown over the years, it has added a ton of functionality, including paid services such as Snowpipe, Materialized Views, Auto Clustering, the Search Optimization Service, and others.


Today, let’s cover their Search Optimization Service.  This service can significantly improve the performance of selective point lookup queries, but remember it is only available on Enterprise Edition or higher (so Standard Edition users, you are out of luck if you want to use this service; you will need to upgrade your account edition). The service is best for business users who rely on quick access to data to make critical business decisions. Alternatively, it can be useful for data scientists who want to continuously explore specific subsets of data. Essentially, it is a maintenance service that runs in the background of Snowflake and builds search access paths: it populates them when you enable the feature, keeps them up to date as the table's data changes, and those paths are what make selective lookups fast.


To turn the feature on, you must first ensure you are using a role that has access to add it to a table. Having access means you have the following privileges: OWNERSHIP on the table and ADD SEARCH OPTIMIZATION on the schema that contains it. Once that requirement is met, it's as simple as typing the following into your console:

ALTER TABLE [IF EXISTS] <table_name> ADD SEARCH OPTIMIZATION;


To confirm it is turned on, run SHOW TABLES and check that the SEARCH_OPTIMIZATION column says ON. A few notes: you WILL see an increase in credit consumption while the service runs and builds the search access paths. You can get an estimate of the cost for specific tables before committing by running the following command:


SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('<table_name>');


Being strategic about which tables you introduce to the search optimization service will go a long way toward reducing those costs. The service fits best for tables whose queries are highly selective lookups on columns other than the clustering key, or for tables that aren't clustered at all.


If you add the service and decide to remove it later on, you can easily do so, with the correct privileges, by running the following command:


ALTER TABLE [IF EXISTS] <table_name> DROP SEARCH OPTIMIZATION;


This is just one solution to make your life easier and your queries faster; however, there are other options that are more cost-friendly and do not require you to comb through your tables. One prime example is Snoptimizer™, our service that scans for Snowflake anti-patterns and optimizes your account to help you run cost-effectively. It checks your resource monitors, auto-suspend settings, cloud services consumption, and warehouse compute, among other things, to fix your account and ensure you are fully optimized. If you are interested in a trial, you can sign up and explore more here


Too Busy for the Snowflake Summit? We Feel You.

This Snowflake Summit was a roller coaster of emotions, but more often than not, we were thrilled with all the new announcements. With over 61 sessions, we got to see some of Snowflake’s amazing new features, tons of use cases, and firsthand looks at how to use the new tools with step-by-step labs. Most of us are too busy to watch two days’ worth of webinars, but that’s where we come in, providing you with your weekly dose of Snowflake Solutions! We decided to help out by highlighting the most important announcements, as well as the sessions we thought were really worth the watch!

This time around Snowflake announced that they have five main areas of innovation: data programmability, global data governance, platform optimization, connected industries, and powered by Snowflake. While magical upgrades and new tools mean more flexibility for users, the reality is that most of these new features are still in private preview, so we (the public) won’t see them in action for some time. Regardless, we’ll still go through the top areas of innovation:

 

Platform optimization

Perhaps one of the most important improvements made this year is the improved storage economics. With reduced storage costs as a result of improved data compression, many will start to see savings on storage for new data. Snowflake has also developed new usage dashboards which will allow users to better track and understand their usage and costs across the platform. Cost optimization on Snowflake has thus far been a tricky subject, and while it seems Snowflake is making progress in that direction, there aren’t enough guardrails to prevent warehouse sizes (and bills) from skyrocketing. If you want to learn about the 1000 ways your company can accidentally lose money on Snowflake (and ways to prevent it), join us to learn more about Cost Optimization here!
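While we wait for the new usage dashboards, one way to keep an eye on compute spend today is to query the ACCOUNT_USAGE views yourself. A minimal sketch (the 30-day window is an arbitrary choice):

SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;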

 

Global Data Governance

Next up on the list are the six new data governance capabilities now introduced to the Snowflake platform. We’ll deep dive into the coolest three!

  1. Classification: automatically detects personally identifiable information.

    1. Why is this cool? We can apply specific security controls to protect their data!

  2. Row access policies: dynamically restrict the rows of data returned by a query based on the username, role, or other custom attributes (a minimal sketch follows this list).

    1. Why is this cool? We no longer need multiple secure views and can eliminate the need for maintaining data silos. That’s a win in our book.

  3. Access History: A new view that shows used and unused tables to produce reports.

    1. Why is this cool? You can see what’s actually bringing value and optimize storage costs based on which data is frequently accessed and which is completely abandoned. Who doesn’t love to save money?
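As promised above, here is a minimal sketch of a row access policy. The policy, table, column, and role names are all illustrative, not anything Snowflake ships by default.

CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (region VARCHAR) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'GLOBAL_ANALYST' OR region = 'US';

ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);

Once attached, every query against the table only returns the rows the policy allows for the current role, with no extra secure views to maintain.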

 

Connected Industries

Next up, we have two upcoming features that we thought were worth mentioning since they will definitely be game changers: Discover & Transact and Try Before You Buy, both of which will ease collaboration and data procurement between connected industries. While they are pretty self-explanatory, it’s been a long week, so let’s go over them quickly.

  1. Discover and Transact: Directly within the Snowflake Data Marketplace, a consumer can now discover data and purchase with a usage-based pricing model.

    1. Why is this cool? Self-service! Duh! This will definitely reduce the cost of selling and delivering data to clients.

  2. Try Before You Buy: Now consumers can access sample data to make sure they’re getting all they need before signing that check.

    1. Why is this cool? Who doesn’t like a free sample?

 

Data programmability

Probably the most important updates are under the data programmability umbrella, so if you’re still with me, hang on a little longer, this is about to get interesting!

There are some innovations that are ready to be used now in public preview, so let’s check them out:

  1. SQL API: This new API enables customers to automate administrative tasks without having to manage infrastructure; there’s no need to maintain an external API management hub!

  2. Schema Detection: Now supports Parquet, ORC, and Avro, and hopefully more file formats in the future (see the sketch below).
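For the curious, schema detection is exposed through the INFER_SCHEMA table function. A hedged sketch, where the stage and file format names are placeholders:

CREATE FILE FORMAT my_parquet_format TYPE = PARQUET;

SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@my_stage/data/',
    FILE_FORMAT => 'my_parquet_format'
  )
);

The output lists the column names and types detected in the staged files, which you can then use to create the target table.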

The good stuff that’s coming soon!

  1. Serverless Tasks: Snowflake will determine and schedule the right amount of compute resources needed for your tasks.

  2. Snowpark and Java UDFs: Snowpark is going to be the Snowflake developer’s new playground, allowing developers to bring their preferred languages directly into the platform. Java UDFS will also enable data engineers and developers to bring their own custom code to Snowflake, enabling better performance on both sides.

  3.  Unstructured Data Support: Soon, we will be able to treat unstructured data the same as structured data, with the ability to store, govern, process, and share.

  4. Machine Learning with Amazon SageMaker: A tool that will  automatically build and insert the best machine learning models into Snowflake!

 

Of course, the Snowflake Conference held various webinars on each of these innovations, so if you’d like to learn more, head over to those respective recordings. Hot topics this time around were definitely data governance and ML, so here are our top videos worth watching!

Conclusion: Again, while we were slightly disappointed to see that most of Snowflake’s new features were still in private preview, it makes us all the more excited for what’s to come! As always, IT Strategists will continue to guide you with these upcoming tools, so stay tuned for more Snowflake Solutions!

Snowflake Data Masking

Last week, the United States CDC issued new COVID-19 mask policies.  I will leave that for many others to discuss, but for the COOL Data People reading this we will focus on how easy it is to implement Snowflake Data Cloud “Data Masking”.   Ready? – Let’s “Data Mask” it UP!  

 

What is Data Masking? Data Masking is just like it sounds: the hiding or masking of data.  It is a convenient way to add column-level security by masking data, and it is a simple concept overall.  It has really caught on in our new age of GDPR and PII regulations.

What is Snowflake’s version of Data Masking? Snowflake’s implementation is Dynamic Data Masking: column-level security that uses masking policies to mask data at query run time. Pretty cool, eh?  Snowflake’s version of data masking has several features, including:

  • Masking policies are schema-level objects.
  • Data Masking currently works to mask data at either the table or view object.
  • Masking policies are applied at query runtime.
  • Masking policies are applied to EVERY location where the column is displayed.
  • Depending on your role, your role hierarchy, the masking policy conditions, and the SQL execution context, you will see fully masked data, partially masked data, or just PLAIN TEXT!

Now that you know what Snowflake Data Cloud Dynamic Data Masking is, how do you use it?  Data Masking within Snowflake is enabled with Data Definition Language (DDL). The basic syntax constructs for the MASKING POLICY object are the typical object commands: CREATE, ALTER, DROP, SHOW, DESCRIBE.  (This is pretty standard for most Snowflake objects, and that consistency, simplicity, and ease of use are among the reasons I like Snowflake most of the time.)

So, let’s have some fun and create a data masking policy for email addresses in a simple example.   There are 3 main PARTS for creating and applying a dynamic data mask on Snowflake to a column.  Here we go:

PART 1 – Enable and Grant Masking Policy
PART 2 – Create a Masking Policy
PART 3 – Apply the Masking Policy to a Column in a View or Table
(Just creating a masking policy is not enough.  Kind of like wearing a covid mask below your mouth and nose: even though you have a mask, it's not really applied, so it's not working.)

We will show you how to do all of this in detail below.

Dynamic Data Masking Example

Let’s say we want to create a data mask for the email addresses in our EMPLOYEE table.

If you have not been using our Snowflake Solutions Demo Database Training Example then let’s create a database, schema, and table to use.


/* SETUP DEMO DATABASE AND TABLE FOR DATA MASKING DEMO and PROOF OF CONCEPT */
USE ROLE SYSADMIN; /*use this role or equivalent */
CREATE OR REPLACE DATABASE DEMO_MASKING_DB;
CREATE SCHEMA DEMO;
CREATE OR REPLACE TABLE EMPLOYEE(ID INT, FULLNAME VARCHAR,HOME_ADDRESS VARCHAR,EMAIL VARCHAR);
INSERT INTO EMPLOYEE VALUES(1,'Frank Bell','1000 Snowflake Lane North Pole, Alaska', 'fbell@snowflake.com');
INSERT INTO EMPLOYEE VALUES(2,'Frank S','1000 Snowflake Lane North Pole, Alaska', 'franks@snowflake.com');
INSERT INTO EMPLOYEE VALUES(3,'Craig Stevens','1000 Snowflake Lane North Pole, Alaska', 'craig@snowflake.com');
CREATE WAREHOUSE IF NOT EXISTS MASK_WH WITH WAREHOUSE_SIZE = XSMALL, INITIALLY_SUSPENDED = TRUE, auto_suspend = 60;

 

 

/* PART 0 – create and grant roles for DATA MASKING DEMO – REPLACE FREDDY WITH YOUR USERNAME
– there is more to do when you use custom roles with no privileges */
USE ROLE SECURITYADMIN;
CREATE ROLE IF NOT EXISTS EMPLOYEE_ROLE;
CREATE ROLE IF NOT EXISTS MANAGER_ROLE;
CREATE ROLE IF NOT EXISTS HR_ROLE;
CREATE ROLE IF NOT EXISTS DATA_MASKING_ADMIN_ROLE;
GRANT USAGE ON DATABASE DEMO_MASKING_DB TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE EMPLOYEE_ROLE;
GRANT SELECT ON TABLE DEMO_MASKING_DB.DEMO.EMPLOYEE TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON DATABASE DEMO_MASKING_DB TO ROLE HR_ROLE;
GRANT USAGE ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE HR_ROLE;
GRANT SELECT ON TABLE DEMO_MASKING_DB.DEMO.EMPLOYEE TO ROLE HR_ROLE;
GRANT USAGE,MODIFY ON DATABASE DEMO_MASKING_DB TO ROLE "DATA_MASKING_ADMIN_ROLE";
GRANT USAGE,MODIFY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
GRANT USAGE ON WAREHOUSE MASK_WH TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON WAREHOUSE MASK_WH TO ROLE HR_ROLE;
GRANT ROLE EMPLOYEE_ROLE TO USER FREDDY;
GRANT ROLE MANAGER_ROLE TO USER FREDDY;
GRANT ROLE HR_ROLE TO USER FREDDY;
GRANT ROLE DATA_MASKING_ADMIN_ROLE TO USER FREDDY;

 

/* PART 1 – enable masking policy ON ACCOUNT AND GRANT ACCESS TO ROLE */
GRANT CREATE MASKING POLICY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
USE ROLE ACCOUNTADMIN;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE "DATA_MASKING_ADMIN_ROLE";

 

/* PART 2 – CREATE MASKING POLICY */
USE ROLE DATA_MASKING_ADMIN_ROLE;
USE SCHEMA DEMO_MASKING_DB.DEMO;
CREATE OR REPLACE MASKING POLICY MASK_FOR_EMAIL AS (VAL STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
ELSE '*********'
END;


/* PART 3 - APPLY MASKING POLICY TO EMAIL COLUMN IN EMPLOYEE TABLE */
ALTER TABLE IF EXISTS EMPLOYEE MODIFY COLUMN EMAIL SET MASKING POLICY MASK_FOR_EMAIL;


AWESOME - NOW YOU HAVE CREATED AND APPLIED YOUR DATA MASK! Let's test it out.



/* TEST YOUR DATA MASK !!! --> TEST by QUERYING TABLE WITH DIFFERENT ROLES AND SEE RESULTS */
/* Notice the EMAIL is MASKED with ******* */
USE ROLE EMPLOYEE_ROLE;
SELECT * FROM DEMO_MASKING_DB.DEMO.EMPLOYEE;
/* Notice the EMAIL is NOT MASKED */
USE ROLE HR_ROLE;
SELECT * FROM DEMO_MASKING_DB.DEMO.EMPLOYEE;

ADDITIONAL DETAILS:

Masking policies are schema-level objects in Snowflake, created and managed with Data Definition Language (DDL).
You can always retrieve their DDL by using the standard GET_DDL function or by using DESCRIBE.
/* EXAMPLES for reviewing the MASKING POLICY */
/* when using SECURITYADMIN or other roles without USAGE you must use the full DATABASE.SCHEMA.POLICY PATH */

USE ROLE SECURITYADMIN;
DESCRIBE MASKING POLICY DEMO_MASKING_DB.DEMO.MASK_FOR_EMAIL;

USE ROLE ACCOUNTADMIN; /* when using SELECT that means the ROLE MUST HAVE USAGE enabled which SECURITYADMIN role does not have by default */

SELECT GET_DDL('POLICY','DEMO_MASKING_DB.DEMO.MASK_FOR_EMAIL');
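Earlier we mentioned that, depending on the policy conditions, a role can also see partially masked data. As a hedged variation on the policy above (the PARTIAL_MASK_FOR_EMAIL name and the MANAGER_ROLE condition are purely illustrative), you can return a partially masked value with REGEXP_REPLACE:

CREATE OR REPLACE MASKING POLICY PARTIAL_MASK_FOR_EMAIL AS (VAL STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
WHEN CURRENT_ROLE() IN ('MANAGER_ROLE') THEN REGEXP_REPLACE(VAL, '.+@', '*****@') /* keep only the email domain */
ELSE '*********'
END;

To swap policies on a column that already has one, UNSET the existing policy first (ALTER TABLE EMPLOYEE MODIFY COLUMN EMAIL UNSET MASKING POLICY;) and then SET the new one.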

Conclusion

Dynamic Data Masking policies are a great way to obfuscate your PII data from roles that should not have access to it, while still displaying that data to the roles that need it.  I hope this tutorial has helped you understand Dynamic Data Masking on Snowflake. Thanks for checking out this article.  For further Dynamic Data Masking training, check out our new Free Snowflake Training Series.


Snowflake Data Marketplace Introduction

Introduction

Long gone are the days when consumers had to copy data, use APIs, or wait days, weeks, and sometimes even months to gain access to datasets. With the Snowflake Data Marketplace, analysts around the world are getting the information they need to make important business decisions in the blink of an eye and in the palm of their hands.

So what is it and how does it work?

The Snowflake Data Marketplace is essentially a home to a variety of live, ready-to-query data. It utilizes Snowflake Secure Data Sharing to connect providers of data with consumers, as of now providing access to 229 datasets. As a consumer, you can discover and access a variety of third-party data and have those datasets available directly in your Snowflake account to query. There is no need for transformation and joining it with your own data takes only a few minutes. If you need to use several different vendors for data sourcing, the Data Marketplace gives you one single location from where to get the data.

Why is this so amazing?

Companies can finally securely provide and consume live, governed data in real time without having to copy and move data. In the past, access to such information could take days, weeks, months, or even years. With the Data Marketplace, gaining access only takes a couple of minutes. Already over 2000 businesses have requested access to essential data sets available free of charge in our marketplace. This is a gold mine for anyone who desires data-driven decision-making access to live and ready-to-query data, and the best part is that it is globally available, across clouds.

There are many benefits for providers and consumers alike. There are three main points, however, that allow companies to unlock their true potential when using the Data Marketplace.

Source Data Faster and More Easily

  • As we said above, using Snowflake Data Marketplace as a consumer allows users to avoid the risk and hassle of having to copy and move stale data. Instead, securely access live and governed shared data sets, and receive automatic updates in real time.

Monetize Your Own Data

  • As a provider, you can create new revenue streams by joining Snowflake Data Marketplace to market your own governed data assets to potentially thousands of Snowflake data consumers.

Reduce Analytics Costs

  • Using this service, both consumers and providers can virtually eliminate the costs and effort associated with the traditional ETL processes of data ingestion, data pipelines and transformation thanks to direct, secure, and governed access from your Snowflake account to live and ready-to-query shared data.

For more information, watch the video below or visit https://www.snowflake.com/data-marketplace/

Creating Your First Database

To complete this tutorial, you’ll need a Snowflake account and the URL to log in to the web user interface. Start by navigating to your URL and logging in.

Creating Databases with the User Interface

Login to your Snowflake environment and select the Databases tab in the top left of your screen. It should look similar to the image below except you won’t have the same databases. Snowflake provides some sample data for you.

Let’s create a new database. Click the Create button and fill out the information in the pop up window.

When naming the database, there are restrictions — no spaces, and the name cannot start with a number. You can read the full set of restrictions in Snowflake’s documentation.

Select Finish and you’ll see your new database appear in the table. Click on your database and you’ll see any tables, views or schemas that exist. Since the database has just been created, none of these objects exists yet.

Creating Schemas

A Snowflake schema is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database and can have its own security configuration.

From inside our selected database, select the Schemas tab. Snowflake will create a public and information schema by default. The PUBLIC schema can be used to create any other objects. The INFORMATION_SCHEMA contains all the metadata for the database.

Click the Create button and provide a name and comment. The Managed Access option determines whether the security of the objects within the schema is managed by the schema owner or by the owner of each object.

Creating Databases and Schemas with Code

All operations done with the UI can also be done with SQL code on the worksheets tab. To see what code corresponds to the operations we are doing, click on the Show SQL button.

Using code makes our job easier once we have a grasp of how Snowflake works. This can cut down on time and be automated.

To execute any SQL code in Snowflake, select Worksheets from the main navigation bar. This area lists all databases on the left side for reference and provides a place for you to enter your code. On the right, we can see our current role, database, warehouse and schema.

Let’s enter the code required to replicate the process we did before with the UI. When you click the Run button, only the command where your cursor lies will execute. To execute multiple lines, highlight them and click run. If you have already created your database, you’ll need to run DROP DATABASE DEMO_DB for this to work.
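Here is a minimal sketch of what that code might look like. DEMO_DB and DEMO_SCHEMA are placeholder names; swap in whatever you created through the UI.

DROP DATABASE IF EXISTS DEMO_DB;  -- only needed if you already created the database through the UI

CREATE DATABASE DEMO_DB COMMENT = 'My first database';
CREATE SCHEMA DEMO_DB.DEMO_SCHEMA WITH MANAGED ACCESS COMMENT = 'My first schema';

SHOW SCHEMAS IN DATABASE DEMO_DB;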

Semi Structured JSON Data

One of Snowflake’s unique features is its native support for semi-structured data. Snowflake supports semi-structured data in the form of JSON, Avro, ORC, Parquet, and XML. JSON is the most widely used and industry standard due to its data format and ease of use.

JSON data can be loaded directly into the table columns with type VARIANT, a universal type that can be used to store values of any type. Data can be queried using SQL SELECT statements that reference JSON elements by their paths.

Let’s take a look at our JSON data. Here are some JSON data properties:

  1. Data in JSON is a name-value pair.
  2. Data is separated by a comma.
  3. Curly braces hold objects.
  4. Square brackets hold an array.
[
{
"ID": 1,
"color": "black",
"category": "hue",
"type": "primary",
"code": {
"rgb": "255,255,255",
"hex": "#000"
}
},{
"ID": 2,
"color": "white",
"category": "value",
"code": {
"rgb": "0,0,0",
"hex": "#FFF"
}
},{
"ID": 3,
"color": "red",
"category": "hue",
"type": "primary",
"code": {
"rgb": "255,0,0",
"hex": "#FF0"
}
}
]

Database Object

We have created a new database object to load and process semi-structured data as shown below. You can use the existing one if you have already created it earlier.

CREATE DATABASE IF NOT EXISTS TEST_DATABASE;

Schema

Create a new schema under the TEST_DATABASE object for ease of access. This step is optional if you already have access to an existing schema; in that case you can use the existing one.

CREATE SCHEMA IF NOT EXISTS TEST_DATABASE.TEST_SCHEMA;

TABLE

In order to load JSON data, we need an object to hold it, and that object should be capable of holding semi-structured data.

In snowflake, to process the semi-structured data, we have the following data types:

  • Variant
  • Array
  • Object

We’ll be using the variant object to load data into a Snowflake table.

CREATE TABLE IF NOT EXISTS COLORS
(
TEST_DATA VARIANT
);

The COLORS table is created with one column, TEST_DATA, that holds the JSON data.

FILE FORMAT

To load the JSON data into a Snowflake table, a file format is one of the mandatory objects in Snowflake:

CREATE FILE FORMAT JSON_FILE_FORMAT
TYPE = 'JSON'
COMPRESSION = 'AUTO'
ENABLE_OCTAL = FALSE
ALLOW_DUPLICATE = FALSE
STRIP_OUTER_ARRAY = TRUE
STRIP_NULL_VALUES = FALSE
IGNORE_UTF8_ERRORS = FALSE;

The above file format is specific to JSON. The STRIP_OUTER_ARRAY option removes the outer set of square brackets [ ] when loading the data, splitting the initial array into multiple rows. If we did not strip the outer array, our entire dataset would be loaded into a single row in the destination table.

STAGE

In order to copy the data to a Snowflake table, we need data files in the cloud environment. Snowflake provides two types of stages:

  • Snowflake Internal stage
  • External stages(AWS, Azure, GCP)

If you do not have your own cloud storage, Snowflake provides space to store data in its environment, called the “Snowflake internal stage”.

In this article, we have used a Snowflake internal stage and created a dedicated stage for semi-structured load.

CREATE STAGE IF NOT EXISTS JSON_STAGE FILE_FORMAT = JSON_FILE_FORMAT;

You can use the command below to list the files in a stage:

LIST @JSON_STAGE;

PUT & COPY Command

The PUT command uploads data files from local storage to a Snowflake internal stage. You run this command from the SnowSQL CLI client; the load can also be done from the Snowflake UI under the Databases tab.

PUT file://<file_path>/sample.json @JSON_STAGE/ui1591821970011;

COPY INTO "TEST_DATABASE"."TEST_SCHEMA"."COLORS"
FROM @JSON_STAGE/ui1591821970011
FILE_FORMAT = '"TEST_DATABASE"."TEST_SCHEMA"."JSON_FILE_FORMAT"'
ON_ERROR = 'ABORT_STATEMENT';

You can accomplish the same thing by using Snowflake UI under the database tab. Click on your database and then find your way to the table. Click on load data above it.

Check that the data was properly loaded (SELECT * from COLORS).

Querying Semi-Structured Data

Snowflake is extremely powerful when it comes to querying semi-structured data. The syntax works a lot like JavaScript object access, except we use : notation to retrieve each element for a row. By using :: notation, we cast the retrieved value to the data type we want.

SELECT
test_data:ID::INTEGER as ID,
test_data:color::STRING as color,
test_data:category::STRING as category,
test_data:type::STRING as type,
test_data:code.rgb::STRING as code_rgb,
test_data:code.hex::STRING as code_hex
FROM
colors;

Conclusion

The process of loading data into a database can be a cumbersome task but with Snowflake, this can be done easily. Snowflake functionality makes it possible to process semi-structured data. Check out the docs to learn more about semi-structured data. Stay tuned for part two of this article on flattening arrays.

Snowflake Stored Procedures

Stored procedures can be thought of as functions. They enable users to create modular code that includes complex business logic by combining multiple SQL statements with procedural logic. They can be used for tasks like data migration and validation while handling exceptions.

Benefits of Stored Procedures include:

  • Procedural logic such as branching and looping which straight SQL does not support
  • Error handling
  • Dynamically creating SQL statements to execute
  • Executing code with the privileges of the stored procedure's creator. This allows stored procedure owners to delegate the power to perform operations to users who otherwise could not perform them.

A stored procedure is created with a CREATE PROCEDURE command and is executed with a CALL command. The result is returned as a single value. A stored procedure uses JavaScript for logic and control flow, and executes SQL statements by calling the JavaScript API (the snowflake object).

Stored Procedure Example

Let’s say we want to insert a row using a stored procedure.

First, let’s create a database and table to use.

USE DATABASE DEMO_DB;
CREATE OR REPLACE TABLE Employee(emp_id INT, emp_name varchar,emp_address varchar);
create sequence if not exists emp_id;

Now that we have that setup, let’s create our first stored procedure.

CREATE OR REPLACE PROCEDURE employee_insert(name varchar, address varchar)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
var command = "INSERT INTO Employee (emp_id, emp_name, emp_address) VALUES (emp_id.nextval, '"+NAME+"','"+ADDRESS+"')";
var cmd_dict = {sqlText: command};
var stmt = snowflake.createStatement(cmd_dict);
var rs = stmt.execute();
return 'success';
$$;

In the first section, we define the SP's name, parameters, return value, and language. Then, between the $$ delimiters, we write the actual code. Now that everything is set up, we can use the SP with the CALL command like this:

CALL employee_insert('Name','Location');

Let’s give this a try and see what happens!

Awesome! That worked!

Make sure to use a database that exists in your console if you’re replicating this.

However, a simple insert isn’t particularly useful for a Stored Procedure. We’d be better off just using pure SQL. Let’s look at a more realistic example that better demonstrates the power of Stored Procedures.

Taking it Further

Let’s imagine the following scenario. You receive an employee’s data in JSON. You want to store it into the table but you have to ensure the following first:

  • Employee name cannot be empty or NULL
  • Employee name must be unique
  • Employee address cannot be NULL

It is possible to do all of this using a single stored procedure. However, since we have two distinct tasks — validation and insert — it’s better practice to break them up into 2 separate procedures. We’ll then call the validation procedure from inside the insert procedure. We’ll be utilizing Snowflake’s variant type to return a json object. Let’s take a look at our validate procedure.

Validate Procedure

create or replace procedure employee_validate_json(INPUT VARCHAR)
RETURNS VARIANT NOT NULL
LANGUAGE JAVASCRIPT
AS
$$
var json_row = {};
var error_count = 0;
try {
    var employee_data = JSON.parse(INPUT);
    var employee_id = employee_data.employee_id;
    var employee_name = employee_data.employee_name;
    var employee_address = employee_data.employee_address;
    if (!employee_name) {
        json_row["employee_name"] = "employee name cannot be empty or null.";
        error_count = 1;
    } else {
        var command = "select count(*) from Employee where emp_name='"+ employee_name + "'";
        var stmt = snowflake.createStatement({sqlText: command});
        var res = stmt.execute();
        res.next();
        row_count = res.getColumnValue(1);
// check for duplicate
    if (row_count > 0) {
        json_row["employee_name"] = "employee name already exists in table.";
        error_count = 1;
    }
}
if (employee_address == null || employee_address == undefined) {
    json_row["employee_address"] = "employee address should not be NULL.";
    error_count = 1;
}
json_row["is_error"] = error_count;
} catch (err) {
    json_row["Exception"] = "Exception: " + err;
    json_row["is_error"] = 1;
}
return json_row;
$$;

The stored procedure will either return is_error = 0 if there is nothing wrong with our JSON or return is_error = 1 with the appropriate error message.

Successful run:

call EMPLOYEE_VALIDATE_JSON('{
"employee_name": "Lucas", "employee_address": "Los Angeles"
}'); 

Result:
{ "is_error": 0 }

Error run:

call EMPLOYEE_VALIDATE_JSON('{
"employee_name": "", "employee_address": "Los Angeles"
}');

Result:
{"employee_name": "employee name cannot be empty or null.", "is_error": 1 }

Now that we have our validate working. Let’s dive into our insert procedure!

Insert Procedure

Our insert procedure is going to call our validate procedure and check for any errors. If it finds any, it will return them. If not, it will attempt to insert the data into the table, also returning a JSON object upon completion.

create or replace procedure employee_insert_json(INPUT VARCHAR)
RETURNS VARIANT NOT NULL
LANGUAGE JAVASCRIPT
AS
$$
var json_row = {};
var message = {};
var detail = {};
var result = '';
var error_count = 0;
try {
    var employee_data = JSON.parse(INPUT);
    var employee_name = employee_data.employee_name;
    var employee_address = employee_data.employee_address;
    var command_validate = "call employee_validate_json('" + INPUT + "')";
    var cmd_dict_validate = {sqlText: command_validate};
    var stmt_validate = snowflake.createStatement(cmd_dict_validate);
    var rs_validate = stmt_validate.execute();
    rs_validate.next();
    var validate_result = rs_validate.getColumnValue(1);
if (validate_result.is_error > 0) {
    return validate_result;
} else {
    var command = "INSERT INTO employee(emp_id,emp_name,emp_address) VALUES(emp_id.nextval, '" + employee_name + "','" + employee_address + "')";
    var cmd_dict = {sqlText: command};
    var stmt = snowflake.createStatement(cmd_dict);
    var rs = stmt.execute();
    json_row["message"] = "successfully inserted employee";
}
} catch (err) {
    json_row["exception"] = err;
    error_count = 1;
}
json_row["is_error"] = error_count;
return json_row;
$$;

Let’s take a look at some sample runs.

call EMPLOYEE_INSERT_JSON('{
"employee_name": "Lucas", "employee_address": "Los Angeles"
}');

Result:
{ "is_error": 0, "message": "successfully inserted employee" }

Nice! That worked but let’s see what happens if we try to run the same command again.

call EMPLOYEE_INSERT_JSON('{
"employee_name": "Lucas", "employee_address": "Los Angeles"
}');

Result:
{"employee_name": "employee name already exists in table.", "is_error": 1 }

When we try it again our validate procedure finds that the employee name already exists!

Conclusion

Stored procedures are a great way to streamline your Snowflake tasks. They can also be used to grant higher level access to lower level users in a defined manner. I hope this tutorial has helped you create or transfer your stored procedures to Snowflake. Thanks for checking out this article.

Getting Started with Snowflake

Overview

Snowflake is a modern data platform. Unlike many others, Snowflake didn’t start as an on-premise data solution and then migrate to a web-based server. It was built in the cloud for the cloud. This means Snowflake can quickly handle large analytic workloads (columnar architecture and vectorized execution).

Snowflake separates its computation engine from storage. This gives it the additional advantage of adaptive optimization, meaning Snowflake can automatically scale your cloud usage up or down based on your current needs. In other words, Snowflake saves a little (uses fewer resources) on every data operation, which saves you money over the long term.

If you’d like to check out Snowflake’s capabilities yourself, you can sign up for a complimentary account with $400 worth of credit here.

Setup Requirements

  • Snowflake Account
    • Sign up for a free trial here or contact Snowflake directly

You can connect to Snowflake using any of the following methods. I recommend starting with the browser if this is your first time, then transitioning to either the CLI or one of the libraries, based on the back-end stack you use.

  • Browser-based web interface
    • Minimum Version
    • Chrome: 47
    • Safari: 9
    • Firefox: 45
    • Opera: 36
    • Edge: 12
  • SnowSQL, Snowflake CLI
    • Redhat-compatible linux operating systems
    • macOS (64-bit)
    • Windows (64-bit)
  • Client using JDBC or ODBC
    • Linux
    • MacOS
    • Windows
      • 64-bit for JDBC Driver
      • 32-bit or 64-bit for ODBC driver
  • Any 3rd party partner

Logging into Snowflake

Account Name

All access to Snowflake is through your account name. You’ll need it to sign in and it’s part of the URL for browser access.

https://account_name.snowflakecomputing.com

The full account name may include region and the cloud platform where your account is hosted. Ex. account-name.us-east-2.aws

Logging into the Web Interface

Go to the hostname provided by Snowflake for your account. The format should be:

https://account_name.snowflakecomputing.com

You should see the following screen.

Enter your credentials and Login.

Logging in Using SnowSQL

SnowSQL is the command line client for connecting to Snowflake and executing SQL queries and DDL and DML operations. To connect follow this quick guide.

Logging in Using Other Methods

In addition to the web interface and SnowSQL, Snowflake supports other methods for connecting.

  • 3rd-party clients services that support JDBC or ODBC
  • Developer applications that connect through drivers for Python, Node.js, Spark, etc.

These methods require additional installation and configuration. Check out this for details.

Web Interface

Snowflake web interface allows you to create/manage virtual warehouses, databases, and data objects. Use this interface to load data into tables, execute ad hoc queries, perform DML/DDL operations, and view past queries. You can also manage administrative tasks such as changing passwords and managing users. Check out Managing Your Snowflake Account for more information.

On the top of the UI you’ll see the following tabs.

Databases

This page shows the databases you have created or have the privileges to access. Tasks you perform on this page include:

  • Create, clone, or drop database
  • Transfer ownership of database

Click the name of a database to view and perform tasks on it:

Shares

This is a new page added to Snowflake. It allows you to consume data being shared with your organization and also provide data to others. Don’t worry about it for now.

Warehouses

Warehouses are Snowflake’s computational engines. You will define them by size (computational power) and spin them up to perform data analytics.

This page shows information about the virtual warehouses you have created or can access. Tasks you can perform on this page include:

  • Create or drop a warehouse
  • Suspend or resume a warehouse
  • Configure a warehouse
  • Transfer ownership of a warehouse

Worksheet

Worksheet page is where you can write and run SQL queries and DDL/DML operations. The results can be viewed side by side as your operations complete. Tasks you can perform include:

  • Run ad hoc queries and other DDL/DML operations in a worksheet, or load SQL script files.
  • Open concurrent worksheets, each with its own separate session.
  • Save and reopen worksheets.
  • Log out of Snowflake or switch roles within a worksheet, as well as refresh your browser, without losing your work:
    • If you log out of Snowflake, any active queries stop running.
    • If you’re in the middle of running queries when you refresh, they will resume running when the refresh is completed.
  • Resize the current warehouse to increase or decrease the compute resources utilized for executing your queries and DML statements.
  • Export the result for a selected statement (if the result is still available).

For more information, checkout Using Worksheets for Queries.

History

The History page allows you to view the details of all queries executed in the last 14 days. The page displays a historical listing of queries, including queries executed from SnowSQL or other SQL clients. You can perform the following tasks on this page:

  • Filter queries displayed on the page.
  • Scroll through the list of displayed queries. The list includes (up to) 100 queries. At the bottom of the list, if more queries are available, you can continue searching.
  • Abort a query that has not completed yet.
  • View the details for a query, including the result of the query. Query results are available for a 24-hour period. This limit is not adjustable.
  • Change the displayed columns, such as status, SQL text, ID, warehouse, and start and end time, by clicking any of the column headers.

For more information, check out Using the History Page to Monitor Queries.
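
The same history is also available in SQL through the QUERY_HISTORY table function, which is handy when you want to filter or aggregate it yourself. A minimal sketch (run from a worksheet with a database selected; the 100-row limit simply mirrors the page default):

-- most recent queries visible to your role, newest first
SELECT query_id, query_text, warehouse_name, execution_status, start_time, end_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
ORDER BY start_time DESC;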

Help Menu and User Preferences

At the top right, the help button opens a dropdown menu that supports the following actions:

  • View the Snowflake Documentation
  • Visit the Support Portal
  • Download the Snowflake clients by opening a dialog box where you can:
    • Download the Snowflake CLI client (SnowSQL) and ODBC driver.
    • View download info for the Snowflake JDBC driver, Python components, Node.js driver, and Snowflake Connector for Spark.
  • Show help panel with context-sensitive help for the current page.

To the right of the help button is the user preferences dropdown. Here you can change your password or your security role for the session (if you have multiple roles assigned to you). For more information about security roles and how they influence the objects you can see in the interface and the tasks you can perform, see Access Control in Snowflake.

You can also use this dropdown to:

  • Set your email address for notifications (if you are an account administrator).
  • Close your current session and exit the Snowflake web interface.

Thanks for checking out this overview of Snowflake! I hope this tutorial was helpful. Stay tuned for more snowflake articles 🙂

SnowSQL CLI Client

Introduction

SnowSQL is the command line client for Snowflake. It allows you to execute SQL queries and perform all DDL and DML operations. It’s an easy way to access Snowflake right from your command line, with the same capabilities as the Snowflake UI.

Step 1 – Download and install SnowSQL CLI

  • Log in to Snowflake and click on Help in the top right corner
  • Click on Downloads -> Snowflake Repository

This will lead you to a web index page.

  • Click on bootstrap -> 1.2 (or newest version) -> Pick your OS (Darwin is Mac) -> Download the latest version
  • Run the installer.

Step 2 – Running SnowSQL CLI

  • Check SnowSQL is installed properly

Run: snowsql -v

Output: Version: 1.2.5 (or latest)

Good! Now that snowsql has been installed, let’s set up our environment to work.

  • Login to your account

snowsql -a account_name -u username

Your account name can be found in the first part of your URL when logged in to Snowflake (everything before snowflakecomputing.com, for instance sample_account.sample_region.azure).

  • Setup the database context

// create a warehouse

CREATE WAREHOUSE yourname_WH AUTO_SUSPEND = 60 AUTO_RESUME=TRUE;

USE WAREHOUSE yourname_WH;

// select your desired database

USE DATABASE SNOWFLAKE_SAMPLE_DATA;

// select the data schema

USE SCHEMA TPCDS_SF100TCL;

(Note: Lukes-MacBook-Pro and lmunro are specific to my console. Yours will be different unless you somehow stole my laptop in which case please give it back.)

Awesome! Now we’re ready to perform whatever data analytics you desire.
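
For example, a quick sanity check against one of the TPC-DS sample tables confirms the warehouse, database, and schema are all set (STORE_SALES is one of the tables in the TPCDS_SF100TCL schema):

-- the row count resolves from table metadata, so it returns quickly even at this scale
SELECT COUNT(*) FROM STORE_SALES;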

However, it can be quite tedious to type in your account, username, password, warehouse, DB, and schema every time you log in. You can edit the SnowSQL config file to set these automatically.

Step 3 – Edit Config File

  • Locate hidden snowsql folder
    • Linux/Mac OS: ~/.snowsql/
    • Windows: your-user-folder\.snowsql\
  • Open the file named config and add the following

[connections.configuration-name]

accountname = your_account_name

username = your_username

password = your_password

dbname = SNOWFLAKE_SAMPLE_DATA

warehousename = yourname_WH

schemaname = TPCDS_SF100TCL

  • Save and exit
  • Connect using the following command

snowsql -c configuration-name

Step 4 – Modify Display Prompt

The SnowSQL prompt automatically displays the current user, warehouse, database, and schema. This prompt can be a bit lengthy, but you can change it with the following command:

!set prompt_format=>>

To change the prompt format automatically at startup, add the following to the configuration file.

[options]

# auto_completion gives you possible existing options, very helpful!

auto_completion = True

prompt_format=>>

Conclusion

SnowSQL CLI is a quick way to plug into Snowflake directly from the terminal. It’s preferable to the UI if you already have a grasp of terminal operations and don’t require the UI to navigate around.

The config file in the SnowSQL folder is where you can set configuration and options for the CLI. You can preset login credentials and database settings by adding a [connections.***] block and specify options by adding to the [options] block. For more information check out the public documentation.

Q&A

What does the -c mean?

Whenever you run a program in the terminal you can specify arguments with a dash (-). The -c parameter tells snowsql to look in ~/.snowsql/config for a connection with the name you specified (configuration-name in the example above). It then uses those credentials and other settings to quickly log you in and set up your environment. Note: -c is an abbreviation; you can also use --connection.

Are there any other parameters that I should know about?

Yes! There are a bunch of parameters which can make your life easier. In fact, you can login and set up your environment all in one line, like this:

snowsql -a **.east-us-2.azure -u lmunro -d SNOWFLAKE_SAMPLE_DATA -s TPCDS_SF100TCL -w LUCASMUNRO_WH

Don’t worry if that’s a bit overwhelming. You can (and should) use the config file so you don’t need to type it all out. If you’re interested in using these parameters or want more information check out the docs.

Snowflake and Python

Introduction

Python has become one of the go-to languages for data analytics. I’m constantly using Jupyter notebooks to quickly clean, analyze, and visualize data. That’s why I was ecstatic to learn that my favorite data warehouse, Snowflake, has a simple Python connector. In fact, it took me just 10 minutes to set up my environment and start running analytics on some COVID-19 data!

Allow me to walk you through it.

Overview

The Snowflake Connector for Python provides an interface to develop Python applications which connect to Snowflake. The connector supports all standard operations.

Prerequisites

If you don’t have Python yet, install it from python.org.

This tutorial requires at least Python 2.7.9 or Python 3.5.0; any later version is supported. To check which version of Python is installed on your machine, open a terminal and run the following command:

python --version

You should also make sure pip, Python’s package installer, is up to date:

python -m pip install --upgrade pip

Install Python Connector for Snowflake

Snowflake’s Python Connector is part of the Python Package Index (PyPI) so we can install it with pip or conda.

pip install --upgrade snowflake-connector-python
# Or (if you're using Python 3)
pip3 install --upgrade snowflake-connector-python
# Or (if you prefer conda)
conda install -c conda-forge snowflake-connector-python

Connecting to Snowflake with Python

Now that the connector is installed, let’s connect to Snowflake. I’m using jupyter notebooks but you can use any Python IDE for this. To begin, let’s import the Snowflake package we just downloaded.

import snowflake.connector

Now that we’ve imported the library, we’ll need 2 key pieces of information to connect to snowflake.

  • Snowflake account and region
  • User login and password

The Snowflake account and region can be found in the URL when you log in to the Snowflake website. For example:

https://demo-account.demo-region.snowflakecomputing.com

The format is https://ACCOUNT.ACCOUNT_REGION.snowflakecomputing.com, so our account would be demo-account.demo-region. Your user information is the same as what you use to log in to Snowflake. We can create some variables to store this information.

sfAccount = 'demo-account.demo-region'

sfUser = 'demo-user'

sfPassword = 'demo-pass'

Now we have all the information needed to use the Python connector in our application. The following example attempts to establish a connection and print out the version of snowflake we have running. If the connection fails, an error message is printed out.

import snowflake.connector

sfAccount = 'demo-account.demo-region'
sfUser = 'demo-user'
sfPass = 'demo-pass'

# The connection object holds the connection and session information with the database
conn = snowflake.connector.connect(
    user = sfUser,
    password = sfPass,
    account = sfAccount
)

# Create a cursor object for execute and fetch operations
cs = conn.cursor()

try:
    cs.execute("SELECT current_version()")
    one_row = cs.fetchone()
    print(one_row[0])
finally:
    cs.close()
    conn.close()

There you go! We’ve just connected to our Snowflake database with Python and retrieved some information. This should serve as a starting point for you to build your application. To dive deeper into the Snowflake operations you can perform with Python, check out the official documentation.

Secure Data Sharing with Snowflake

Introduction

Big Data. Internet of Things. Social Media. Every day, millions of data points are generated and moved across the internet, from the viral video you send to your friend on TikTok to critical business data used to make decisions. The acceleration of data use and utility is only getting faster. That’s why security and protecting your data are of the utmost importance.

The cloud is the solution. Snowflake’s platform offers instant access to live data in a secure format. This can streamline not only data sharing inside organizations but also how we share data externally. Many companies continue to use outdated data sharing technologies that are more costly and less secure.

How Does Secure Data Sharing Work?

Sharing involves two parties: the data provider and one or more consumers. Snowflake enables sharing of database tables, views and user-defined functions (UDFs) using a Secure Share object. A data consumer given a Secure Share has access to a read-only version of the database in their own account to operate on.

With Snowflake’s data sharing, no actual data is copied or transferred between accounts. All sharing is accomplished through the service layer and metadata store. Thanks to Snowflake’s architecture, which separates compute from storage, data sharing is instant and avoids duplicate storage costs.

Sharing is done with a Secure Share object. Each Secure Share includes the privileges that grant access to the database and schema, the consumer accounts, and the objects being shared (tables, views, UDFs). The share object represents a connection between the provider and consumer. New objects or data can be added to the share and become available to the consumer in real time, and access to a share can be revoked at any time.
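
The walkthrough below uses the web interface, but the same share can be built in SQL. A minimal sketch from the provider side, reusing the names from the example that follows (BDU database, FAA schema, SHAREDEMO share) and a hypothetical consumer account locator XY12345:

-- create the share and grant it access to the objects being shared
CREATE SHARE SHAREDEMO;
GRANT USAGE ON DATABASE BDU TO SHARE SHAREDEMO;
GRANT USAGE ON SCHEMA BDU.FAA TO SHARE SHAREDEMO;
GRANT SELECT ON ALL TABLES IN SCHEMA BDU.FAA TO SHARE SHAREDEMO;

-- add the consumer account; removing it later revokes access
ALTER SHARE SHAREDEMO ADD ACCOUNTS = XY12345;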

Two Types of Consumers

Participants in Secure Data Sharing are either data providers or consumers. A provider creates a Secure Share to export, and the consumer imports the Secure Share. However, there are two different types of consumers: Reader Accounts and Full Consumers. The difference affects who pays for the compute resources used to query the shared data.

Full Consumer accounts are existing Snowflake customers. Data can be shared directly to the existing account, and that account pays for all compute resources incurred by querying the shared database.

Reader Accounts are for consumers who are not on Snowflake’s platform. If the consumer is not a Snowflake customer, the provider can create a Reader Account for them. For a Reader Account, all costs are paid by the provider who shared the data, and those costs can be tracked and invoiced back to the consumer.
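
If you prefer SQL, a Reader Account can be created as a managed account. A minimal sketch, reusing the account and admin names from the walkthrough below, with a hypothetical password:

-- the provider creates a managed (reader) account and pays for its compute
CREATE MANAGED ACCOUNT READERACCOUNT
  ADMIN_NAME = READER1,
  ADMIN_PASSWORD = 'ChangeMe123!',
  TYPE = READER;

-- list managed accounts to see the new account's URL and locator
SHOW MANAGED ACCOUNTS;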

Setting up a Secure Share with Reader Account

Let’s begin with an example using my personal database (BDU), which has a schema containing flight data (FAA). There are eight tables in the schema:

Click the Shares tab at the top of the user console, making sure that you toggle the view to Outbound since you are a provider creating a Secure Share that will go to an outside account. Then hit the Create button to open the menu to create a Secure Share. Please note that you need ACCOUNTADMIN privileges, or a custom role that has been granted these specific privileges, to create a Secure Share. SYSADMIN privileges will not be enough:

In the menu, insert the Secure Share Name (SHAREDEMO), and click the button to Select Tables & Secure Views to select the eight tables. Please note that you can select or deselect tables or views as needed. Next, hit the Create button to complete the process:

Now you see that Secure Share has been created and the data can be previewed. Please note that you need an available Virtual Warehouse that is usable by your current role in order to run this query. Next, click the button to Add Consumers:

As you can see, there are two Account Types: Reader and Full. Full is an existing Snowflake account that assumes its own costs; with Reader, the data provider pays. In this case, the intended consumer is not a Snowflake customer, so click the Create a Reader Account link:

To create a Reader Account, insert the name (READERACCOUNT) and a comment (optional), along with credentials for the ACCOUNTADMIN user (READER1). Then click Create Account:

Going back to a worksheet and executing the command SHOW MANAGED ACCOUNTS will show all details needed:

Going back to the previous Add Consumers menu, you can now see that the new Reader Account has been added as a consumer of the Secure Share. Clicking the blue link with the account locator (XB28199) will also take you to the login for the Reader Account:

Enter the credentials for the ACCOUNTADMIN user:

Once you are in your Reader Account, you will need to do some setup. First, ensure that your role is ACCOUNTADMIN (it will default to SYSADMIN). Then click the Warehouses tab and create a warehouse to use (DEMO_WH). You will need compute resources to run queries from your provider.

Click the Shares tab, making sure that you are toggled to Inbound. We see that SHAREDEMO has been shared with us by the ITSTRATEGISTS account we were using, which is the data provider in this case. Click Create Database From Secure Share:

In the menu, name your database accordingly and grant access to the desired roles. You can change the name of the database given by the provider. You can grant access to the database to multiple roles. Click Create Database:

Navigate to Worksheets. The shared database (SHARE_FAA) now has read-only tables you can query using the SYSADMIN role. As the consumer, you should be able to read from it using a reporting tool like Tableau or an ETL tool. If you’re a Full Consumer, you can query this data along with your existing data while not paying any storage costs.

Shared database tables are read-only. To work around this, you can create another database and populate tables by selecting from the shared tables. In this case, a table with 148 million records took 27 seconds to copy over using a medium-sized warehouse. Unfortunately, you cannot use Snowflake’s clone feature on a shared database.
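
The consumer-side steps can also be done in SQL, including the copy-out trick for getting writable tables. A minimal sketch, where provider_account is a placeholder for the provider’s account locator and FLIGHTS is a hypothetical table name in the shared FAA schema:

-- create a read-only database from the inbound share and expose it to a role
CREATE DATABASE SHARE_FAA FROM SHARE provider_account.SHAREDEMO;
GRANT IMPORTED PRIVILEGES ON DATABASE SHARE_FAA TO ROLE SYSADMIN;

-- shared tables are read-only, so copy one into a local database to get write access
CREATE DATABASE LOCAL_FAA;
CREATE TABLE LOCAL_FAA.PUBLIC.FLIGHTS AS
SELECT * FROM SHARE_FAA.FAA.FLIGHTS;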

Frequently Asked Questions

What is the difference between sharing data with existing Snowflake customers versus non-Snowflake customers?

Existing Snowflake customers bear the costs of any storage or compute incurred from querying or copying the shared database. For a non-Snowflake customer, the provider would create Reader Accounts and pay for all the costs incurred by those accounts. As the provider, you would be able to track account usage and bill the consumer if that was part of the set business agreement.

How fast would the data update for consumers?

Instant! The data is stored and shared on the cloud so the provider and consumer see the same data. The Secure Share is a metadata wrapper that points to the correct data that is still sitting with the provider. So any changes to the provider’s dataset would be reflected instantly from the consumer’s viewpoint.

Once the consumers can see the data in their account, what can they do with it?

They can query the data and run any analytics they desire, using tools such as Tableau or Sigma. If they wish to manipulate the data, they can copy it into their own database, where they would have write access as well.

Can I allow my users to see only selected tables or views?

Yes! Snowflake has a role-based security framework, so the Account Administrator can limit which roles are accessible to certain users. Roles are a collection of permissions granted on objects. Therefore, what a user can see depends on what permissions have been granted to the user’s role.
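
A minimal sketch of that pattern, with hypothetical object and role names: expose a single secure view to a restricted role, and users granted that role see nothing else in the schema.

-- a secure view over only the columns you want exposed
CREATE SECURE VIEW BDU.FAA.V_FLIGHT_COUNTS AS
SELECT carrier, COUNT(*) AS flights
FROM BDU.FAA.FLIGHTS
GROUP BY carrier;

-- a restricted role that can query the view and nothing else
CREATE ROLE REPORT_READER;
GRANT USAGE ON DATABASE BDU TO ROLE REPORT_READER;
GRANT USAGE ON SCHEMA BDU.FAA TO ROLE REPORT_READER;
GRANT SELECT ON VIEW BDU.FAA.V_FLIGHT_COUNTS TO ROLE REPORT_READER;
GRANT ROLE REPORT_READER TO USER some_user;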

What if my consumers go overboard with how much they are using the Reader Account?

As the data provider for Reader Accounts, you control usage. Set up Resource Monitors, which impose limits on the number of credits that virtual warehouses use within a specified interval or date range. When these limits are reached or approaching, the Resource Monitor can trigger an alert and suspend the warehouse.
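
A minimal sketch of that setup, run by the provider, with a hypothetical monitor name and credit quota (DEMO_WH is the warehouse created in the Reader Account earlier):

-- cap usage at 100 credits per month; warn at 80%, suspend at 100%
CREATE RESOURCE MONITOR READER_MONITOR
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

-- attach the monitor to the warehouse the Reader Account uses
ALTER WAREHOUSE DEMO_WH SET RESOURCE_MONITOR = READER_MONITOR;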

THE SNOWFLAKE SUMMIT RECAP – 2019


The first Snowflake Summit took place June 3rd to 6th, 2019, and lived up to expectations. The four-day summit had more than two thousand attendees, one hundred and twenty presentations across seven tracks, seven keynote presentations, more than thirty hands-on labs, more than thirty-five theatre sessions, and attendees from more than thirty countries.

A quick recap of the summit…

Day 1

The first day of the summit was largely devoted to attendees taking essentials Snowflake training, which ended with an exam. It was a smooth and exciting experience: people were placed in rooms with their background scripts and environments already set up, and Snowflake representatives were on hand to help anyone out. The exam had two parts. The first was multiple-choice questions on the training material; passing it unlocked the second, practical part, which involved creating a user, a database, and a table loaded from a Google spreadsheet, then executing various transformations to load the final table.

Day 2

The highlight of the day was a series of announcements about new Snowflake features, including availability on Google Cloud, external tables, Snowflake organizations, data replication, the Data Exchange, and data pipelines. The most significant announcements are explained below:

  •      Snowflake announced that it would be available on Google Cloud Platform in 2020. This gives organizations using Snowflake seamless and secure data integration across platforms, letting them choose the right cloud vendor for their business, take advantage of Google’s ecosystem of applications, and manage applications across multiple clouds.
  •      Snowflake also introduced new data pipeline features that allow customers to query data directly from their data lake on Azure Blob Storage or AWS S3, enabling them to maintain the data lake as the single source of truth.
  •      Snowflake’s Data Exchange is currently in private preview, with public availability planned for later in the year. The Data Exchange is a free-to-join marketplace that connects users with data providers so they can seamlessly discover, assess, and generate insights from data.

Day 3

The keynotes on the third day started with Alison Levine, author of “On the Edge,” giving an informative talk on leadership. Snowflake founders Benoît Dageville, the current President of Products, and Thierry Cruanes, the current CTO, also gave a talk on why they started Snowflake, referencing their vision: “Simply load and query data.” The day ended with Kevin O’Brien of Kiva.org and Julie Dodd of Parkinson’s UK showing how data could be used to make the world a better place.

Day 4

The last day of the summit saw Matthew Glickman, Snowflake’s VP of Customer and Product Strategy, give a closing keynote on customers’ journeys to becoming data-driven. Customer representatives invited on stage included Brian Dumman, Chief Data and Analytics Officer at McKesson; Yaniv Bar-Dayan, Cofounder and CEO of Vulcan Cyber; and Michal Klos, Senior Director of Engineering at Indigo/Localytics. By the end of the summit, it was clear that the future of data had arrived, with Snowflake capable of providing trusted data solutions to its customers.

The 2020 summit will be better

The 2020 summit will be held June 1st to 4th at the Aria Hotel in Las Vegas, a bigger venue. Given the success of the 2019 summit, the 2020 edition promises to be even bigger, with more activities. I honestly can’t wait for it.

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!