Dataops Role in the Automation of Automation


We provided in-depth coverage of the concept of the Automation of Automation, which we see accelerating data-driven businesses and business solutions to an entirely new level.  Besides Generative AI (GenAI), it really is the culmination of the automation of software development over the last 20-30 years.  When I started in this data and software game back in the nineties, you had to develop pretty much everything yourself.  Now I would say that 80-90% of the software we had to develop back then is available within libraries, most of which are open source.

In addition, coding has become incredibly easier thanks to stable, reliable software libraries and package managers.  On top of that, the last 20 or so years have given us, I hate to say it, "Zapier" and "IFTTT" – if-this-then-that solutions upon solutions.  There are hundreds of them now.

In the Snowflake ecosystem, as far back as 2018 there were maybe 3-5 ELT-type tools: Stitch (acquired by Talend), Fivetran (now Fivetran/HVR), and Matillion, where Stitch and Fivetran really pioneered the "connect this to that... NO CODE" solution.  Now there are at least 30, if not 50+, "connect this to that" data solutions.  (Actually, I feel like there are 100+, but it's a super fragmented market beyond Fivetran, with big bucks from Andreessen Horowitz fueling the endless marketing around it.)  We have even seen new upstarts like the Omnata Data Superheroes come out with incredibly easy and low-cost "connect this to that" data solutions.

To top that off, we had the Hightouches and the Censuses and many others come out with the "Data Activation" concept, which I'm pretty sure was actually built off of my original article on how to build a CDP on top of Snowflake.  When I wrote that, it really wasn't thought to be possible and the tech wasn't quite there.  Now it's totally possible, and we see significant traction around "don't let your friends buy PURE CDPs" marketing and messaging.  (I typically agree with that, but hey, you have to evaluate what is the best business/technical solution for you.  What is even funnier about this messaging is that MANY of the CDP vendors I know have actually moved to having Snowflake as their backend.)

So with that fun start, let's talk about Dataops' role in the Automation of Automation.  Quite a few years ago, Guy and Kent Graziano were telling me to check out this TrueDataOps stuff, and I was like, wtf is this new term?  Do we really need one more new term?  I mean, I still don't think we need Data Mesh, but that's another story, or "thought leadership argument".

While I'm sometimes slow to take on new concepts, I now truly see how Dataops facilitates better overall data product solutions.  Without it, or without dataops.live at this point for our solutions, you can only go so far with your Automation of Automation of quality, repeatable solutions.

Frank Slootman is retiring from Snowflake

In the last few minutes, Snowflake communicated during its earnings call/report on 2024-02-28 at 2pm Pacific that our friend Frank Slootman would be retiring from Snowflake and that Sridhar Ramaswamy would be taking over as CEO.  The stock market immediately reacted, either to that news or to expectations around the earnings themselves, with the stock dropping by 20%.  Sridhar came to Snowflake through its Neeva acquisition back in 2023.  We will be really interested to see how he leads the company going forward, filling Big Frank Slootman's shoes.

We at Snowflake Solutions were deeply rooted in the Snowflake community back in 2019 when Frank took over from Bob Muglia.  Overall, I think most people within the Snowflake community, including Snowflake employees, partners, and customers, would credit Frank with excellent overall leadership from 2019 until now, taking Snowflake to the most successful software IPO ever in September 2020.  Frank, along with many of the team he brought over from his previous CEO roles, like Mike Scarpelli, John Sapone, and many others, immediately went to work in 2019 driving Snowflake's growth and positioning it for a successful IPO.

We will add more later on our take on the Sloot Snowflake era, but we wish Frank good health and good luck in his retirement.  He was always good to us, and we respect his decision to retire.

The Ultimate Guide to Getting Started with Snowflake

Introduction:

Are you ready to revolutionize your data management and analytics? In this guide, we'll take you through a step-by-step process to help you get started with Snowflake, from watching live demos to participating in hands-on labs.

1. Watch a Live Demo: Unveiling the Power of Snowflake:

The first step in your Snowflake journey is to witness its capabilities in action. Head over to the Snowflake website and explore the "Resources" or "Learn" section. Keep an eye out for upcoming webinars or live demos conducted by Snowflake experts. These sessions provide a firsthand look at how Snowflake can transform your data management.

To get started, simply register for a live demo and mark your calendar. During the session, experts will showcase key features and functionalities, giving you a sneak peek into the world of Snowflake.

2. Try Snowflake for Free: Dive into the Cloud:

Ready to experience Snowflake hands-on? Navigate to the Snowflake website and find the "Free Trial" or "Sign Up for Free" button. The registration process is straightforward – provide basic information, set up a username, and create a secure password.

Once you're registered, log in to the Snowflake platform using your credentials. Congratulations! You now have access to a world-class data warehousing solution.

3. Explore the Interface: Navigating Snowflake with Ease:

With your account set up, take a moment to explore the Snowflake interface. Familiarize yourself with key components such as Worksheets, Object Browser, and Warehouses. Snowflake boasts a user-friendly design, and you can find detailed documentation on their website to guide you through each feature.

4. Load Sample Data: Hands-On Learning:

To truly grasp Snowflake's capabilities, consider loading sample data into the platform. Look for sample datasets provided by Snowflake within the documentation or the platform itself. This hands-on experience will give you a practical understanding of how data is managed and queried within Snowflake.
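For instance, every Snowflake account ships with the shared SNOWFLAKE_SAMPLE_DATA database, so you can query sample data before loading anything of your own. A minimal sketch, assuming the default trial warehouse name COMPUTE_WH (swap in whatever warehouse you created):

USE WAREHOUSE COMPUTE_WH;  /* any running warehouse will do */
/* Top customers by spend in the TPC-H sample schema that ships with every account */
SELECT c.c_name, SUM(o.o_totalprice) AS total_spend
FROM snowflake_sample_data.tpch_sf1.customer c
JOIN snowflake_sample_data.tpch_sf1.orders o
  ON o.o_custkey = c.c_custkey
GROUP BY c.c_name
ORDER BY total_spend DESC
LIMIT 10;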

5. Participate in a Virtual Hands-On Lab: Guided Learning Experience:

Want a more guided approach to learning? Check if Snowflake offers virtual hands-on labs or workshops. These interactive sessions provide step-by-step guidance on key exercises, allowing you to deepen your understanding of Snowflake's functionalities.

Keep an eye on announcements on the Snowflake website, community forums, or other communication channels for information about upcoming virtual labs.

6. Join the Snowflake Community: Connect and Learn:

Snowflake's community is a treasure trove of knowledge and experience. Connect with other users and experts, ask questions, share your insights, and learn from real-world use cases. The Snowflake Community is an invaluable resource for expanding your Snowflake expertise.

7. Refer to Documentation and Tutorials: In-Depth Knowledge:

For in-depth understanding, explore Snowflake's comprehensive documentation and tutorials. The official documentation covers everything from specific features to best practices and troubleshooting tips. It's your go-to resource for mastering Snowflake at your own pace.

Introducing Snowflake Native Apps:

If you are looking to dive deep into the world of Snowflake Native Apps, we are pioneers and top experts in this field. We cover all you need to know about Native Apps in this section of our website.

Part 2: A Deep Dive into the Snowflake Native Apps Ecosystem

Introduction:

In Part 1 of our series, we introduced you to the world of Snowflake Native Apps, highlighting their transformative role in data analytics and management. Now, in Part 2, we'll take a closer look at Snowflake's Native App ecosystem, exploring the range of tools available and how they cater to different data roles and responsibilities. Let's dive in.

Exploring Snowflake Native Apps Portfolio:

These are a few of the Native apps currently accessible in the Snowflake Native Apps Marketplace. For a detailed overview of the available options, please visit this webpage.

  • Cost Optimizer for Snowflake: Understand your credit leakages, by NTT Data
  • Test Automation for Snowflake: Automate data quality monitoring, by NTT Data
  • Dark Data Discovery: Identify and monetize your unused data, by NTT Data
  • Aero Health Check: Get instant cost & latency optimization recommendations, by aero
  • Health Check for Snowflake: Summarized report of Snowflake consumption and performance, by Flurry Insights
  • License Patrol: Machine learning model for software license utilization tracking, by Elementum
  • Ops Center: Open, free Snowflake warehouse cost and operations management, by Sundeck

Overview of Snowflake Native Apps

Snowflake's Native Apps are a family of specialized applications, each tailored to address specific data tasks and roles. Here are some key Native Apps that are essential for different data professionals:

1. Streamlit - Streamlit is a Python framework for building data apps. It is open source, provides a simple API, and integrates seamlessly with Snowflake, which makes it a good choice for building Snowflake Native Apps. Its main strengths are:

  • Ease of use: It provides a simple and intuitive API for building data apps.
  • Flexibility: It can be used to build a wide variety of data apps, from simple dashboards to complex machine learning applications.
  • Performance: Streamlit apps can handle large datasets and complex workloads with ease.
  • Integration with Snowflake: This makes it easy to build Snowflake Native Apps that can access and process Snowflake data.

2. Data Science: Snowflake's data science tools enable data scientists and AI practitioners to access and analyze data within Snowflake, making it easier to train machine learning models. With a secure and integrated environment, data scientists can streamline their workflows and collaborate effectively.

3. Data Sharing and Collaboration: Snowflake's native apps for data sharing and collaboration foster teamwork by simplifying data sharing both within and outside the organization. These apps ensure data accuracy and security while promoting efficient collaboration.

Data Science and AI Integration

1. Snowflake Data Science: This Native App allows data scientists to perform data science tasks directly within Snowflake. It supports popular machine learning libraries, making model development and deployment a seamless process. By leveraging Snowflake's data-sharing capabilities, data scientists can access, analyze, and train models using Snowflake data.

2. Snowflake's Data Marketplace: This data-sharing platform offers access to a vast array of external data sources, enabling data scientists to enrich their analyses and models with diverse datasets. The integration of these datasets into Snowflake's ecosystem accelerates the development of AI and machine learning models.

Data Sharing and Collaboration

Efficient data sharing and collaboration are pivotal in today's data-driven business landscape. Snowflake's Native Apps are equipped with features that facilitate these critical aspects:

1. Snowflake Data Exchange: Data Exchange is a marketplace for data providers and consumers. Data providers can publish datasets for others to access, while data consumers can easily integrate these datasets into their analytics processes. This platform encourages collaboration among organizations, opening the door to valuable data insights.

2. Snowflake's Secure Data Sharing: Secure Data Sharing enables organizations to securely share data with their partners, customers, and other stakeholders. With fine-grained access controls and robust security measures, data sharing becomes a streamlined and trustworthy process.

Introducing our own Native App: Snoptimizer

Unlock the power of Snowflake Native Apps with Snoptimizer, created by our visionary founder and 4x Snowflake Data Superhero, Frank Bell. With 1.6 years of dedicated experience in Native Apps, we're the pioneers you need to bring your ideas to life.

Sign up today for a personalized demo and let us transform your data world.

Don't miss out on the expertise that only Snoptimizer can offer. Visit our website to explore the endless possibilities. Your data deserves the best!

We are happy to help optimize your Snowflake account or help you build your own Snowflake Native App.

Conclusion:

Snowflake's Native Apps ecosystem is a testament to the platform's commitment to simplifying data management, analytics, and collaboration. The integrated development environment, specialized data science tools, and data-sharing capabilities cater to a wide range of data professionals, making Snowflake a versatile and powerful platform.

In Part 3 of our series, we will explore the future of Snowflake Native Apps, discussing emerging trends and predictions that are shaping the landscape of data analytics. Stay connected to discover how Snowflake is evolving to meet the data needs of today and tomorrow.



Snowflake’s Plan to Acquire Ponder – Python in the Cloud

Introduction:

In the ever-evolving landscape of cloud computing, Snowflake has once again made headlines with a significant move. The data warehousing giant recently announced its acquisition of Ponder, a company specializing in Python capabilities in the cloud. This strategic move is poised to have a profound impact on how organizations leverage data analytics and Python for their business operations. This article will cover everything around Snowflake's plan to acquire Ponder - Python in the Cloud.

The Power of Python in the Cloud:

Python has firmly established itself as a leading programming language for data analysis, machine learning, and artificial intelligence. Its simplicity, versatility, and rich ecosystem of libraries have made it a go-to choice for data professionals. Snowflake, the data warehousing company known for its cloud-based approach to data management, has recognized the growing importance of Python in modern data analytics.

Snowflake's Plan to Acquire Ponder:

Ponder, a company with a strong focus on enhancing Python capabilities in the cloud, caught Snowflake's attention. The acquisition is part of Snowflake's broader strategy to empower its customers with more robust data analytics tools. By integrating Ponder's expertise and technologies into the Snowflake platform, users will have access to enhanced Python capabilities for their data-driven tasks.

If you care to read the official release from Snowflake on this acquisition, here's the link to see it.

Key Benefits of the Acquisition:

  1. Seamless Integration: Snowflake's acquisition of Ponder aims to seamlessly integrate Python capabilities into its data warehousing platform. This integration will make it easier for data professionals to work with Python within Snowflake's ecosystem, allowing for a smoother, more efficient workflow.
  2. Efficient Data Analysis: Python users will benefit from the cloud's scalability and elasticity. This enables them to analyze and process massive datasets without the limitations of on-premises solutions.
  3. Collaboration and Sharing: The integration of Ponder's technology will facilitate better collaboration among data teams. Python-based workflows can be shared, modified, and scaled with ease, leading to improved teamwork and productivity.
  4. Security and Governance: Snowflake's robust security and governance features will extend to Python workloads. This will ensure that sensitive data remains protected, even when processed using Python.

Implications for Data Professionals:

This acquisition presents exciting opportunities for data professionals. With enhanced Python capabilities available within Snowflake, data scientists, analysts, and engineers can expect to work more efficiently, analyze larger datasets, and extract deeper insights from their data. Furthermore, the cloud-based environment simplifies data management and maintenance, reducing the operational burden.

What we can help you with:

Here, at ITS Snowflake Solutions, we are experts at building data-driven businesses in the Snowflake cloud. Our latest product, Snoptimizer, aims to optimize your Snowflake account and help your business thrive.

Conclusion:

Snowflake's acquisition of Ponder is a clear sign of the growing importance of Python in the world of cloud-based data analytics. This move empowers data professionals to harness the full potential of Python within Snowflake's robust platform. The combination of these two technologies promises to accelerate data-driven decision-making, boost productivity, and further establish Snowflake as a leader in the data warehousing and analytics space.

As the integration progresses, organizations are encouraged to explore the enhanced Python capabilities offered by Snowflake to take their data analytics to new heights in the cloud.

Snowflake Summit Guide by ITS

Snowflake Summit 2023
Practical Guide

ESSENTIAL INFO

1. Get your badge as early as possible. Avoid the crowds that will be there Tuesday.

2. Get a copy of the conference map if you haven’t already (attached below)

3. If you are interested in the vendors, then make a short list of who you want to visit. With 179 vendors this year, along with Snowflake's own set of booths, it's hard to cover all of them.

4. Reserve your sessions early.  As of a week ago, many of the LLM ones were already full. (Wow, sounds like AWS re:Invent – such a wonderful experience of too many people.)

5. If you have a RESERVED session you really don't want to miss, get there 10 minutes early, because supposedly at 5 minutes before the start they begin giving away reserved seats.

KEYNOTES

June 26, 2023, 5:00 PM – 5:45 PM PT
Generative AI’s Impact on Data Innovation in the Enterprise 


June 27, 2023, 9:00 AM – 11:15 AM PT
Snowflake Summit Opening Keynote 


June 28, 2023, 9:00 AM – 10:30 AM PT
Builder Keynote

COME & MEET FRANK BELL

WHY ATTEND & WHAT TO DO?

In this world of information overflow, it’s important to take a step back and ask yourself what you really desire to learn from the Snowflake Summit. 

Here are some guideline questions to help you prioritize your interests.

1. Do you want to learn?  If so, how do you learn?  

2. Do you want to learn more about vendor offerings?

3. Do you want to get certified? 

4. Do you want to sell? 

5. Do you want to party? 

6. Do you want to network?

EXTRA PRO-TIPS

1. If you learn visually, you can watch the recorded sessions later on your own time.

 

2. There is not enough time to see more than ~24 sessions out of 431 over 3-4 days.  So do yourself a favor: ask yourself what really matters to you personally and where you want to focus.

 

3. If you are not a partier or networker, then skip the parties.  Also, pro-tip: remember that going all out Sunday, Monday, or Tuesday night can make the next day(s) hard on yourself and your health!

WANT TO PARTY?

Here is an "unofficial" list of parties, along with whether you can still register as of Saturday 2023-06-24: Snowflake Summit Parties – 2023 Unofficial List

We will try to update this as we gather more information. 


SNOWFLAKE: BEYOND AI

Specific Industry Sessions (Snowflake has made a big move into multiple industry verticals; check out the latest in these 7 specific verticals at Summit): Industry Sessions at Snowflake Summit:

BE AT THE RIGHT TIME & PLACE

Last year the conference was held entirely within Caesar's Forum, but this year (2023) the sessions are distributed between BOTH Caesar's Forum and Caesar's Palace, which is about a 0.7-mile walk.

For the #datageeks,  session distribution between Forum and Palace

DISCOVER OUR PARTNERS!

Dataops.live

 
Visit us at booth #2253 to see a demo and learn how to supercharge your data engineering teams and realize 10x data engineering productivity: mitigate risk, accelerate data products, reduce costs.
And don’t miss our session June 27th at Noon PDT with our CTO & Co-Founder Guy Adams titled “Build Your Snowpark-Powered Data Products and Data Applications w DataOps.live”.

Sigma Computing


Awarded Snowflake Business Intelligence Partner of the Year 2023. Whether you spreadsheet or SQL, teams explore, analyze, & decide with data in Sigma. Whether you attend one of our customer sessions, visit us at our booth, or join one of our flagship evening events, we hope you have a chance to learn more about what it means to data confidently.

Hightouch

 

Hightouch is a data integration platform that helps businesses connect and sync their customer data across various tools and systems. Whether you attend one of our customer sessions, visit us at our booth, or join one of our flagship evening events, we hope you have a chance to learn more about what it means to confidently handle data.

Coalesce


Visit the Coalesce booth 1300 in Basecamp West for an incredible amount of fun, mind-blowing demos, and an abundance of goodies.
Experience new feature demos on four TVs, witness mesmerizing magic tricks (some not even related to Coalesce), and grab your exclusive Data Transformer giveaways, including the hottest t-shirt at Summit.

 

Fivetran


Looking for a chance to see Fivetran in action and learn more about our data integration platform? Look no further than Booth #1700 in Basecamp West at the upcoming event!
Stop by our booth to get a live demo and see firsthand how easy Fivetran makes data movement into Snowflake — regardless of your data source.

 

Snowflake


We are one of the first original Partners of Snowflake.
Our founder, Frank Bell, is 1 of 72 Snowflake Data Superheroes worldwide. 

 
 

Snowflake Plugin Available NOW on VSCode


 

We have great news! Snowflake has released its own VSCode Plugin! It’s currently in Public Preview (PuPr) and you can download it from the Microsoft Visual Studio Extension marketplace. With the Snowflake Plugin, you will have access to some features such as:

 

  • Accounts and Sessions: the plugin allows you to connect to and easily switch between multiple Snowflake accounts. And (this is cool) you can share a single session between multiple open VSCode editor windows! Support for Single Sign On (SSO) is available.
  • Snowflake SQL Intellisense: autocomplete for object names, keywords, and built-in functions, with signature help for function calls. Links to documentation for keywords and built-in functions on hover.
  • Database Explorer: a treeview-based panel that lets you drill down into object definitions and details.
  • Query Execution: not just single statements, but multiple statement executions!
  • Query Results and History panel: View and sort query results and export results to CSV format. Review prior statement history and results, and copy/paste support on previous queries.

 

How to install the Snowflake plugin on VSCode:

 

  1. Launch VSCode and head over to the Extensions Marketplace tab.

  2. Type in "Snowflake" and select the verified Snowflake extension (it should have the verification checkmark).

  3. Click on the Snowflake icon to log in. The extension will ask for your account's URL; this part can be tricky. Instead of inputting the whole URL, just add the part before .snowflakecomputing.com.

 

For example, if your account URL is https://ja13154.east-us-2.azure.snowflakecomputing.com/, enter ja13154.east-us-2.azure in the Account Name/URL box.

 


 

  4. As a final step, add your username and password, and you are all set to go!
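Once you are connected, a quick way to confirm the session is working is to run a simple query from a VSCode worksheet. A minimal sketch using built-in context functions:

SELECT CURRENT_ACCOUNT() AS account_locator,
       CURRENT_USER() AS user_name,
       CURRENT_ROLE() AS role_name,
       CURRENT_VERSION() AS snowflake_version;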

 

With these simple steps, you can now use the Snowflake Plugin on VSCode. If you want to learn about other new features on Snowflake, be sure to check out our blog for new updates.

Snowflake Data Clean Rooms

Introduction: What is a Data Clean Room?

 

In this article, I will explain what a Snowflake Data Clean Room is on the Snowflake Data Cloud.
Data clean rooms on Snowflake are a set of data-related technologies that facilitate double-blind joins of data. These technologies include Data Shares, Row Access Policies, and Secure User Defined Functions. The underlying Data Sharing technology is based on Micro-Partitions, which provide features like Data Sharing and Data Cloning.

Although the original concept of data clean rooms was developed for data exchanges in advertising, I believe the concept can be applied to many other areas where "controlled" and "governed" double-blind joins of data sets can create significant value. This approach enables companies and their partners to share data at an aggregated double-blind join level, without sharing personally identifiable information (PII).
On Snowflake, sharing data through secure views and tables using its Data Share technology is already straightforward. You can then perform double-blind joins on previously agreed-upon identifiers.
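To make the idea concrete, here is a minimal sketch of the secure-view pattern, assuming hypothetical tables ADVERTISER_CUSTOMERS and PUBLISHER_VIEWERS that both carry a hashed email as the agreed-upon join key. Only aggregated overlap counts are exposed, never row-level PII:

/* Hypothetical clean-room style overlap view: only aggregates leave the account */
CREATE OR REPLACE SECURE VIEW audience_overlap AS
SELECT p.campaign_id,
       COUNT(DISTINCT a.email_hash) AS overlapping_customers
FROM advertiser_customers a
JOIN publisher_viewers p
  ON a.email_hash = p.email_hash      /* double-blind join on the hashed identifier */
GROUP BY p.campaign_id
HAVING COUNT(DISTINCT a.email_hash) >= 25;  /* minimum aggregation threshold */

/* The secure view (not the raw tables) is what gets added to a data share */
GRANT SELECT ON VIEW audience_overlap TO SHARE clean_room_share;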

 

Part 1: Data Clean Room Example Use Cases

We helped Snowflake pioneer this new offering a couple of years ago with our client VideoAmp which we brought over to the Snowflake Data Cloud. Our original article back in July 2020 shows how to analyze PII and PHI Data using the earlier Data Clean Room concepts. Fast forward 2 years and now Snowflake has dramatically improved the initial version and scope that we put together. These are just a few examples; there are many other potential use cases for Snowflake Data Clean Rooms.

 

Media/Advertising:

  • Addressing the challenge of the "end of cookies" in a meaningful way, Snowflake's Data Clean Rooms enable Advertisers to merge their first-party data and their publisher(s)’ viewership/exposure data, delivering more value for their marketing spend.
  • Collaborative Promotions. Conducting customer segment overlap analysis with a co-branding/co-marketing partner can reveal areas where customer segments and audiences are aligned.
  • Joint loyalty offerings and/or upsells can also be developed in partnership with aligned customer "interests".

 

Healthcare and Life Sciences:

  • There are some extremely valuable use cases where patient data and patient outcomes can be securely shared across government, healthcare, and life sciences organizations, hopefully enabling some huge leaps forward in healthcare and life sciences.

 

Financial Services:

  • Combining data from multiple financial institutions to identify fraud or money laundering activities without sharing sensitive customer information.

 

Retail:

  • Combining customer data from different sources to create targeted marketing campaigns and promotions.

 

Government:

  • Sharing data across different government agencies to improve public services while protecting individual privacy.

 

Part 2: Looking for more information about Data Clean Rooms?

Here are some additional resources to help you learn more about Data Clean rooms and Data Collaboration.

 

Lastly, here’s an interview I provided on my view of the opportunities around Data Clean Rooms on Snowflake. I shared some insights gained from decades of experience working in data, including thoughts about the transformational impact that cloud-based data sharing, data collaboration, data marketplaces, and data clean rooms are having on companies and industries.

What’s Next in Data Collaboration & Why Data Clean Rooms Are Exciting: Insights From Frank Bell

 

Are you interested in how you can use a Snowflake Data Clean Room for your business? Contact Us Today.

A Deep Dive into Data Sharing

Introduction:

Big data refers to extremely large datasets that can be analyzed to identify patterns, trends, and associations. The analysis of big data provides insights into various fields, including business, science, and government. However, the challenge with big data is not just analyzing it, but also storing, managing, and sharing it. This is where technologies like Snowflake come into play, as they offer a secure platform for storing and sharing large amounts of data.

 

Part 1: What is Data Sharing?

Let's begin with data itself: data can come from the software that enterprises use within their business. For example, how many people are viewing a website, or what kind of people are most interested in a certain brand? At a lower level, data sharing is simply when data resources are shared with many users or applications while assuring data fidelity for all of those participating.

Now how is this relevant today? Data sources are now continuous, which in turn means data volumes keep growing across all of those sources. The main focus of data sharing has become how to move these increasing volumes of data and how to ensure that the data is accurate and secure. The cloud comes into play because it expands what data sharing is capable of. With the modern cloud, data sharing allows people to share live data within their business and outside of it, get rid of data silos, create access to specific data sets, and more. However, this requires a platform that can put data sharing into motion and ensure that it works to its potential, and this is where Snowflake comes into the picture.

 

Snowflake and Data Sharing

Snowflake allows for data collaboration while at the same time lowering costs. It gives organizations the ability to securely share and access live data. Not only do you get secured and governed access to shared data, but you can also publish data sets. As you can see, the possibilities seem endless, but that's only a brief preview of the capabilities of data sharing within Snowflake, so let's take a deeper look at the many parts that play a role in data sharing in Snowflake and how they come together.

 

Part 2: What are Data Providers and Consumers?

A data provider is an account in Snowflake that creates shares that can be accessed by other accounts in Snowflake. When a database is shared, Snowflake supports it through grants that allow access control to objects within the database. There are no restrictions on the number of shares that can be created or accounts that can be added to a share.

A data consumer is an account that creates a database from a Share that is made accessible by another data provider. When you add a shared database to your account, you can access and query the objects within it. There are no limitations on how many Shares you can consume from data providers, but you can only create one database for each Share.

 

What is a Share?

In Snowflake, Shares are objects that contain all the information necessary for sharing a database. A Share includes the privileges that grant access to the database and the schema containing the objects to share, the privileges that grant access to the specific objects within the database, and the consumer accounts with which the database and its objects are shared.

When a database is created from a Share, the objects shared within it become available to any users within the consumer account. These Shares can be customized, are secure, and are fully controlled by the provider account. This allows objects added to a Share to be accessed in real-time by consumers, and the provider account can also rescind access to a Share or any of its objects.

 

Part 3: How does Secure Data Sharing Function work in Snowflake?

When securely sharing data, the data is not copied or transferred between accounts, as one might assume. Rather, sharing is accomplished through Snowflake’s layer and metadata store. As a result, shared data does not occupy storage space within a consumer account, and therefore does not contribute to monthly data storage costs. However, charges will be incurred for the compute resources required to query the shared data.

Going back to what was previously mentioned, because the data itself is not copied or exchanged it makes secure data sharing an easy and fast setup for providers and it also makes shared data quickly available to consumers. But let’s take a closer look at how data sharing works for both the provider and the consumer:

 

Provider:

As the provider, you create a share of a database within your account and then grant access to objects within that database. This enables you to share data from multiple databases, as long as those databases belong to the same account. Finally, you add one or more accounts to the share, including any other accounts you may have within Snowflake.
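A minimal sketch of the provider side, assuming a hypothetical SALES_DB database and a hypothetical consumer account locator XY12345:

/* Provider account: create the share and grant it access to objects */
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

/* Add the consumer account(s) to the share */
ALTER SHARE sales_share ADD ACCOUNTS = xy12345;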

 

Consumer:

As the consumer, you set up a read-only database from the share. You can then customize access to that database using the same access controls that Snowflake provides for other objects.
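And a minimal sketch of the consumer side, assuming a hypothetical provider account locator AB67890 and the SALES_SHARE created above:

/* Consumer account: list shares available to you, then create a database from one */
SHOW SHARES;
CREATE DATABASE shared_sales FROM SHARE ab67890.sales_share;

/* Query the shared, read-only data */
SELECT COUNT(*) FROM shared_sales.public.orders;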

The structure of Snowflake allows providers to share data with many consumers, even those within their organization. Consumers can access shared data from many providers.

 

What Information is shared with Providers?

Snowflake providers have access to certain information about consumers who access their data. This includes the consumer's Snowflake account and organization names. Providers can also view statistical data about data consumption, such as the date of consumption and the number of queries generated by the consumer account on a provider's Share.

In addition, providers can see any information that a consumer provides at the time of data request submissions, such as the consumer's business email and company name.

 

Can I share with Third Parties?

Sharing data is only possible between Snowflake accounts. However, if you're a provider within Snowflake, you may want to share data with a consumer outside of Snowflake. Luckily, Snowflake has created reader accounts to facilitate this process.

Reader accounts enable data to be shared with consumers who are not Snowflake customers without the need for them to become one. These accounts are owned by the provider account that created them. While the provider account uses Shares to share databases with reader accounts, the reader account can only receive data from the provider account that created it.

Users with a reader account can query shared data, but they are unable to perform DML tasks that are available in a full account.
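As a flavor of what this looks like on the provider side, here is a minimal sketch of creating a reader account (the admin name and password are placeholders, and the reader account locator is returned when the account is created):

/* Provider account: create a reader account for a consumer who is not a Snowflake customer */
CREATE MANAGED ACCOUNT partner_reader
  ADMIN_NAME = partner_admin,
  ADMIN_PASSWORD = 'ChooseAStrongPassword1!',
  TYPE = READER;

/* The reader account can then be added to a share just like a regular account */
ALTER SHARE sales_share ADD ACCOUNTS = <reader_account_locator>;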

Having introduced data sharing and its workings within Snowflake, let's explore other features that come with Snowflake's data sharing.

 

Part 4: Products that use Secure Data Sharing in Snowflake

Snowflake offers additional products that enable data sharing between providers and consumers. These products include Direct Share, Snowflake Data Marketplace, and Data Exchange.

 

Direct Share:

Direct Share is a simple method of sharing data that enables account-to-account data sharing while utilizing Snowflake's Secure Data Sharing. As the provider (account on Snowflake), you can grant access to your data to other companies, allowing them to view your data within their Snowflake account without the need to move or copy any data.

 

Snowflake Data Marketplace:

All accounts in Snowflake can access the Snowflake Data Marketplace, provided they are in non-VPS regions on supported cloud platforms. The Data Marketplace uses Snowflake's Secure Data Sharing to facilitate connections between providers and consumers, similar to the Direct Share product.

You have the option to access third-party data and import the datasets into your Snowflake account without the need for transformation. This allows you to easily combine it with your existing data. The Data Marketplace provides a central location to obtain data from multiple sellers, simplifying the process of data sourcing.

Additionally, becoming a provider and publishing data within the Data Marketplace is a great way to monetize your data and reach a wider audience.

 

Data Exchange:

Data Exchange enables secure collaboration around data between invited groups, allowing providers to share data with consumers, as well as with your entire organization, including customers, partners, or even just within your unit. It also provides you with the ability to control who has access to your data, and who can publish, consume, or simply view it. Specifically, you can invite others and determine whether they are authorized to provide or consume data. Data Exchange is available for all Snowflake accounts hosted on non-VPS regions and supported cloud platforms.

These three products in Snowflake that use secure data sharing are useful for both provider and consumer accounts (and more) within Snowflake. Now that we have seen how data sharing works and what other features use data sharing in Snowflake, let's take a look at how to use the data that was shared with you or your data that is shared with others and more.

 

Working with Shared Data:

Once you have a grasp of the fundamentals of direct share, Snowflake Marketplace, and data exchange, there are additional concepts and tools available for you to explore.

Within Snowflake, those with an ACCOUNTADMIN role can utilize the Shared Data page on the new web interface to manage and create shares. As we delve further, please note that "inbound" refers to data that has been shared with you, while "outbound" refers to data shared from your account.

 

Data Shared with You:

Provider accounts can share inbound shares with your account using Direct Share, Data Exchange, or the Snowflake Marketplace. Inbound shares allow you to view data shared by providers, including who provided the share and how the data was shared. You can also create a database from a share.

To access your inbound shares, go to the "Share With Me" tab within the Snowflake web interface. Here you will find:

  • Direct shares that are shared with you. These shares are placed into two groups: 1. direct shares that are ready to be used, and 2. direct shares that have already been imported into a database and can be queried.
  • Listings for data exchange that you have access to. The data is shown under the name of the initial data exchange. If you have more than one data exchange, each data exchange will be shown within separate sections.
  • Listings for the Snowflake Marketplace data that have been moved into a database and can be queried. However, it does not show shares that are ready to be used. You can find the data listing in the Marketplace menu.

 

Data You Shared:

Your account allows you to share data with consumers through outbound shares. You can share data directly, through data exchange, or via the Snowflake Marketplace (as previously mentioned for inbound shares).

With outbound shares, you can:

  • View the shares you have created or have access to, including information such as the database for the share, consumer accounts that can access the share, the date when the share was created, and the objects that are being shared.
  • Create and edit both a share and its data listing.
  • Remove access to the share for individual consumer accounts.

Returning to the web interface, the "Shared by My Account" tab displays outbound shares from Snowflake Marketplace, data exchange, and direct shares.

When considering shares, icons are located beside each share to indicate the sharing mechanisms like direct sharing, data exchange, or Snowflake Marketplace.

Lastly, there are filters available when viewing your shared data:

  • Type: This is presented as the "All Types" drop-down and allows you to differentiate direct shares from listings.
  • Consumer: This is presented as the "Shared With" drop-down and allows you to select a specific consumer or data exchange (where the data has been shared).

 

Data that is Shared

When sharing data, there are many ways you can do this:

  1. Use direct share to directly share data with consumers
  2. In the Snowflake Marketplace, post a listing
  3. In data exchange, post a listing

Furthermore, when you are in the web interface and you want to share data, you will use the “Share Data” drop-down and choose from the list that provides all the platforms where you can share data.

 

Requesting Data

In the web interface, you can view inbound and outbound requests in the "Requests" tab. However, this tab does not display data requests from the Snowflake Data Marketplace.

Let's take a moment to review what inbound and outbound requests mean.

Inbound requests are made by consumers who are seeking access to your data. You can organize these requests by status and review them accordingly. Outbound requests, on the other hand, are requests made by you to obtain data listings from other providers. Just like inbound requests, you can sort them by status. Keep in mind that the requests you make may be rejected, but you can always resubmit them.

 

Managing Exchanges

In certain roles, such as the Data Exchange Admin role or if you have Provider Profile Level Privileges, you can create and organize provider profiles within the “Manage Exchanges” tab. However, if your organization does not have a data exchange, the “Manage Exchanges” tab will not be visible.

Regarding the provider profile, with this role, you can perform the following tasks within a data exchange:

  • Create, update, and delete a profile
  • Update contact email
  • Manage profile editors

Now that we have reviewed data sharing, you should be able to understand all its components and the different functions it offers!

To keep up with new features, regularly visit our website for more information and tips.

 

Conclusion:

This article provides a deep dive into data sharing and how it works within the Snowflake ecosystem. It covers the basics of data sharing, the role of data providers and consumers, and how Secure Data Sharing works in Snowflake. Additionally, it explores Snowflake's products that use secure data sharing, such as Direct Share, Snowflake Data Marketplace, and Data Exchange. The article also explains how to work with shared data, including managing inbound and outbound requests and managing exchanges.

Exploring Snowflake’s Search Optimization Service

Introduction:

 

In today's article, we'll explore Snowflake's Search Optimization Service, a feature that can improve the performance of selective point lookup queries by creating search access paths. The service is available on the Enterprise Edition or higher of Snowflake and is best for business users who rely on quick access to data for critical business decisions.

The article also covers how to turn on the service, its benefits, and its cost. We also introduce Snoptimizer™, our service that scans for all the Snowflake anti-patterns and optimizes your account to help you run cost-effectively.

 

History of Snowflake & Search Optimization:

 

Snowflake initially made a name for itself as the easiest data warehouse to use back in 2014. Since then it has transformed itself and its core technology into a full Snowflake Data Cloud. While the Snowflake Data Cloud Account at first comes with many amazing features by default, there are many areas where you can optimize Snowflake for your specific needs and use cases. As Snowflake has grown over the years, it has added a ton of functionality including paid services such as SnowPipe, Materialized Views, Auto Clustering, Search Optimization Service, and others.

The Search Optimization Service can also be useful for data scientists who want to continuously explore specific subsets of data. Essentially, it is a maintenance service that runs in the background of Snowflake and creates search access paths. The service populates these paths and keeps them up to date as the underlying data changes.

 

Turning on the Feature:

 

To turn on the feature, you must first ensure you are using a role with the privileges needed to add it to a table: OWNERSHIP on the table and ADD SEARCH OPTIMIZATION on the schema. Once that requirement is met, it's as simple as typing the following into your console:

 

ALTER TABLE [IF EXISTS] <table_name> ADD SEARCH OPTIMIZATION;

To ensure it is turned on, run SHOW TABLES and check that the SEARCH_OPTIMIZATION column says ON. One note: you will see an increase in credit consumption while the service runs and starts to build the search access paths. You can get an estimate of the cost for specific tables before committing by running the following command:

 

SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('<table_name>');

Being strategic about which tables you enable for the Search Optimization Service will help greatly in reducing those costs. The service is the best fit for large tables with selective point-lookup queries, particularly on columns other than the clustering key, or on tables that aren't clustered at all.
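Putting the pieces together, here is a minimal end-to-end sketch, assuming a hypothetical CUSTOMER_EVENTS table that is frequently filtered on a high-cardinality SESSION_ID column:

/* Estimate the cost first, then enable the service on the table */
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('CUSTOMER_EVENTS');
ALTER TABLE IF EXISTS CUSTOMER_EVENTS ADD SEARCH OPTIMIZATION;

/* Confirm the SEARCH_OPTIMIZATION column shows ON */
SHOW TABLES LIKE 'CUSTOMER_EVENTS';

/* A selective point lookup that can benefit from the search access paths */
SELECT * FROM CUSTOMER_EVENTS WHERE SESSION_ID = 'a1b2c3d4';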

If you add the service and decide to remove it later on, you can easily do so with the correct privileges by running the following command:

 

ALTER TABLE [IF EXISTS] <table_name> DROP SEARCH OPTIMIZATION;

This is just one solution to make your life easier and queries faster, however, there are many more out there that are more cost-friendly and do not require you to look thoroughly through your tables. One of the prime examples is Snoptimizer™, our service that scans for all the Snowflake anti-patterns and optimizes your account to help you run cost-effectively. It checks your resource monitors, auto-suspend settings, cloud service consumption, and warehouse computing among other things to fix your account and ensure you are fully optimized.

 

Conclusion:

 

Snowflake's Search Optimization Service is a powerful feature that can significantly improve the speed and efficiency of certain queries. While it comes with a cost, it can be a valuable investment for business users who rely on quick access to data for critical decision-making. However, it's important to be strategic about which tables you introduce to the service to minimize costs. Additionally, there are alternative solutions available, such as Snoptimizer™, that can optimize your account and help you run cost-effectively. With the right approach, Snowflake's Search Optimization Service can be a powerful tool in your data optimization arsenal.

Too Busy for Snowflake’s Summit? We Feel You.

Introduction:

 

This year's Snowflake Summit was a roller coaster of emotions, but more often than not, we were thrilled with all the new announcements. With over 61 sessions, we got to see some of Snowflake's amazing new features, tons of use cases, and first-hand looks at how to use their new tools with step-by-step labs. Most of us are too busy to watch two days' worth of webinars, but that's where we come in, providing you with your weekly dose of Snowflake Solutions! We decided to help out by highlighting the most important announcements, as well as the sessions we thought were worth the watch!

This time around, Snowflake announced that they have five main areas of innovation: data programmability, global data governance, platform optimization, connected industries, and Powered by Snowflake. While magical upgrades and new tools mean more flexibility for users, the reality is that most of these new features are still in private preview, so we, the public, won't see them in action for some time. Regardless, we'll still go through the top areas of innovation:

 

Platform optimization:

One of the most significant improvements this year is the enhanced storage economics, resulting in reduced storage costs due to improved data compression. As a result, many will begin to see savings on storage for new data. Additionally, Snowflake has developed new usage dashboards, enabling users to better monitor and comprehend their usage and costs across the platform. While it appears that Snowflake is making progress in the direction of cost optimization, the subject has been challenging so far, and there are not enough safeguards in place to prevent warehouse sizes (and bills) from skyrocketing. If you're interested in discovering the various ways your company can inadvertently lose money on Snowflake, as well as strategies for avoiding them, we invite you to register for our upcoming Cost Optimization webinar.

 

Global Data Governance:

Moving forward, we will discuss the six new data governance capabilities that have been added to the Snowflake platform. We will focus on the three that are most exciting.

 

1. Classification:

Automatically detects personally identifiable information.

  • Why is this cool? We can apply specific security controls to protect their data!

 

2. Row access policies:

Dynamically restrict the rows of data in the query based on the username, role, or other custom attributes.

  • Why is this cool? We no longer need multiple secure views and can eliminate the need for maintaining data silos. That’s a win in our book.
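As a flavor of what this looks like (a hypothetical sketch with made-up table, role, and policy names, not Snowflake's exact announcement), a row access policy is defined once and then attached to a table column:

/* Hypothetical policy: sales reps only see rows for regions mapped to their role */
CREATE OR REPLACE ROW ACCESS POLICY sales_region_policy
  AS (region_value VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'SALES_ADMIN'
    OR EXISTS (
      SELECT 1
      FROM region_role_mapping m
      WHERE m.role_name = CURRENT_ROLE()
        AND m.region = region_value
    );

ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region);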

 

3. Access History:

A new view that shows used and unused tables to produce reports.

  • Why is this cool? You can see what’s bringing value and optimize storage costs based on what is frequently accessed or completely abandoned data. Who doesn’t love to save money?
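Today this surfaces as the ACCESS_HISTORY view in the SNOWFLAKE.ACCOUNT_USAGE schema; a minimal sketch of peeking at it, assuming a role with access to that schema:

/* Which objects have actually been read recently? */
SELECT query_id,
       query_start_time,
       direct_objects_accessed
FROM snowflake.account_usage.access_history
ORDER BY query_start_time DESC
LIMIT 20;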

 

Connected Industries:

Following we have two upcoming features that we thought were worth mentioning since they will be game-changers! These two features are Discover & Transact and Try Before You Buy, both of which will ease collaboration and data procurement between connected industries.

 

1. Discover and Transact:

Directly within the Snowflake Data Marketplace, a consumer can now discover data and purchase with a usage-based pricing model.

  • This is truly cool because of the self-service aspect! By providing this feature, we can significantly reduce the cost of selling and delivering data to our valuable clients.

 

2. Try Before You Buy:

Now consumers can access sample data to make sure they’re getting all they need before signing that check.

  • Why is this interesting? Everyone loves a free sample!

 

Data Programmability:

 

Probably the most important updates are under the data programmability umbrella. So, if you’re still with me, hang on a little longer, this is about to get interesting!

Some innovations are ready to be used now in public preview, so let’s check them out:

  1. SQL API: This new API enables customers to automate administrative tasks without having to manage infrastructure; there's no need to maintain an external API management hub!
  2. Schema Detection: Now supports Parquet, ORC, and Avro, and hopefully more file formats in the future (see the sketch below).
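Schema detection is exposed through the INFER_SCHEMA table function. A minimal sketch, assuming a hypothetical stage MY_STAGE and a named Parquet file format MY_PARQUET_FORMAT:

/* Detect column names and types from staged Parquet files */
SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@my_stage/sales/',
    FILE_FORMAT => 'my_parquet_format'
  )
);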

 

Exciting things to look forward to soon:

  1. Serverless Tasks: Snowflake will determine and schedule the right amount of compute resources needed for your tasks.
  2. Snowpark and Java UDFs: Snowpark is going to be the Snowflake developer's new playground. It allows developers to bring their preferred languages directly into the platform. Java UDFs will also enable data engineers and developers to bring their custom code to Snowflake. This enables better performance on both sides!
  3. Unstructured Data Support: Soon, we will be able to treat unstructured data the same as structured data, with the ability to store, govern, process, and share.
  4. Machine Learning with Amazon SageMaker: A tool that will automatically build and insert the best machine-learning models into Snowflake!

 

Conclusion:

 

In summary, Snowflake's 2022 Summit exhibited several noteworthy novel features and updates, particularly in the domains of platform optimization, global data governance, and data programmability. Although a significant number of these features are still in private preview, they provide a glimpse into Snowflake's future direction and potential.

Keep an eye out for more updates and guidance from IT Strategists on how to leverage Snowflake's tools and solutions to their fullest potential. Be sure to check out our blog for more news and information.

Snowflake Data Masking

Introduction:

 

Today's article discusses the Snowflake Data Cloud's implementation of dynamic data masking, a column-level security feature used to mask data at query runtime. We provide a step-by-step guide on how to create and apply a data masking policy for email addresses. The article also highlights the benefits of using dynamic data masking policies to secure and obfuscate PII data from roles without access, while displaying the data to roles that need it.

Last week, the United States Centers for Disease Control and Prevention (CDC) issued new policies regarding COVID-19 masks. Here, though, we will focus on a different kind of masking: how to implement the Snowflake Data Cloud's "Data Masking". Let's get started!

 

What is Data Masking?

 

Data Masking is just like it sounds: the hiding or masking of data. It is a practical method to add extra column-level security. Data masking overall is a simple concept, and it has caught on in our new age of GDPR and PII. What is Snowflake's version of data masking? Snowflake's implementation is Dynamic Data Masking.

Dynamic Data Masking is column-level security that uses masking policies to mask data at query runtime. Snowflake's version of data masking has several features, including masking policies that live at the schema level. Data masking currently works to mask data at either the table or view object. The masking policies are applied at query runtime, and they are applied to every location where the column is displayed. Depending on your role, your role hierarchy, your masking policy conditions, and the SQL execution context, you will see fully masked data, partially masked data, or plain text!

Now that you know what Snowflake Data Cloud Dynamic Data Masking is, how do you use it? Data masking within Snowflake is enabled with Data Definition Language (DDL). The basic syntax constructs for the masking policy object are the typical object CREATE, ALTER, DROP, SHOW, and DESCRIBE. This is a common pattern for most Snowflake objects, and one of the reasons why I prefer Snowflake. Most of the time, it's reliable, easy to use, and consistent.

So, let's have some fun and create a data masking policy for email addresses in a simple example. There are 3 main parts to creating and applying a dynamic data mask to a column in Snowflake. Here we go:

 

PART 1 – Enable and Grant Masking Policy

 

To enable masking policy on Snowflake, follow these steps:

  1. Grant create masking policy on schema to a role. For example: GRANT CREATE MASKING POLICY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
  2. Use the account admin role to grant apply masking policy on account to the role. For example: GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE "DATA_MASKING_ADMIN_ROLE";

Replace "DEMO_MASKING_DB.DEMO" with the actual schema name and "DATA_MASKING_ADMIN_ROLE" with the actual role name.

Remember to grant the necessary privileges to the roles that will use the masking policy.

 

PART 2 – Create a Masking Policy

To create a masking policy in Snowflake, follow these steps:

  1. Use a role that has the necessary privileges to create a masking policy.
  2. Use the schema where the table or view that needs the masking policy is located.
  3. Use the CREATE MASKING POLICY statement to create the policy. For example:
CREATE OR REPLACE MASKING POLICY MASK_FOR_EMAIL AS (VAL STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
ELSE '*********'
END;

Replace MASK_FOR_EMAIL with the name of your masking policy. In this example, the policy masks the email column with asterisks for all roles except for the HR_ROLE.

Remember to grant the necessary privileges to the roles that will use the masking policy.

 

PART 3 – Apply the Masking Policy to a Column in a View or Table

 

To apply the masking policy to a column in a view or table in Snowflake:

  1. Use a role that has the necessary privileges to modify the table or view.
  2. Use the schema where the table or view that needs the masking policy is located.
  3. Use the ALTER TABLE or ALTER VIEW statement to modify the column and apply the masking policy. For example:
ALTER TABLE IF EXISTS EMPLOYEE MODIFY COLUMN EMAIL SET MASKING POLICY MASK_FOR_EMAIL;

Replace EMPLOYEE with the name of your table and EMAIL with the name of the column that needs the masking policy. Replace MASK_FOR_EMAIL with the name of your masking policy.

Remember to grant the necessary privileges to the roles that will use the masking policy.

(Just creating a masking policy is not enough; kind of like wearing a COVID mask under your mouth and nose. Even though you have a mask, it's not really applied, so it's not working.)

 

 

 

We will show you how to do all of this in detail below.

 

Dynamic Data Masking Example

Let’s say we want to create a data mask for the email addresses in our employee data.

If you have not been using our Snowflake Solutions Demo Database Training Example then let’s create a database, schema, and table to use.


/* SETUP DEMO DATABASE AND TABLE FOR DATA MASKING DEMO and PROOF OF CONCEPT */
USE ROLE SYSADMIN;  /*use this role or equivalent */
CREATE OR REPLACE DATABASE DEMO_MASKING_DB;
CREATE SCHEMA DEMO;
CREATE OR REPLACE TABLE EMPLOYEE(ID INT, FULLNAME VARCHAR,HOME_ADDRESS VARCHAR,EMAIL VARCHAR);
INSERT INTO EMPLOYEE VALUES(1,'Frank Bell','1000 Snowflake Lane North Pole, Alaska', 'fbell@snowflake.com');
INSERT INTO EMPLOYEE VALUES(2,'Frank S','1000 Snowflake Lane North Pole, Alaska', 'franks@snowflake.com');
INSERT INTO EMPLOYEE VALUES(3,'Craig Stevens','1000 Snowflake Lane North Pole, Alaska', 'craig@snowflake.com');
CREATE WAREHOUSE IF NOT EXISTS MASK_WH WITH WAREHOUSE_SIZE = XSMALL INITIALLY_SUSPENDED = TRUE AUTO_SUSPEND = 60;


/* PART 0 – create and grant roles for DATA MASKING DEMO – REPLACE FREDDY WITH YOUR USERNAME – there is more to do when you use custom roles with no privileges */
USE ROLE SECURITYADMIN;
CREATE ROLE IF NOT EXISTS EMPLOYEE_ROLE;
CREATE ROLE IF NOT EXISTS MANAGER_ROLE;
CREATE ROLE IF NOT EXISTS HR_ROLE;
CREATE ROLE IF NOT EXISTS DATA_MASKING_ADMIN_ROLE;
GRANT USAGE ON DATABASE DEMO_MASKING_DB TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE EMPLOYEE_ROLE;
GRANT SELECT ON TABLE DEMO_MASKING_DB.DEMO.EMPLOYEE TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON DATABASE DEMO_MASKING_DB TO ROLE HR_ROLE;
GRANT USAGE ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE HR_ROLE;
GRANT SELECT ON TABLE DEMO_MASKING_DB.DEMO.EMPLOYEE TO ROLE HR_ROLE;
GRANT USAGE,MODIFY ON DATABASE DEMO_MASKING_DB TO ROLE "DATA_MASKING_ADMIN_ROLE";
GRANT USAGE,MODIFY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
GRANT USAGE ON WAREHOUSE MASK_WH TO ROLE EMPLOYEE_ROLE;
GRANT USAGE ON WAREHOUSE MASK_WH TO ROLE HR_ROLE;
GRANT ROLE EMPLOYEE_ROLE TO USER FREDDY;
GRANT ROLE MANAGER_ROLE TO USER FREDDY;
GRANT ROLE HR_ROLE TO USER FREDDY;
GRANT ROLE DATA_MASKING_ADMIN_ROLE TO USER FREDDY;



/* PART 1 – enable masking policy ON ACCOUNT AND GRANT ACCESS TO ROLE */
GRANT CREATE MASKING POLICY ON SCHEMA DEMO_MASKING_DB.DEMO TO ROLE "DATA_MASKING_ADMIN_ROLE";
USE ROLE ACCOUNTADMIN;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE "DATA_MASKING_ADMIN_ROLE";



/* PART 2 – CREATE MASKING POLICY */
USE ROLE DATA_MASKING_ADMIN_ROLE;
USE SCHEMA DEMO_MASKING_DB.DEMO;
CREATE OR REPLACE MASKING POLICY MASK_FOR_EMAIL AS (VAL STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN VAL
ELSE '********'
END;


/* PART 3 - APPLY MASKING POLICY TO EMAIL COLUMN IN EMPLOYEE TABLE */
ALTER TABLE IF EXISTS EMPLOYEE MODIFY COLUMN EMAIL SET MASKING POLICY MASK_FOR_EMAIL;



AWESOME - YOU HAVE NOW CREATED AND APPLIED YOUR DATA MASK! Let's test it out.



/* TEST YOUR DATA MASK !!! --> TEST by QUERYING TABLE WITH DIFFERENT ROLES AND SEE RESULTS */
/* Notice the EMAIL is MASKED with ******* */
USE ROLE EMPLOYEE_ROLE;
SELECT * FROM DEMO_MASKING_DB.DEMO.EMPLOYEE;
/* Notice the EMAIL is NOT MASKED */
USE ROLE HR_ROLE;
SELECT * FROM DEMO_MASKING_DB.DEMO.EMPLOYEE;

ADDITIONAL DETAILS:

  • Masking policies are schema-level data definition language (DDL) objects in Snowflake. You can always retrieve their DDL with the standard GET_DDL function or with DESCRIBE. The examples below review the MASKING POLICY; note that when using SECURITYADMIN or other roles without USAGE on the database and schema, you must use the fully qualified DATABASE.SCHEMA.POLICY path.

USE ROLE SECURITYADMIN;
DESCRIBE MASKING POLICY DEMO_MASKING_DB.DEMO.MASK_FOR_EMAIL;

USE ROLE ACCOUNTADMIN; /* using GET_DDL via SELECT requires USAGE on the database and schema, which the SECURITYADMIN role does not have by default */

SELECT GET_DDL('POLICY','DEMO_MASKING_DB.DEMO.MASK_FOR_EMAIL');
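You can also list the masking policies visible to your current role, either within a schema or across the whole account:

/* List masking policies in the demo schema and across the account */
SHOW MASKING POLICIES IN SCHEMA DEMO_MASKING_DB.DEMO;
SHOW MASKING POLICIES IN ACCOUNT;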

 

Conclusion:

 

Dynamic Data Masking policies are a great way to obfuscate your PII data from roles that should not see it while still displaying that data to the roles that need access. We hope this tutorial has helped you understand Dynamic Data Masking on Snowflake. For further information on Snowflake, check out our blog for more tips and tricks.


Snowflake Data Marketplace Introduction

Introduction:

 

The Snowflake Data Marketplace is a cloud data exchange platform where organizations can discover, access, and share live, ready-to-query data sets in a secure, governed, and compliant manner.

Long gone are the days when consumers had to copy data, use APIs, or wait days, weeks, and sometimes even months to gain access to datasets. With the Snowflake Data Marketplace, analysts around the world are getting the information they need to make important business decisions in the blink of an eye, right in the palm of their hands.

 

 

What is it and how does it work?

 

The Snowflake Data Marketplace provides access to live, query-ready data sets. It uses Snowflake's Secure Data Sharing to connect data providers and consumers. Currently, it offers access to 229 data sets. As a consumer, you can discover and access various third-party data sets. They will be directly available in your Snowflake account to query. There's no need to transform the data or spend time joining it with your own data. If you use multiple vendors for data, the Data Marketplace gives you a single source to access everything.

 

Why is this Marketplace important?

 

Companies can now securely and instantly share and access live, regulated data in real-time without copying or moving it. In the past, getting access to such data could take days, weeks, months, or even years. With the Data Marketplace, gaining access only takes a couple of minutes. Already over 2000 businesses have requested free access to key data sets in our marketplace. This is a gold mine for anyone who wants data-driven, on-demand access to live and ready-to-query data, and the best part is that it's available globally, across clouds.

There are significant benefits for both providers and consumers. However, three key points enable companies to unlock their true potential using the Data Marketplace:

 

Source Data Faster and More Easily

  • As we said previously, using the Snowflake Data Marketplace as a consumer allows users to avoid the risk and hassle of copying and migrating stale data. Instead, they can securely access live, governed, shared data sets and receive automatic updates in real time.

Monetize Your Data

  • As a data provider on Snowflake's Data Marketplace, you can create new revenue streams by offering your governed data assets to thousands of potential Snowflake data consumers.

 

Reduce Analytics Costs

  • Using this service eliminates the costs and effort of traditional data integration. Both consumers and providers can access shared data directly, securely, and compliantly from their own Snowflake accounts. This streamlines data ingestion, pipelines, and transformation.

 

Learn more about Snowflake Data Marketplace

 

For more information on Snowflake’s Data Marketplace, visit the official website here.

If you're curious to see Snowflake's official Marketplace demo, check it out here.

 

Action Items after reading this article:

 

  • Visit the Snowflake Data Marketplace official website.
  • Check out Snowflake's official Marketplace demo.
  • Learn more about accessing live, query-ready data sets through Snowflake's Data Marketplace.
  • Find, try out, and purchase the data and applications you require to power innovative business solutions.
  • Discover how data providers can create new revenue streams by offering governed data assets.
  • Understand how the Data Marketplace reduces analytics costs by streamlining data ingestion, pipelines, and transformation.

Creating Your First Database

What’s the benefit of learning to create a database on Snowflake?

 

It is important to learn to create databases with Snowflake because:

  • Snowflake is a popular data warehouse built for the cloud. Having experience with Snowflake will be valuable for data engineers and data scientists.
  • Snowflake has many advantages over traditional data warehouses like fast performance, scalability, and low cost. Being able to build databases on Snowflake will allow you to leverage these benefits.
  • Snowflake uses SQL, a common query language, so the skills you learn will be transferable to other systems.
  • Snowflake has many features for modeling and managing data that you can utilize by knowing how to create databases and schemas.
  • There are many resources and a large community to help you learn Snowflake.

 

Creating Databases with the User Interface:

 

To complete this tutorial, you will need a Snowflake account and the URL to log in to the web interface.

Log in to your Snowflake environment and select the Databases tab at the top left of the screen. It should look similar to the image below, except you won't have the same databases. Snowflake provides sample data.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-23.png

 

Let’s create a new database. Click the Create button and fill out the information in the pop-up window.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-24.png

 

When naming the database, there are restrictions -- no spaces and the name cannot start with a number. You can read the full set of restrictions in Snowflake’s documentation.

Select Finish and you’ll see your new database appear in the table. Click on your database and you’ll see any tables, views, or schemas that exist. Since the database has just been created, none of these objects exists yet.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-15.jpeg

Creating Schemas:

 

A Snowflake schema is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database and can have its own security configuration.

From inside our selected database, select the Schemas tab. Snowflake will create a public and information schema by default. The PUBLIC schema can be used to create any other objects. The INFORMATION_SCHEMA contains all the metadata for the database.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-16.jpeg

 

Click the Create button and provide a name and comment. The Managed Access option determines whether privileges on the objects within the schema are managed centrally by the schema owner or by the owner of each object; a SQL sketch of this option follows the screenshot below.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-17.jpeg
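If you prefer to do this in SQL, the same choice is made with the WITH MANAGED ACCESS clause. This is just a sketch with placeholder schema names:

/* Placeholder schema names for illustration */
CREATE SCHEMA DEMO_SCHEMA;
/* Managed access: the schema owner (or a role with MANAGE GRANTS) controls privileges on objects in the schema */
CREATE SCHEMA DEMO_MANAGED_SCHEMA WITH MANAGED ACCESS;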

Creating Databases and Schemas with Code:

 

All operations done with the UI can also be done with SQL code in a worksheet. To see what code corresponds to the operations we are doing, click on the Show SQL button.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-18.jpeg

 

Using code streamlines our Snowflake work once we understand how it functions. This can reduce time and automate tasks.

To run any SQL code in Snowflake, choose Worksheets from the main menu. This lists all databases on the left for reference and provides a space to enter your code. On the right, we see our current role, database, warehouse, and schema.

Let’s enter the code to replicate the UI process from before. When you click Run, only the line where your cursor is executes. To run multiple lines, highlight them and click Run. If you’ve already created your database, run DROP DATABASE DEMO_DB for this to work.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-25.png
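For reference, this is roughly the SQL equivalent of the UI steps above. DEMO_DB matches the walkthrough; DEMO_SCHEMA is an assumed name, so substitute your own:

/* Equivalent SQL for the UI steps; DEMO_SCHEMA is an assumed schema name */
CREATE DATABASE DEMO_DB COMMENT = 'Demo database created from a worksheet';
USE DATABASE DEMO_DB;
CREATE SCHEMA DEMO_SCHEMA COMMENT = 'Demo schema';
SHOW SCHEMAS IN DATABASE DEMO_DB;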

 

And with those steps, you should be able to create a database. If you want to learn new techniques, keep checking our blog regularly for more helpful tips.

 

Conclusion:

 

To conclude, we have walked through the steps to create your first database and schema in Snowflake using both the user interface and SQL code. You now have the foundation to build tables, load data, and query your Snowflake data warehouse.

Semi Structured JSON Data

Introduction:

 

In today’s article we’ll go over Snowflake's support for semi-structured data in the form of JSON, Avro, ORC, Parquet, and XML. We’ll walk through the process of loading JSON data into a Snowflake table, including creating a database, schema, table, file format, and stage, and we’ll finish with an example of querying semi-structured data using Snowflake's SQL SELECT statements.

 

What is Snowflake’s semi-structured data?

 

One of Snowflake's unique features is its native support for semi-structured data. Snowflake supports semi-structured data in the form of JSON, Avro, ORC, Parquet, and XML, with JSON being the most widely used industry standard thanks to its simple format and ease of use. Semi-structured data is data that does not conform to a fixed schema. JSON data can be loaded directly into table columns of type VARIANT, a universal type that can store values of any type, and then queried using SQL SELECT statements that reference JSON elements by their paths.

Let’s take a look at our JSON data. Here are some JSON data properties:

  • Data in JSON is a name-value pair.
  • Data is separated by a comma.
  • Curly braces hold objects.
  • Square brackets hold an array.
[
  {
    "ID": 1,
    "color": "black",
    "category": "hue",
    "type": "primary",
    "code": {
      "rgb": "255,255,255",
      "hex": "#000"
    }
  },
  {
    "ID": 2,
    "color": "white",
    "category": "value",
    "code": {
      "rgb": "0,0,0",
      "hex": "#FFF"
    }
  },
  {
    "ID": 3,
    "color": "red",
    "category": "hue",
    "type": "primary",
    "code": {
      "rgb": "255,0,0",
      "hex": "#FF0"
    }
  }
]

Database Object

We have created a new database object to load and process semi-structured data as shown below. You can use the existing one if you have already created it earlier.

CREATE DATABASE IF NOT EXISTS TEST_DATABASE;

 

Schema

Create a new schema under the TEST_DATABASE object for ease of access. This step is optional if you already have access to an existing schema; in that case you can use it instead.

CREATE SCHEMA IF NOT EXISTS TEST_SCHEMA;

Table

In order to load the JSON data, we need an object to hold it, and that object must be capable of holding semi-structured data.

In Snowflake, we have the following data types for processing semi-structured data:

  • Variant
  • Array
  • Object

We’ll be using the VARIANT type to load data into a Snowflake table.

CREATE TABLE IF NOT EXISTS COLORS
(
TEST_DATA VARIANT
);

The COLORS table is created with one column, TEST_DATA, that holds the JSON data.

 

File Format

To load the JSON data into a Snowflake table, a file format is one of the required objects in Snowflake:

CREATE FILE FORMAT JSON_FILE_FORMAT
TYPE = 'JSON'
COMPRESSION = 'AUTO'
ENABLE_OCTAL = FALSE
ALLOW_DUPLICATE = FALSE
STRIP_OUTER_ARRAY = TRUE
STRIP_NULL_VALUES = FALSE
IGNORE_UTF8_ERRORS = FALSE;

The above file format is specific to JSON. The STRIP_OUTER_ARRAY option removes the outer set of square brackets [ ] when loading the data, splitting the initial array into multiple rows. If we did not strip the outer array, our entire dataset would be loaded into a single row in the destination table.

 

Stage

In order to copy the data to a Snowflake table, we need data files in the cloud environment. Snowflake provides two types of stages:

  • Snowflake Internal stage.
  • External stages (AWS, Azure, GCP).

If you do not have your own cloud storage, Snowflake provides space to store data within its own environment, called the "Snowflake internal stage".

In this article, we have used a Snowflake internal stage and created a dedicated stage for semi-structured load.

CREATE STAGE IF NOT EXISTS JSON_STAGE FILE_FORMAT = JSON_FILE_FORMAT;

You can use the command below to list the files in a stage:

LIST @JSON_STAGE;
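If your files already sit in your own cloud storage, an external stage works the same way. The bucket URL and storage integration name below are made-up placeholders, so treat this purely as a sketch:

/* Hypothetical external stage; URL and integration name are placeholders */
CREATE STAGE IF NOT EXISTS JSON_EXT_STAGE
URL = 's3://my-example-bucket/json/'
STORAGE_INTEGRATION = MY_S3_INTEGRATION
FILE_FORMAT = JSON_FILE_FORMAT;
LIST @JSON_EXT_STAGE;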

PUT & COPY Command

 

The PUT command uploads files from local storage to Snowflake internal stages, and the COPY command loads the staged files into a table. You can run PUT from the SnowSQL CLI client; I’ll be using the Snowflake UI to do it under the Databases tab.

PUT file://<file_path>/sample.json @COLORS/ui1591821970011;
COPY INTO "TEST_DATABASE"."TEST_SCHEMA"."COLORS" FROM @/ui1591821970011 FILE_FORMAT = '"TEST_DATABASE"."TEST_SCHEMA"."JSON_FILE_FORMAT"' ON_ERROR = 'ABORT_STATEMENT';

You can accomplish the same thing in the Snowflake UI under the Databases tab: click on your database, navigate to the table, and click Load Data above it.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-18.png

 

Check that the data was properly loaded (SELECT from COLORS).

https://snowflakesolutions.net/wp-content/uploads/word-image-19.png

 

Querying Semi-Structured Data

Snowflake is extremely powerful when it comes to querying semi-structured data. The syntax works a lot like JavaScript object access, except we use the : notation to traverse the JSON path for each row. By using the :: notation, we cast the retrieved values to the data type we want.

SELECT
test_data:ID::INTEGER as ID,
test_data:color::STRING as color,
test_data:category::STRING as category,
test_data:type::STRING as type,
test_data:code.rgb::STRING as code_rgb,
test_data:code.hex::STRING as code_hex
FROM
colors;

https://snowflakesolutions.net/wp-content/uploads/word-image-20.png
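The same path notation also works outside the SELECT list, for example in a WHERE clause. Here is a small sketch against the same table; note that a path that does not exist in a row (like type for ID 2) simply returns NULL:

/* Filter and cast on JSON paths; missing paths come back as NULL */
SELECT
test_data:ID::INTEGER as ID,
test_data:color::STRING as color,
test_data:code.hex::STRING as code_hex
FROM colors
WHERE test_data:category::STRING = 'hue'
AND test_data:type IS NOT NULL;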

 

Conclusion:

 

The process of loading data into a database can be a cumbersome task, but with Snowflake it can be done easily. Snowflake's functionality makes it straightforward to process semi-structured data. Check out the docs to learn more about semi-structured data.