SnowAdmin

Find out more about all the benefits SnowAdmin has to offer you and your business. Sign up for a free proof of concept!

THE SNOWFLAKE SUMMIT RECAP – 2019

The first Snowflake Summit took place June 3rd to 6th and lived up to expectations. The four-day summit drew more than two thousand attendees and featured one hundred and twenty presentations across seven tracks, seven keynote presentations, more than thirty hands-on labs, and more than thirty-five theatre sessions, with more than thirty countries represented among the attendees.

A quick recap of the summit…

Day 1

The first day of the summit was largely devoted to essential Snowflake training, which ended with the trainees taking an exam. It was a smooth and exciting experience: people were placed in rooms with their scripts and environments already set up, and Snowflake representatives were on hand to help anyone out. The exam had two parts. The first consisted of multiple-choice questions on the training; the second, taken upon passing the first, was practical. The practical involved creating a user, a database, and a table loaded from a Google spreadsheet, then executing various transformations to populate the final table.

Day 2

The highlight of the day was a series of announcements of new Snowflake features, including availability on Google Cloud, external tables, Snowflake Organizations, data replication, Data Exchange, and data pipelines. The most significant announcements are explained below:

  •      Snowflake announced that it will be available on Google Cloud Platform in 2020. This gives organizations using Snowflake seamless and secure data integration across platforms, letting them choose the right cloud vendor for their business. It will also make it easy for customers to use Google’s ecosystem of applications and to manage applications across multiple clouds.
  •      Snowflake introduced new data pipeline features that allow customers to query data directly from their data lake on Azure Blob Storage or AWS S3, enabling them to maintain the data lake as the single source of truth.
  •      Snowflake’s Data Exchange is currently in private preview, with public availability planned for later in the year. The Data Exchange is a free-to-join marketplace that lets users connect with data providers to seamlessly discover, access, and generate insights from data.

Day 3

The keynotes on the third day started with Alison Levine, author of “On the Edge,” giving an informative talk on leadership. The founders of Snowflake, Benoît Dageville (current President of Products) and Thierry Cruanes (current CTO), also gave a talk on why they started Snowflake, referencing their founding vision: “Simply load and query data.” The day ended with Kevin O’Brien of Kiva.org and Julie Dodd of Parkinson’s UK showing how data can be used to make the world a better place.

Day 4

The last day of the summit saw Matthew Glickman, Snowflake’s VP of Customer and Product Strategy, give a closing keynote on some of Snowflake’s customers’ journeys to becoming data-driven. Customer representatives invited on stage included Brian Dumman, Chief Data and Analytics Officer at McKesson; Yaniv Bar-Dayan, Cofounder and CEO of Vulcan Cyber; and Michal Klos, Senior Director of Engineering at Indigo/Localytics. By the end of the summit, it was clear that the future of data had arrived, with Snowflake capable of providing trusted data solutions to its customers.

The 2020 summit will be better

The 2020 summit will be held June 1st to 4th at the Aria Hotel in Las Vegas, a bigger venue. Given the success of the 2019 summit, the 2020 edition promises to be bigger still, with more activities. I honestly can’t wait for it.

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!

SnowCompare

SnowCompare is the easiest and fastest way to compare and deploy Snowflake data from one database or schema to another.
While Snowflake lets you write SQL to compare data, doing so is still cumbersome for a regular user or analyst.
Zero Copy Cloning is easy in Snowflake, but what happens when you have cloned a database several times and want to understand the differences between a clone and the original database?
This is where SnowCompare comes in, making it easy to see those differences visually.
Get on the waiting list for this free tool! We plan to release it in October.

Find out more about all the benefits SnowCompare has to offer you and your business. Sign up for a free proof of concept!

SnowSheets

SnowSheets allows you to connect Google Sheets to a Snowflake database.

You can:
  • View (select) Snowflake data within Google Sheets.
  • Create tables; update, insert, and delete rows; and keep Google Sheets synchronized with Snowflake.
  • Edit data within Snowflake more easily.

Get on the waiting list for this free tool! We plan to release in September.

Find out more about all the benefits SnowSheets has to offer you and your business. Sign up for a free proof of concept!

Integrating Databricks with Snowflake

Overview: Here is a practical guide to getting started with integrating Databricks with Snowflake.  We will walk through the basic setup and show how easy it is to connect the two so you can write data from Databricks to Snowflake and read it back.

Pre-Requisites:

  1.  You must have a Snowflake account (the good thing is that this is really easy!)
    – 30-day free trial, including $400 of credits.  Free Snowflake Account Setup
  2.  You need at least a Databricks Community Edition account (the Community Edition is free).
    – The Databricks Snowflake connector is included in Databricks Runtime 4.2 and above.

https://databricks.com/try-databricks

You should have some basic familiarity with DataFrames, Databricks, and Snowflake.

At a high level, there are three main steps:

  1.  Import a notebook that already has a shell of the code you need.
  2.  Fill in the details in the notebook for your Snowflake database.
  3.  Execute the three separate parts of the notebook:
    1. Making the connection.
    2. Writing a DataFrame to Snowflake.
    3. Reading a Snowflake table back into a DataFrame.

Steps:

Okay.  Now that you have a Databricks account set up, log in.
I’m going to assume you have the Community Edition, so the login is here (if you have a regular edition, log in to the appropriate area):

https://community.cloud.databricks.com/login.html

Once you are logged in, you will land on the Databricks home screen.

Go to the Workspace icon (it is the third from the top on the left-hand side).
Once you click it, a dropdown arrow appears to the right of the “Workspace” menu item. Click the arrow, then click Import. It should look like this:

Databricks Import URL

Choose URL, paste in this notebook link, and click the Import button:
https://docs.databricks.com/_static/notebooks/snowflake-python.html

*This is one of my favorite things about Databricks: it makes sharing notebooks and staying organized easy.

Once you have imported the notebook it should look like this:

Databricks Notebook to Snowflake

There are 3 main sections to this sample connector notebook from Databricks:

1.  The Snowflake connection.  You need to fill in all the details in blue.  You should set up Databricks secrets first.

Then make sure you add the other appropriate details here (database, schema, warehouse):
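As a rough sketch of what that first cell looks like once filled in (the secret scope and key names below are placeholders for whatever you configured; the option names come from the Databricks Snowflake connector):

    # Fetch Snowflake credentials from Databricks secrets
    # ("data-warehouse" scope and key names are placeholders for your setup).
    user = dbutils.secrets.get("data-warehouse", "snowflake-user")
    password = dbutils.secrets.get("data-warehouse", "snowflake-password")

    # Connection options for the Snowflake connector.
    options = {
        "sfUrl": "<account>.snowflakecomputing.com",  # your account URL
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": "<database>",
        "sfSchema": "<schema>",
        "sfWarehouse": "<warehouse>",
    }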

*You will notice a Run button in the upper right of each cell.  If you have worked with Jupyter notebooks, this will feel very similar.

2.  Write data to a Snowflake table.  Fill in the sections in blue, mainly just the table you want to write to.  If the table does not exist, it will be created for you.  (You can change this part if you want to test with your own data by building a DataFrame from existing data instead of spark.range(5).write.)
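A minimal sketch of that write cell, using the options dictionary from step 1 (the table name is a placeholder):

    # Write a 5-row test DataFrame (ids 0-4) to a Snowflake table;
    # the table is created if it does not already exist.
    spark.range(5).write \
        .format("snowflake") \
        .options(**options) \
        .option("dbtable", "<table-name>") \
        .save()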

3.  Read data from a Snowflake table.  Fill in the sections in blue, mainly just the table you want to read from, which is the one you just created.
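And the matching read cell, again as a sketch:

    # Read the table we just wrote back into a DataFrame and show it.
    df = spark.read \
        .format("snowflake") \
        .options(**options) \
        .option("dbtable", "<table-name>") \
        .load()

    display(df)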

That’s it.  This is a very simple example, but hopefully it shows that it’s pretty straightforward to connect Databricks and Snowflake.  Connecting Spark itself outside of Databricks is relatively easy as well, but you do need to set up the Spark-to-Snowflake connector and the JDBC driver yourself.  The great part of Databricks Runtime 4.2 and higher is that this is already set up for you.

Enjoy!

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!

How To Setup Confluent with Snowflake

Overview:  Here is a practical guide to getting started with setting up Confluent with Snowflake.

Pre-Requisites:

  1.  You must have a Snowflake account (the good thing is that this is really easy!)
    – 30-day free trial if you do not have a Snowflake account yet.  Free Snowflake Account Setup
  2.  This setup requires Docker.  (I’ll have separate instructions for doing this without Docker later.)
  3.  You also need git.

Here we go – there are really 3 main parts to this setup:

  1. Get the Docker version of Confluent/Kafka up and running.
  2. Create a topic on it and generate data to move into Snowflake.
  3. Set up the Kafka-to-Snowflake connector as the destination, with the right Snowflake connectivity.

Part 1 – Get the Docker version of Confluent/Kafka running.

Let’s go…execute these commands in sequence:
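The commands themselves were shown as screenshots; as a sketch, one way to bring the stack up at the time of writing was Confluent’s cp-all-in-one Docker Compose example (the repository layout may have changed since, so check Confluent’s current docs):

    git clone https://github.com/confluentinc/cp-docker-images
    cd cp-docker-images/examples/cp-all-in-one
    docker-compose up -d --build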

Okay…the first time, it will take a few minutes to download the images, and eventually you will see output indicating that each service has been created and started.

If you want to verify that everything is up and running then execute this command:
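Assuming you are still in the same directory as the Compose file:

    docker-compose ps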

You should see each of the services (broker, zookeeper, connect, control-center, and so on) listed with a state of Up.

You can also follow the work you are about to do below from the Confluent Control Center dashboard (sometimes it takes a while to refresh):

http://localhost:9021/

Part 2 – Create a topic to send data to Snowflake, and generate data for it with the DataGen functionality.

Let’s go…execute these commands in sequence:

Create a topic:
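A sketch of topic creation, assuming the broker container is named broker as in the cp-all-in-one stack (I’ll use a topic named users throughout; any name works, and on newer Kafka versions you would pass --bootstrap-server broker:9092 instead of --zookeeper):

    docker-compose exec broker kafka-topics --create \
      --zookeeper zookeeper:2181 \
      --replication-factor 1 --partitions 1 \
      --topic users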

Generate data for the topic:
Let’s first configure a JSON file describing the data we want to create.
Use whatever editor you want and create the file u.config.

I’m using vi u.config and pasting this in:

u.config details
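The original file contents were shown as a screenshot. Here is a representative sketch using the kafka-connect-datagen connector’s documented properties; it assumes that connector is installed in your Connect image (it can be added with confluent-hub if not), and the connector name is just the one I’m using:

    {
      "name": "datagen-users",
      "config": {
        "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
        "kafka.topic": "users",
        "quickstart": "users",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "max.interval": "1000",
        "iterations": "10000000",
        "tasks.max": "1"
      }
    }

You can then load it through the Kafka Connect REST API, which listens on port 8083 in this stack:

    curl -X POST -H "Content-Type: application/json" \
      --data @u.config http://localhost:8083/connectors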

Part 3 – Download the Kafka to Snowflake Connector and configure it.

Okay.  So you have Confluent/Kafka up and running.  You have data generating into a topic.

So now just download the magical Kafka to Snowflake connector here: https://mvnrepository.com/artifact/com.snowflake/snowflake-kafka-connector/0.3.2
I’m sure by the time I publish this the version will change but for now assume it’s this one.

Once you have the jar in the same directory we have been using for everything, copy it into the Connect container, where it needs to live for the connector to load.
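A sketch of that copy, assuming the Connect container is named connect and that /usr/share/java is on its plugin path (both are assumptions about your stack; adjust the jar name to the version you actually downloaded):

    docker cp snowflake-kafka-connector-0.3.2.jar connect:/usr/share/java/
    docker-compose restart connect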

You now need to create the configuration file that sets up the connector and its associated sink connecting to the Snowflake database.  This assumes you have already set up your RSA key.  You have to fill in six of the settings below for your specific configuration.  Again, use your favorite editor.  I’m using:
vi connector_snowflake.config and entering my specific details.

connector_snowflake.config details
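The original was a screenshot; here is a representative sketch in the JSON form the Connect REST API accepts, using property names from the Snowflake Kafka connector documentation. The six angle-bracketed values are the ones you must fill in (the private key is the body of your RSA private key with the header, footer, and line breaks removed):

    {
      "name": "snowflake_sink",
      "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "1",
        "topics": "<topic>",
        "snowflake.url.name": "<account>.snowflakecomputing.com:443",
        "snowflake.user.name": "<user>",
        "snowflake.private.key": "<private-key>",
        "snowflake.database.name": "<database>",
        "snowflake.schema.name": "<schema>",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
      }
    }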

Okay.  Almost there.  Now use this configuration file to set up the sink.
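With the JSON form above, that is just another POST to the Connect REST API:

    curl -X POST -H "Content-Type: application/json" \
      --data @connector_snowflake.config http://localhost:8083/connectors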

Now, within a few seconds or minutes, if you set everything up correctly, the topic should be writing to the Snowflake table. Go into the database and schema you connected to and you should be able to execute something like:
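For example (the table name is a placeholder for whatever table the connector created for your topic; the connector lands each message into RECORD_METADATA and RECORD_CONTENT variant columns):

    SELECT RECORD_METADATA, RECORD_CONTENT
    FROM <your_topic_table>
    LIMIT 10;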

Now you should see data flowing from Kafka to Snowflake. Enjoy!

Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!

A NEW ERA OF CLOUD ANALYTICS WITH SNOWFLAKE AS THE HADOOP ERA ENDS

Hadoop was regarded as a revolutionary technology that would change data management and completely replace data warehousing. That prediction was only partly accurate, and it has not held up since cloud solutions came into the picture. Hadoop flourishes mostly in projects with substantial data infrastructure, meaning it was more relevant around a decade ago when most data analysts Continue reading

Not On The High Street: Improving customer experience with Snowflake

Companies like notonthehighstreet.com are taking their customers’ experience to the next level, with an online marketplace delivering unique products and services in a singularly convenient way. Without speedy data delivery though, as attested to by their Director of Data in this video, this marketplace just wouldn’t keep their customers coming back for more. Their countless partners benefit from this as well, but Continue reading

Snowflake vs Netezza

Fifteen years ago, IBM introduced an appliance-based, on-prem analytics solution known as Netezza. It was purpose built, load ready, and met a lot of the needs of the day (back when on-prem was still largely the preferred choice for data warehousing solutions). One could say IBM really hit the ball out of the park, and Netezza has definitely enjoyed a good, solid run since then, but that was fifteen years ago, and times have Continue reading

Looker

Looker is a business intelligence software and big data analytics platform that helps you explore, analyze and share real-time business analytics easily.

Looker is the ideal tool to use with Snowflake:

  • Analyze both structured and semi-structured data with ease
  • Leave your data in Snowflake, granularly control Continue reading

ETL vs ELT: Data Warehouses Evolved

For years now, the process of migrating data into a data warehouse, whether it be an ongoing, repeated analytics pipeline, a one-time move into a new platform, or both, has consisted of a series of three steps, namely: Continue reading

Snowflake vs Hadoop

Lots of people are aware of Hadoop for its advantages, like ease of data loading and scaling, but more and more are becoming increasingly aware of its limitations, like Continue reading

Snowflake vs Redshift

We have been building data systems for years, and this is the most excited we’ve been in all that time, thanks to the new cloud capabilities of Redshift, Google BigQuery, and Snowflake. Today we wanted to share some results based on our estimates for a relatively small 2TB cloud data warehouse on Snowflake and on Redshift for a client. Then we also wanted to go through all the differences we see. Continue reading

Snowflake vs Teradata

To anyone with even a passing familiarity with this space, Teradata is quite rightly known as a powerhouse in the data warehousing and analytics arena. It’s been the go-to technology for sectors ranging from various three-letter intelligence agencies to the most recognizable players in medicine, science, auto, and telecom.

We have been building data systems for years, including those incorporating Teradata (such as for a Data Warehousing project we executed for TicketMaster back in 2007), so this really is the most excited we’ve been in all that time, given all the new capabilities available now, as compared to then. Today we wanted to share with you some findings based on several industry-leading reports concerning cloud data warehousing and analytics in 2017/18, especially as it concerns Snowflake when compared to such an industry giant such as Teradata. Continue reading

Strava: Data Sharing with Snowflake

Data companies like Strava are really vertical pioneers, as they’ve created a veritable social network for athletes to upload, track, and compete with other athletes worldwide. As attested to by their data engineer in this video, without data, Strava wouldn’t exist, and the more people find they have access to it, the more they hunger for it. Yet as they grew, this imposed significant delays in the time it took for users to query their data, so beyond the data sharing features Snowflake uniquely provides, there were multiple benefits Strava encountered by using Snowflake. Continue reading

SpringServe: Data Sharing with Snowflake

If ever there was an industry to discover huge benefits from Snowflake’s Data Sharing technology, it’s Advertising!

SpringServe delivers ads, in video format, with a reputation for providing immediate reporting on ad performance. They serve hundreds of thousands of ad requests per second, and as their collaborations and partnerships grew, so did the number of Continue reading

Localytics: Data Sharing with Snowflake

Localytics provides market engagement analysis services to makers of apps far and wide. Being a data company, and given the prevalence of mobile apps today, plus with how many clients were making use of their SDK, the scale of their data requirements climbed into the Petabytes. The costs for this level of data, using their legacy data warehousing system, shot into territory that just no longer made sense for them as a business (to say nothing for an ever growing latency issue as well). Continue reading

Playfab: Data Sharing with Snowflake

Here’s a great example of how Playfab, an online back-end service for game developers, leverages Snowflake’s Data Sharing functionality. Video games are increasingly delivered as part of an online service, and so game developers, now more than ever, are in need of one, secure space from which to host all their assets, tools, and data (both outbound, and from their players)! Continue reading

Semi-Structured Data Loading & Querying in Snowflake

Semi-structured data has become a tremendously popular format today (be it JSON, Avro, Parquet, or XML) for a multitude of things such as IoT, mobile, and apps. Until now though, there’s been no fast and easy way to store and analyze this kind of data, but Continue reading