Integrating Databricks with Snowflake
Overview: This is a practical guide to integrating Databricks with Snowflake. It walks through the basic setup and shows how to connect the two so that you can write data to Snowflake and read it back.
- You must have a Snowflake account (the good thing is that this is really easy!)
– 30-day free trial, including $400 of credits (see Free Snowflake Account Setup)
- You need to set up at least a Databricks Community Edition account (the Community Edition is free)
– The Databricks Snowflake connector is included in Databricks Runtime 4.2 and above
You should have some basic familiarity with DataFrames, Databricks, and Snowflake.
At a high level, these are the main steps:
- Import a notebook that already has a shell of the code you need.
- Fill in the details in the notebook for your Snowflake database.
- Execute the 3 separate parts of the notebook, which are:
– making the connection
– writing a DataFrame to Snowflake
– reading a Snowflake table back into a DataFrame
Okay. Now that you have a Databricks account set up, log in.
I’m going to assume you have the Community Edition, so use the Community Edition login page (if you have one of the regular editions, log in to the appropriate area).
Once you are logged in, you should see the Databricks home screen.
Go to the Workspace icon (it is the 3rd from the top on the left-hand side).
Once you click it, a dropdown arrow appears to the right of the menu item “Workspace”. Click that arrow, then click Import to open the import dialog.
Then choose URL, paste in the link to the sample notebook, and click the Import button.
*This is one of my favorite parts of Databricks: they make it easy to share notebooks and stay organized.
Once you have imported the notebook, it opens in your workspace.
There are 3 main sections to this sample connector notebook from Databricks:
1. The Snowflake connection. You need to fill in all the details in blue. You should set up the Databricks secrets first.
Then make sure you add the other appropriate details here (database, schema, warehouse).
* You will notice a run button in the upper right. If you have worked with Jupyter Notebooks, this is very similar.
2. Write data to a Snowflake table. Fill in the sections in blue, mainly the table you want to write to. If the table does not exist, it will be created for you. (The sample writes spark.range(5); you can change this part to build a DataFrame from existing data if you want to test more specifically with your own data.)
3. Read data from a Snowflake table. Fill in the sections in blue, mainly the table you want to read from, which is the one you just created.
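To make the three sections concrete, here is a minimal Python sketch of what they amount to. This is not the imported notebook itself. The function names, the secret scope "snowflake", and the secret keys "username" and "password" are assumptions made up for illustration. The option keys (sfUrl, sfUser, sfPassword, sfDatabase, sfSchema, sfWarehouse, dbtable) and the "snowflake" format name are the ones the connector bundled with Databricks uses.

```python
def build_snowflake_options(get_secret, url, database, schema, warehouse,
                            scope="snowflake"):
    """Section 1: assemble the connection options for the connector.

    get_secret is any callable (scope, key) -> str. In a Databricks notebook
    that would be dbutils.secrets.get. The scope and key names used here are
    hypothetical; use whatever you created when setting up your secrets.
    """
    return {
        "sfUrl": url,
        "sfUser": get_secret(scope, "username"),
        "sfPassword": get_secret(scope, "password"),
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def write_table(df, options, table, mode="errorifexists"):
    """Section 2: write a DataFrame to a Snowflake table.

    "errorifexists" is Spark's default save mode. Pass "overwrite" or
    "append" if you want to rerun against a table that already exists.
    """
    (df.write
        .format("snowflake")
        .options(**options)
        .option("dbtable", table)
        .mode(mode)
        .save())


def read_table(spark, options, table):
    """Section 3: read a Snowflake table back into a DataFrame."""
    return (spark.read
            .format("snowflake")
            .options(**options)
            .option("dbtable", table)
            .load())


# Inside a Databricks notebook, where `spark` and `dbutils` are predefined:
# options = build_snowflake_options(dbutils.secrets.get, "<account-url>",
#                                   "<database>", "<schema>", "<warehouse>")
# write_table(spark.range(5), options, "<table>")
# display(read_table(spark, options, "<table>"))
```

Passing the secret lookup in as a callable keeps the credentials out of the notebook source and lets you try the option-building logic outside Databricks.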
That’s it. This is a very simple example, but hopefully it shows that it is pretty straightforward to connect Databricks and Snowflake. Connecting Spark itself outside of Databricks is relatively easy as well, but you do need to set up the Spark-to-Snowflake connector as well as the JDBC driver. The great part of Databricks Runtime 4.2 and higher is that this is already set up for you.
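For the outside-of-Databricks case, one common way to pull in both pieces is Spark's spark.jars.packages setting, which takes a comma-separated list of Maven coordinates. The helper below is a hypothetical sketch: the function name is made up, and the version strings are left as parameters because the right values depend on your Spark and Scala versions.

```python
def snowflake_packages(connector_version, jdbc_version, scala_version="2.12"):
    """Build a value for spark.jars.packages covering the Spark connector
    and the JDBC driver. The versions are supplied by the caller; none are
    assumed here."""
    return ",".join([
        f"net.snowflake:spark-snowflake_{scala_version}:{connector_version}",
        f"net.snowflake:snowflake-jdbc:{jdbc_version}",
    ])


# With plain PySpark this could be used roughly as follows:
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .config("spark.jars.packages",
#                  snowflake_packages("<connector-version>", "<jdbc-version>"))
#          .getOrCreate())
```

Outside Databricks the data source is typically referenced by its full name, net.snowflake.spark.snowflake, in place of the short "snowflake" format name.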