Introduction:

 

In today’s article we’ll go over Snowflake’s support for semi-structured data in the form of JSON, Avro, ORC, Parquet, and XML. It covers the process of loading JSON data into a Snowflake table, including creating a database object, schema, table, file format, and stage. It also provides an example of querying semi-structured data using Snowflake’s SQL SELECT statements.

 

What is Snowflake’s semi-structured data?

 

One of Snowflake’s unique feature is its native support for semi-structured data. Snowflake supports semi-structured data in the form of JSON, Avro, ORC, Parquet, and XML. Semi-structured data is data that does not conform to a specific schema, such as JSON data. Snowflake can load JSON data directly into table columns with type VARIANT, a universal type that can be used to store values of any type. Data can be queried using SQL SELECT statements that reference JSON elements by their paths.

One of Snowflake’s unique features is its native support for semi-structured data. Snowflake supports semi-structured data in the form of JSON, Avro, ORC, Parquet, and XML. JSON is the most widely used and industry standard due to its data format and ease of use.

 

 

JSON data can be loaded directly into the table columns with type VARIANT, a universal type that can be used to store values of any type. Data can be queried using SQL SELECT statements that reference JSON elements by their paths.

Let’s take a look at our JSON data. Here our some JSON data properties:

  • Data in JSON is a name-value pair.
  • Data is separated by a comma.
  • Curly braces hold objects.
  • Square brackets hold an array.
{
"ID": 1,
"color": "black",
"category": "hue",
"type": "primary",
"code": {
"rgb": "255,255,255",
"hex": "#000"
}
},{
"ID": 2,
"color": "white",
"category": "value",
"code": {
"rgb": "0,0,0",
"hex": "#FFF"
}
},{
"ID": 3,
"color": "red",
"category": "hue",
"type": "primary",
"code": {
"rgb": "255,0,0",
"hex": "#FF0"
}
}

Database Object

We have created a new database object to load and process semi-structured data as shown below. You can use the existing one if you have already created it earlier.

CREATE DATABASE IF NOT EXISTS TEST_DATABASE;

 

Schema

Create a new schema under TEST_DATABASE object to have ease of access. This step is optional if you already have access to the existing schema. In such a case you can use the existing schema.

CREATE DATABASE IF NOT EXISTS TEST_DATABASE;

Table

In order to create JSON data, we need an object to hold the data and it should be capable enough to hold the semi-structured data.

In snowflake, to process the semi-structured data, we have the following data types:

  • Variant
  • Array
  • Object

We’ll be using the variant object to load data into a Snowflake table.

CREATE TABLE IF NOT EXISTS COLORS
(
TEST_DATA VARIANT
);

Object CHRISTMAS_REC is created with one column TEST_DATA that holds the object of JSON data.

 

File Format

To load the JSON object into a Snowflake table, file format is one of the mandatory objects in snowflake:

CREATE FILE FORMAT JSON_FILE_FORMAT
TYPE = 'JSON'
COMPRESSION = 'AUTO'
ENABLE_OCTAL = FALSE
ALLOW_DUPLICATE = FALSE
STRIP_OUTER_ARRAY = TRUE
STRIP_NULL_VALUES = FALSE   IGNORE_UTF8_ERRORS = FALSE;

The above file format is specific to JSON. The STRIP_OUTER_ARRAY array option removes the outer set of square brackets [ ] when loading the data, separating the initial array into multiple lines. If we did not strip the outer array, our entire dataset would be loaded into a single row in the destination table.

 

Stage

In order to copy the data to a Snowflake table, we need data files in the cloud environment. Snowflake provides two types of stages:

  • Snowflake Internal stage.
  • External stages (AWS, Azure, GCP).

If you do not have any cloud platform, Snowflake provides space to store data into its cloud environment called – “Snowflake Internal stage”.

In this article, we have used a Snowflake internal stage and created a dedicated stage for semi-structured load.

CREATE STAGE IF NOT EXISTS JSON_STAGE FILE_FORMAT = JSON_FILE_FORMAT;

You can use below command to list files in stages:

LIST @JSON_STAGE;

PUT & COPY Command

 

PUT command fetches data from local storage to snowflake internal stages. You can run this command from the Snowflake CLI client. I’ll be using the Snowflake UI to do it under the database tab.

PUT file://<file_path>/sample.json @COLORS/ui1591821970011   COPY INTO "TEST_DATABASE"."TEST_SCHEMA"."COLORS" FROM @/ui1591821970011 FILE_FORMAT = '"TEST_DATABASE"."TEST_SCHEMA"."JSON_FILE_FORMAT"' ON_ERROR = 'ABORT_STATEMENT';

You can accomplish the same thing by using Snowflake UI under the database tab. Click on your database and then find your way to the table. Click on load data above it.

 

https://snowflakesolutions.net/wp-content/uploads/word-image-18.png

 

Check that the data was properly loaded (SELECT from COLORS).

https://snowflakesolutions.net/wp-content/uploads/word-image-19.png

 

Querying Semi-Structured Data

Snowflake is extremely powerful when it comes to querying semi-structured data. The command works a lot like JavaScript, except we use : notation to retrieve the category for each row. By using :: notation, we define the end data type of the values being retrieved.

SELECT
test_data:ID::INTEGER as ID,
test_data:color::STRING as color,
test_data:category::STRING as category,
test_data:type::STRING as type,
test_data:code.rgb::STRING as code_rgb,
test_data:code.hex::STRING as code_hex
FROM
colors;

https://snowflakesolutions.net/wp-content/uploads/word-image-20.png

 

Conclusion:

 

The process of loading data into a database can be a cumbersome task but with Snowflake, this can be done easily. Snowflake functionality makes it possible to process semi-structured data. Check out the docs to learn more about semi-structured data.

Snowflake Cost Saving

we automate snowflakeDB data cloud cost saving. sign our free 7 days no risk trail now