Assuming your Internal Stage has already been created and does not yet have a Snowflake Directory Table associated with it, you can easily add a Snowflake Directory Table by using the ALTER STAGE command. Also, remember that a directory table has no grantable privileges of its own, so you must have the right privileges on the Snowflake Stage you are altering.
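As a minimal sketch (the stage name my_internal_stage is hypothetical):
-- Enable the directory table on an existing internal stage
ALTER STAGE my_internal_stage SET DIRECTORY = (ENABLE = TRUE);
-- Refresh the directory table so it reflects the files currently on the stage
ALTER STAGE my_internal_stage REFRESH;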
Can I use Snowflake Row Access Policies on External Tables?
Yes. You can use Row Access Policies on external tables. Similar to regular tables, you can add a row access policy when you CREATE the external table, or you can ALTER the external table and add a Row Access Policy afterwards.

KEY POINTS on EXTERNAL TABLES and ROW ACCESS POLICIES (RAP)
-You cannot add a row access policy to a virtual column directly. The only way to work around this is to create a view on top of the external table and apply the RAP to the view columns.
-When you clone a database or schema, external tables are not cloned, so plan ahead if a row access policy depends on data in an external table (see the note below).
-You cannot use an external table as a mapping table within a row access policy at this time.

If the data in the external table is necessary for the row access policy, consider moving the external table data to a dedicated schema within the database in which the row access policy exists prior to completing a clone operation. Update the row access policy to reference the fully qualified table name to ensure the policy refers to a table in the cloned database.
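As a rough sketch (all object names here are hypothetical; since virtual columns cannot carry a policy directly, the policy is applied to the external table's VALUE column, and the exact syntax may differ slightly in your edition):
-- A simple policy; its signature takes a VARIANT because it will be applied to VALUE
CREATE OR REPLACE ROW ACCESS POLICY ext_rap AS (val VARIANT) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ANALYST';

-- Apply the policy when creating the external table ...
CREATE EXTERNAL TABLE ext_sales
  WITH LOCATION = @my_stage/sales/
  FILE_FORMAT = (TYPE = PARQUET)
  ROW ACCESS POLICY ext_rap ON (VALUE);

-- ... or add it to an existing external table
ALTER TABLE ext_sales ADD ROW ACCESS POLICY ext_rap ON (VALUE);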
Does Snowflake support correlated subqueries?
Yes, Snowflake supports correlated subqueries, but only in WHERE clauses: correlated scalar subqueries are supported there, as are correlated EXISTS, ANY / ALL, and IN subqueries.
You can also find more information about subqueries in Snowflake under "Can you perform nested queries in snowflake?" below.
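For example, here is a correlated scalar subquery in a WHERE clause (table and column names are made up for illustration):
-- The inner query references e.dept_id from the outer query, so it is evaluated per outer row
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(s.salary) FROM employees s WHERE s.dept_id = e.dept_id);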
Can you perform nested queries in snowflake?
Nested queries (subqueries; a query within another query) can be performed in Snowflake.
When the subqueries are in a WHERE or FROM clause, they provide data that limits, or is compared against, the data returned by the containing query.
There also are different types of subqueries:
Correlated VS Uncorrelated
-Correlated subqueries contain columns from outside the subquery. In correlated subqueries, the columns are generally referenced in the WHERE clause, as if the subquery were evaluated on every row of the table in the outer query.
-Uncorrelated subqueries have no external column references. This query is independent and its results are returned one time only (not for each row) and used by the outer query.
Scalar VS Non-scalar
-Scalar subqueries return a single value (one column of one row). When no rows are eligible to be returned, the subquery returns NULL.
-The result of a non-scalar subquery can be zero, one or more rows, with one or more columns in each row. Whenever a column does not have a value to return, the subquery returns NULL. The subquery returns 0 rows if no rows qualify to be returned (not NULLs).
The specific subqueries that are supported by Snowflake are (examples below):
-Uncorrelated scalar subqueries in any place where a value expression can be used
-Correlated scalar subqueries in WHERE clauses
-EXISTS, ANY / ALL, and IN subqueries in WHERE clauses; these can be correlated or uncorrelated
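To illustrate the difference, here are two small sketches using hypothetical tables:
-- Uncorrelated scalar subquery: the inner query runs once and returns a single value
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated EXISTS subquery: the inner query references the outer query's row
SELECT d.dept_name
FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id);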
Can I use the SELECT INTO syntax for Snowflake?
No, you cannot use the SELECT INTO syntax in Snowflake, as Snowflake does not have that statement. However, there are alternatives, such as the CREATE TABLE ... AS SELECT statement, which creates a table by copying an existing table or from the result of a SELECT query.
With CREATE TABLE as SELECT you are able to:
-Create a table with selected columns, by changing column names and data types from an existing table.
-Create a table from the result of a query, or duplicate an existing table (see the example below).
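For example (hypothetical table and column names):
-- Duplicate an existing table, renaming and casting a column along the way
CREATE TABLE customers_copy AS
  SELECT customer_id, customer_name::VARCHAR(100) AS name
  FROM customers;

-- Materialize the result of an arbitrary query as a new table
CREATE TABLE big_orders AS
  SELECT * FROM orders WHERE order_total > 1000;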
Can I perform top 10 syntax in snowflake database?
Yes, you can use TOP syntax in the Snowflake database.
This is done with TOP <n>, which constrains the maximum number of rows returned by a statement or subquery.
Important to note: <n> must be a non-negative integer constant, and LIMIT <n> is equivalent to TOP <n>.
Furthermore, the ORDER BY clause is not required when using TOP <n>. But if you do NOT use the clause, the results will be non-deterministic because the rows in the result set are not in any specific order. So if you want to ensure that the rows returned are predictable, be sure to use the ORDER BY clause (see the example below).
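For example (hypothetical table):
-- Return the 10 highest-value orders; ORDER BY makes the result deterministic
SELECT TOP 10 order_id, order_total
FROM orders
ORDER BY order_total DESC;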
What is zero copy cloning in Snowflake ?
Zero copy cloning in Snowflake is the feature that allows the duplication of an object without creating a physical copy and without adding any more storage cost. This is done with the CLONE keyword, which allows the creation of copies of tables, schemas, and databases. The feature is not limited to those objects, though; it can also clone other objects such as file formats and sequences, to name a few. The great thing about zero copy cloning is that it allows a lot of flexibility without having to worry about the cost.
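For example (hypothetical object names):
-- Clone a table, a schema, or an entire database without physically copying the data
CREATE TABLE orders_clone CLONE orders;
CREATE SCHEMA reporting_dev CLONE reporting;
CREATE DATABASE analytics_dev CLONE analytics;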
Can we have time travel set differently for each table in our snowflake database?
Yes. This is really easy to do. Just be careful: if you set high retention periods on frequently changing tables, storage amounts and costs can increase significantly.
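As a sketch, per-table retention is controlled with the DATA_RETENTION_TIME_IN_DAYS parameter (table names here are hypothetical; retention above one day requires Enterprise Edition or higher):
ALTER TABLE important_history SET DATA_RETENTION_TIME_IN_DAYS = 30;
ALTER TABLE staging_events SET DATA_RETENTION_TIME_IN_DAYS = 1;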
What is the difference between Fail-Safe and Time-Travel in Snowflake?
While both features make sure you have access to your historical data, Fail-safe is a way for Snowflake to recover historical data in case there was a system failure or other type of failure, whereas Time Travel allows you to query, clone, and restore the historical data within tables, schemas, or databases for up to 90 days (depending on edition and retention settings).
What can you use for data profiling with the Snowflake database?
First, what is data profiling? It is the practice of examining and analyzing data sources in order to create summaries. This helps gather information that companies can use to help their business.
Data profiling is also important for ensuring that your data accurately reflects how your business is doing, by revealing any errors within the data.
So now, what can you use for data profiling within Snowflake? Snowflake has a collaboration with Talend. This collaboration ensures that when legacy data is migrated to Snowflake's cloud data warehouse, it is correct and complete.
However, Snowflake's structure is made so that it can work with many other profiling tools. Some of the specific tools have been Talend Open Studio, Pentaho PDI, and Etlworks, just to name a few, but they all work differently.
Can I clone a table DDL and not the data?
YES. This is easy to do from the interface or from code. You just use the LIKE syntax, like this (okay, maybe it's technically not code-level DDL, but it accomplishes the same thing):
CREATE TABLE SNOWFLAKE_SOLUTIONS_TABLE_WITH_NO_DATA LIKE SNOWFLAKE_SOLUTIONS_TABLE_BEING_CREATED_LIKE_ANOTHER;
How many days is the time travel history preserved in Snowflake?
Actions such as querying, cloning, and restoring historical data from tables, schemas, and databases are available for up to 90 days in Time Travel. However, this requires Extended Time Travel, which is only available with Snowflake Enterprise Edition (or higher). For standard accounts, the Time Travel retention period is limited to one day.
Can I connect PySpark to snowflake?
Yes, you can connect PySpark to Snowflake as Snowflake is compatible with both Python and Spark.
The Snowflake Connector for Spark, also known as the Spark Connector, brings Snowflake into the Apache Spark ecosystem, allowing Spark to read data from, and write data to, Snowflake.
What is Snowflake Time Travel?
Snowflake Time Travel gives access to historical data (which is any data that has either been changed, lost, or deleted).
With Time Travel you can also:
1. Make clones of schemas, tables, or even databases
2. Restore these schemas, tables, and databases
However, once the defined retention period has passed, the data is moved into Snowflake Fail-safe and these actions can no longer be carried out.
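For example (hypothetical table name):
-- Query a table as it looked one hour ago
SELECT * FROM orders AT (OFFSET => -3600);

-- Clone the table as of the same point in time
CREATE TABLE orders_one_hour_ago CLONE orders AT (OFFSET => -3600);

-- Restore a dropped table while it is still within the retention period
UNDROP TABLE orders;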
What is fail-safe in Snowflake ?
Fail-safe protects historical data in case there is a system failure or any other failure.
Fail-safe allows 7 days in which your historical data can be recovered by Snowflake and it begins after the Time Travel retention period ends.
It is important to note that Fail-safe should only be used after all other recovery options have been tried, and that recovery through Fail-safe is performed only by Snowflake in order to retrieve data that may have been lost due to a system failure.
Lastly, recovering data through Fail-safe can take hours to days.
Is there a cost associated with Time Travel in Snowflake?
In short, Time Travel does have some storage fees imposed for maintaining historical data.
The way that these fees are calculated is by each day (24 hours) from each time the data has changed. By maintaining only the information necessary to restore the individual rows of a table that were updated or deleted, Snowflake lessens the amount of storage that is needed for historical data. Essentially, the only storage usage that is calculated is the percentage of the table that changed.
To help manage the storage fees with Time Travel, Snowflake has two table types (temporary and transient) that don't have the same storage fees as permanent tables. For these two table types, the maximum additional fees incurred for Time Travel are limited to one day.
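For example, a transient table can be created as a lower-cost landing area (hypothetical name; transient and temporary tables cap Time Travel at one day and have no Fail-safe period):
CREATE TRANSIENT TABLE staging_loads (id NUMBER, payload VARIANT);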
I’m having trouble connecting to Snowflake from R. Any suggestions?
If you want to connect to Snowflake from R (RStudio) you will need to use the ODBC (Open Database Connectivity) driver on either Windows, MacOS, or Linux.
Here are some steps on how to do this:
1. The first step is to create an ODBC DSN (Data Source Name)
2. Then you will test if the ODBC DSN is working
3. Lastly, the code that can be used on RStudio is:
install.packages(c("DBI", "dplyr","dbplyr","odbc"))
library(DBI)
library(dplyr)
library(dbplyr)
library(odbc)
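# Replace the DSN name, username, and password below with your own values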
myconn <- DBI::dbConnect(odbc::odbc(), "SNOWFLAKE_DSN_NAME", uid="USERNAME", pwd='Snowflak123')
mydata <- DBI::dbGetQuery(myconn,"SELECT * FROM EMP")
head(mydata)
Having problems connecting to snowflake from Pentaho? Any suggestions?
First, you want to make sure that you are running Pentaho version 6.1 or above.
Next, these steps will help you create the JDBC connection to Snowflake through Pentaho:
1. You will need to download the Snowflake JDBC driver to the workstation that is running Pentaho
2. Then you will copy the Snowflake JDBC jar (snowflake_jdbc.jar) to the Pentaho folder:
-On Windows this will be: C:\Program Files\pentaho\design-tools\data-integration\lib
-On Mac OS this will be: …/pentaho/design-tools/data-integration/lib
3. Lastly, you will configure a Generic Database connection within Pentaho through:
URL: jdbc:snowflake://<account_identifier>.snowflakecomputing.com:443/
Driver Class Name: net.snowflake.client.jdbc.SnowflakeDriver
What is Airflow?
Airflow is a platform designed to automate, schedule, and monitor workflows. It is mainly used for data processing pipelines, computational workflows, and ETL processes.
The tool can be customized to fit specific needs, which makes many routine tasks easier to complete.
Some of the other features of Airflow include:
1. Directed Acyclic Graphs (DAGs), which represent workflows in Airflow
2. Easy access and interaction with logs
3. Set up alerts
4. Monitoring interface
5. PythonOperator
Furthermore, some of the things that will be needed for Airflow are Python, a Snowflake account and access to the latest Apache-Airflow.
To integrate Apache-Airflow with Snowflake you will:
1. Configure Apache-Airflow with a Snowflake connection.
2. Open “localhost:8080” within the browser
3. Go under “Admin” and then “Connections”
4. Click on the + symbol and then add a new record
5. Lastly, choose the connection type as “Snowflake”
Then you will need to create a DAG file:
1. Go to the folder you have chosen as your AIRFLOW_HOME
2. Find the dags subfolder
3. Create a Python file with the name "snowflake_airflow.py"
4. Paste the Python file into the dags folder
Workflows will automatically be picked up and scheduled to run.
How do you add a stream to a table in snowflake?
You would use the command CREATE STREAM (or CREATE OR REPLACE STREAM), which creates a new stream in the current schema or replaces an existing stream. An example would be: create stream mystream on table mytable;
The command also supports:
-CREATE STREAM...CLONE (to create a clone of the existing stream)
However, if change tracking hasn't been enabled, only the table owner can create the first stream on the table. Creating additional streams requires a role that has been granted these privileges (see the example grants below):
-Database: USAGE
-Schema: CREATE STREAM, USAGE
-Source table: SELECT
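As a sketch, the grants might look like this (database, schema, table, and role names are hypothetical):
GRANT USAGE ON DATABASE mydb TO ROLE stream_creator;
GRANT USAGE, CREATE STREAM ON SCHEMA mydb.myschema TO ROLE stream_creator;
GRANT SELECT ON TABLE mydb.myschema.mytable TO ROLE stream_creator;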