What is a TASK object in Snowflake? What does it do?

If you are familiar with cron jobs, I view Snowflake Tasks as somewhat similar. When Tasks were introduced alongside Streams, I was a little surprised because I thought Snowflake was starting to compete at a very basic level with all of its EL and ELT partners. The combination of Streams and Tasks allows for basic data pipeline creation in Snowflake without any of the many EL and ELT tools and companies. A task, similar to a cron job, can kick off the execution of a SQL statement or stored procedure at a specific time, and scheduled tasks actually use the same crontab-style syntax. Tasks can also be started based on a dependency on another task, so you can build a true task-by-task hierarchy where the NEXT task executes ONLY when the task it depends on has completed. Also, tasks do NOT all have to be sequential. Technically, you can have what I assume is hundreds of subtasks, as I'll call them, dependent on the SAME exact parent task, and they can all EXECUTE in parallel if you have enough Snowflake compute resources running at the time. One other point to note is that a task will NOT run again if the same task is still running. For example, let's say your task runs every 5 minutes but one run takes 6 minutes. In this scenario, the overlapping run is not initiated but skipped, and the task waits for the next scheduled execution. ALSO, a PRO TIP which is COMPLETELY annoying if you forget: a task is created in a SUSPENDED state, so YOU MUST run ALTER TASK ... RESUME before it is ACTIVE! If you forget this step, your perfectly CREATED TASK will NEVER EVER RUN!
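
Here is a minimal sketch of what a scheduled task can look like; the warehouse, task, and table names are hypothetical assumptions, and the 5-minute cron schedule is just an example:

-- Hypothetical example: a task that runs a SQL statement every 5 minutes.
-- Assumes a warehouse named EXAMPLE_WH and a table named EXAMPLE_LOG already exist.
CREATE OR REPLACE TASK EXAMPLE_PARENT_TASK
  WAREHOUSE = EXAMPLE_WH
  SCHEDULE = 'USING CRON */5 * * * * UTC'
AS
  INSERT INTO EXAMPLE_LOG SELECT CURRENT_TIMESTAMP();

-- Per the pro tip above: tasks are created suspended, so resume it or it will never run.
ALTER TASK EXAMPLE_PARENT_TASK RESUME;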

How long can you set the time-travel retention for Snowflake Standard Edition?

You can currently set the time travel (data retention) period on the Snowflake Standard Edition from 0 to 1 day; 1 day is both the default and the maximum on Standard.
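
If you want to set it explicitly, it is a simple ALTER statement; the table name below is a hypothetical example:

-- Hypothetical example: set the time travel retention on a table (max 1 day on Standard Edition).
ALTER TABLE EXAMPLE_TABLE SET DATA_RETENTION_TIME_IN_DAYS = 1;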

What is a Snowflake Stage and how does it work?

A Snowflake stage really is just a logical pointer to a physical file staging location, whether it is within Snowflake or in one of the cloud providers as an External Stage. In the Snowflake Data Cloud, there are two types of Snowflake stages: INTERNAL and EXTERNAL. Conceptually, all cloud-based data systems expect you to get physical data to the cloud before operations can start on it.

Internal Stages are controlled within Snowflake, and some of them are created by default alongside Snowflake objects. For example, BOTH User Stages and Table Stages are created by default. So every time you or someone else creates a user or a table on Snowflake, the Snowflake Data Cloud creates these default shadow stages. You have access to your own User Stage associated with your user. This is where you can load data from local storage into your own stage, which is useful when you want to load many tables instead of just one. A Table Stage is specific ONLY to the particular table it is related to. Most of the time, though, I only see Snowflake customers use Named Internal Stages. The one main difference between these Internal Stages and External Stages is that you can use the PUT and GET commands with snowsql to put or get files to and from Snowflake Internal Stages.

External Stages are associated with the external cloud providers. You can currently create an external stage which references any of the three cloud platforms that Snowflake runs on: AWS, Azure, and GCP. An External Stage can even be created to connect to a DIFFERENT cloud provider than the one your Snowflake account runs on. External Stages can also have their own security settings and encryption keys related to the cloud provider file storage they reference, specified within the CREATE STAGE command.
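
As a rough sketch, here is what creating each type of stage can look like; the stage names, local file path, S3 bucket, and storage integration below are all hypothetical assumptions:

-- Hypothetical named internal stage, loaded from a local file with snowsql's PUT command.
CREATE OR REPLACE STAGE EXAMPLE_INTERNAL_STAGE;
PUT file:///tmp/example_data.csv @EXAMPLE_INTERNAL_STAGE;

-- Hypothetical external stage pointing at an S3 bucket via a pre-created storage integration.
CREATE OR REPLACE STAGE EXAMPLE_EXTERNAL_STAGE
  URL = 's3://example-bucket/data/'
  STORAGE_INTEGRATION = EXAMPLE_S3_INTEGRATION;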

I know I can have row access policy and masking policy on same table but can I actually have them on the same column?

As of now, you CANNOT specify the same exact column in both a row access policy and a data masking policy.

Can I clone a DATA SHARE within Snowflake?

No. As of 4/2022 you cannot clone data shares in Snowflake. The current objects you can clone are: DATABASE, SCHEMA, TABLE, STAGE, FILE FORMAT, SEQUENCE, STREAM, and TASK.

How do I know my row access policy is working? Can I audit it somehow?

Snowflake provides the following functionality to check on your row access policies:
*The standard SHOW ROW ACCESS POLICIES command. This produces a list of the row access policies you have.
*The POLICY_REFERENCES table function within INFORMATION_SCHEMA, which shows the objects a policy is attached to.
*The SNOWFLAKE.ACCOUNT_USAGE.POLICY_REFERENCES view, which shows the associations of the policy. [Also realize that if you have a Standard account, or an account where row access policies are not enabled, this doesn't exist.]
*The SNOWFLAKE.ACCOUNT_USAGE.ROW_ACCESS_POLICIES view, which provides similar information to the SHOW ROW ACCESS POLICIES command.
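
A quick sketch of what those checks look like in practice; the policy name below is a hypothetical assumption:

-- List all row access policies you have access to see.
SHOW ROW ACCESS POLICIES;

-- Hypothetical check: which objects a specific policy is attached to (INFORMATION_SCHEMA table function).
SELECT *
FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(POLICY_NAME => 'EXAMPLE_ROW_POLICY'));

-- Account-wide view of policy associations (ACCOUNT_USAGE has some latency and requires privileges).
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.POLICY_REFERENCES
WHERE POLICY_NAME = 'EXAMPLE_ROW_POLICY';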

Can you nest row access policies in Snowflake?

Yes, you can nest Snowflake row access policies. For example, you can create a row access policy that depends on a table which is itself protected by another row access policy. WARNING: You should be careful not to make row access policies any more complex than necessary.

How long can you set the time-travel retention for Snowflake Enterprise and Business Critical editions?

You can set the time travel retention period from 0 to 90 days for permanent objects on both the Snowflake Enterprise and Business Critical editions! (Transient and temporary tables are still limited to 1 day.)

I have a table with a Snowflake data masking policy which is on email and phone number, if I replicate the table will the data masking policy be replicated as well and work the same exact way?

As long as you are replicating to an account on an equivalent Snowflake edition, this works with no problem: the masking policy is replicated with the table and works the same way. If the primary database is in a lower edition than the account you are replicating to, or in a higher edition than the account you are replicating to, then replication will not be allowed and you will get an error. It also will not work if the data masking policy references an object in a database other than the one being replicated.

Can you enable change tracking on external tables with the CHANGES clause?

No. Change Tracking is not supported on Snowflake External Tables.

What are the advantages of Snowflake Data Cloud cloning?

I see Snowflake's cloning capability as one of their initial game changers for the RDBMS and overall data world. The introduction of this technology FINALLY allowed for truly AGILE data delivery and testing. Since Zero Copy Cloning works on top of Snowflake's micro-partition technology, it allows massive cloning of objects based ONLY on metadata. So if you have fewer than 1,000 objects, cloning typically takes seconds. This allows you to add automated cloning processes and jobs and then run continuous integration and development tests at any point in time, much more quickly than with any data solutions before. Previously, we always had BIG DATA and cost challenges with on-premise databases like Netezza, Teradata, etc., because you had to create a copy of the data to run full tests, and if you had sizeable data this took a long time and carried significant costs to even attempt to have a staging and a development copy of data to work and test with.
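
As a quick illustration of why this enables agile testing, cloning an entire database for a dev environment is a one-liner; the database names below are hypothetical:

-- Hypothetical example: create a full development copy of a production database.
-- This is a metadata-only operation, so it typically completes in seconds and incurs no storage cost up front.
CREATE DATABASE EXAMPLE_DEV_DB CLONE EXAMPLE_PROD_DB;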

How can I add an autoincrement column to a table in Snowflake?

If you need auto numbers or auto-incrementing in Snowflake, then the most versatile way to do auto-incrementing [also named IDENTITY keys] is to CREATE A SEQUENCE.

This SEQUENCE object is separate from the table itself, so it is great if you need a unique value ACROSS multiple tables. Here is how you do that with a SEQUENCE named EXAMPLE_SEQUENCE_FOR_YOU with a START VALUE of 1 and an INCREMENT VALUE of 1.

CREATE SEQUENCE EXAMPLE_SEQUENCE_FOR_YOU START 1 INCREMENT 1;

Now you can either alter or create a table like this to utilize this SEQUENCE to achieve your goals.

ALTER TABLE EXAMPLE_TABLE ALTER COLUMN YOUR_AUTO_ID SET DEFAULT EXAMPLE_SEQUENCE_FOR_YOU.NEXTVAL;

or

CREATE OR REPLACE TABLE EXAMPLE_TABLE (YOUR_AUTO_ID INT DEFAULT EXAMPLE_SEQUENCE_FOR_YOU.NEXTVAL, YOUR_NAME VARCHAR);

ANOTHER WAY to achieve this auto-numbering or auto-incrementing functionality is to use the IDENTITY(START_VALUE,INCREMENT_VALUE) or AUTOINCREMENT(START_VALUE,INCREMENT_VALUE) syntax on a column [this means it is part of the table itself!].

1. CREATE the TABLE with the IDENTITY or AUTOINCREMENT VALUE LIKE THIS:
CREATE OR REPLACE TABLE EXAMPLE_TABLE(YOUR_ID INT AUTOINCREMENT START 1 INCREMENT 1, YOUR_NAME VARCHAR);
or
CREATE OR REPLACE TABLE EXAMPLE_TABLE(YOUR_ID INT IDENTITY START 1 INCREMENT 1, YOUR_NAME VARCHAR);
2. ALTER a TABLE and add the IDENTITY COLUMN [REMEMBER – you can only do this if there is NO DATA within the TABLE!]

ALTER TABLE EXAMPLE_TABLE ADD COLUMN YOUR_ID INT AUTOINCREMENT(1,1);
or
ALTER TABLE EXAMPLE_TABLE ADD COLUMN YOUR_ID INT IDENTITY(1,1);

Note: The AUTOINCREMENT AND IDENTITY actually use sequences behind the scenes.
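
To see the behavior end to end, here is a hypothetical sketch of inserting into the EXAMPLE_TABLE defined above and pulling values directly from the sequence:

-- Hypothetical usage: the YOUR_ID column is populated automatically on insert.
INSERT INTO EXAMPLE_TABLE (YOUR_NAME) VALUES ('FRANK'), ('SNOWFLAKE');
SELECT YOUR_ID, YOUR_NAME FROM EXAMPLE_TABLE;  -- YOUR_ID values are unique and increasing, though not guaranteed gap-free

-- You can also pull the next value from the sequence directly, even across different tables.
SELECT EXAMPLE_SEQUENCE_FOR_YOU.NEXTVAL;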

Are Row Access Policies available in the Snowflake Enterprise Edition?

Yes. Row Access Policies are available on the Enterprise Edition of Snowflake and higher including Business Critical Edition.

What is Row-level Security?

Row-level security is pretty common now in modern databases. Within Snowflake, it means features that let you filter out, or really secure, row access on tables and views depending on policies set within Row Access Policy objects. These policies allow specific security logic for filtering, using both simple and complex constructs. It honestly provides incredible flexibility, but it can also CREATE sizeable complexity. To understand more about how you can implement row access security, we explain Row Access Policies and how they work in depth in this article.
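
For a feel of the mechanics, here is a minimal sketch of a row access policy; the role names, table, and column are hypothetical assumptions:

-- Hypothetical policy: EXAMPLE_ADMIN sees every row; EXAMPLE_EMEA_ANALYST only sees EMEA rows.
CREATE OR REPLACE ROW ACCESS POLICY EXAMPLE_SALES_POLICY
  AS (SALES_REGION VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'EXAMPLE_ADMIN'
    OR (CURRENT_ROLE() = 'EXAMPLE_EMEA_ANALYST' AND SALES_REGION = 'EMEA');

-- Attach the policy to a hypothetical table; other roles now get zero rows back from their queries.
ALTER TABLE EXAMPLE_SALES ADD ROW ACCESS POLICY EXAMPLE_SALES_POLICY ON (SALES_REGION);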

Are there costs related to cloning objects in Snowflake?

When you do any initial clone of a table object [no matter the size - it could even be a table of hundreds of TBs], there is no storage cost at all. This is because the clone is a very small metadata change: the cloned object references the base object it was cloned from by pointing to the original micro-partitions on the backend. BUT if you insert, update, or make any other changes to the cloned table, then Snowflake writes new immutable micro-partitions for EVERY micro-partition that was changed. This is where you can start to incur storage costs. These costs can become very significant if many of the partitions change frequently AND especially if there is a long time-travel period on the table clone.

Can external tables be cloned in Snowflake?

No. As of 2/2022 you cannot clone EXTERNAL TABLES in Snowflake. The current objects you can clone are: DATABASE, SCHEMA, TABLE, STAGE, FILE FORMAT, SEQUENCE, STREAM, and TASK.

Can I clone a table definition without the data in the original table?

YES. This is easy to do from the interface or from code. You just use the LIKE syntax, where the new empty table comes first and the existing source table comes second:
CREATE TABLE SNOWFLAKE_SOLUTIONS_TABLE_WITH_NO_DATA LIKE SNOWFLAKE_SOLUTIONS_TABLE_BEING_CREATED_LIKE_ANOTHER;

When you query a table with a row access policy, how does the policy impact the query? How is it processed?

A single query can be subject to more than one row access policy, such as a row access policy on a table and a row access policy on a view built over that same table. At query runtime, Snowflake evaluates all row access policies that are relevant to a given query in the following sequence:
*The row access policy that is applicable to the table is always executed first.
*The policy for the view is executed after evaluating the policy for the table.
*If nested views exist (e.g. Table 1 -> View 1 -> View 2 -> ... View n), the policies are applied in sequential order from left to right.
This pattern continues for however many row access policies exist with respect to the data in the query.

Can I use my external table as a mapping table for my row access policy?

No. This is currently not supported by Snowflake.

Can you make a TASK in Snowflake dependent on another Snowflake Task?

Yes. This is one of the main pieces of functionality of the TASK object. You can create a task to be dependent on another task, building a hierarchy of task executions. Just realize that if you have too many layers of tasks and dependencies, then it can become complex and difficult to debug errors.
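
As a small sketch of the syntax, reusing the hypothetical EXAMPLE_PARENT_TASK, EXAMPLE_WH warehouse, and EXAMPLE_LOG table from earlier plus a hypothetical EXAMPLE_SUMMARY target table (note that the parent task needs to be suspended while you attach children to it):

-- Hypothetical child task: runs only after EXAMPLE_PARENT_TASK completes. A child task has no
-- SCHEDULE of its own; it inherits its timing from the parent via AFTER.
CREATE OR REPLACE TASK EXAMPLE_CHILD_TASK
  WAREHOUSE = EXAMPLE_WH
  AFTER EXAMPLE_PARENT_TASK
AS
  INSERT INTO EXAMPLE_SUMMARY SELECT COUNT(*), CURRENT_TIMESTAMP() FROM EXAMPLE_LOG;

-- Children are also created suspended; resume them before resuming the parent.
ALTER TASK EXAMPLE_CHILD_TASK RESUME;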