PostgreSQL on Azure Database

January 24, 2021

Snowflake Cost Saving

We automate SnowflakeDB Data Cloud cost savings. Sign up for our free, no-risk 7-day trial now.

Data Connector Description:

Data Connector Type: Database

Data Connector Documentation:

PostgreSQL

PostgreSQL is an open source database typically used to keep in-house custom data. Fivetran's integration platform replicates data from your PostgreSQL source database and loads it into your destination.

Supported services

Fivetran supports six different PostgreSQL database services:

- Generic PostgreSQL
- Amazon Aurora PostgreSQL (Note: We do not support Amazon Aurora Serverless PostgreSQL.)
- Amazon RDS PostgreSQL
- Azure PostgreSQL
- Google Cloud PostgreSQL
- Heroku Postgres

Supported configurations

Fivetran supports the following PostgreSQL configurations:

Supportability Category | Supported Values
Database versions | 8.4 – 12.0
Maximum throughput * | 5.0 MBps
Maximum row size | 400 MB per row
Connector limit per database | 3

* Maximum throughput is your connector's end-to-end update speed, measured in megabytes per second (MBps). We calculate your maximum throughput by averaging the number of rows synced per second during your connector's last 3-4 syncs. To learn more about sync speed, see the Replication speeds section.

Network protocol | Supported Versions | Notes
Transport Layer Security (TLS) | TLS 1.0, TLS 1.1, TLS 1.2 | We can only support TLS versions that your corresponding version of the database supports.

Which PostgreSQL database types we support depends on whether you use WAL (logical replication) or XMIN as your incremental update mechanism. Read our Updating data documentation for more information.

Database Types | WAL | XMIN
Generic PostgreSQL, primary instance | Yes | Yes
Generic PostgreSQL, standby instance | Yes | Yes
Amazon Aurora PostgreSQL, primary instance | Yes (PostgreSQL 10.6 – 12.0 only) | Yes
Amazon Aurora PostgreSQL, standby instance | - | -
Amazon RDS PostgreSQL, primary instance | Yes (PostgreSQL 9.4 – 12.0 only) | Yes
Amazon RDS PostgreSQL, standby instance | - | Yes
Azure PostgreSQL, primary instance | - | Yes
Azure PostgreSQL, standby instance | - | Yes
Google Cloud PostgreSQL, primary instance | - | Yes
Google Cloud PostgreSQL, standby instance | - | Yes
Heroku Postgres, primary instance | - | Yes
Heroku Postgres, standby instance | - | Yes

Features

Feature Name | Supported | Notes
Capture Deletes | Yes | WAL only
Custom Data | Yes | All tables and fields
Data Blocking | Yes | Column level, table level, and schema level
Column Hashing | Yes |
Re-sync | Yes | Table level
History | - |
API Configurable | Yes |
Priority-first sync | - |
dbt Package | - |

* XMIN does not capture data that exists for an amount of time smaller than the sync interval.

Setup guide

In your master database, you need to do the following:

- Allow access to your PostgreSQL database via Fivetran's IP
- Create a Fivetran-specific PostgreSQL user with read-level permissions
- (Optional) Allow access to a read replica of your PostgreSQL database. Using a read replica can help to avoid unnecessary strain on your master database.
- (WAL only) Allow access to a logical replication slot

For specific instructions on how to set up your database, see the guide for your PostgreSQL database type:

- Generic PostgreSQL
- Amazon Aurora PostgreSQL
- Amazon RDS PostgreSQL
- Azure PostgreSQL
- Google Cloud PostgreSQL
- Heroku Postgres

Sync overview

Once Fivetran is connected to your PostgreSQL database or read replica, we pull a full dump of all selected data from your database. Using either the WAL or XMIN change data capture process, we pull all your new and changed data at regular intervals.

If data in the source changes (for example, you add new tables or change a data type), Fivetran automatically detects and persists these changes into your destination. For every schema in your PostgreSQL source database, we create a schema in your destination that maps directly to its native schema. This ensures that the data in your destination is in a familiar format to work with.

Syncing empty tables and columns

Fivetran will sync empty tables and columns for your PostgreSQL connector. For more information, see our Features documentation.

Replication speeds

Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency, and discrepancies between the format of the data we receive and how the data is stored at rest in the data destination.

The ability to sync changes quickly also depends on the sync frequency you configure. The risk of the sync falling behind, or being unable to keep up with data changes, decreases as the sync frequency increases. We recommend a higher sync frequency for data sources with a high rate of data changes.

To measure the rate of new data in your database, check the disk space usage metrics over time for databases hosted on cloud providers. For self-hosted databases, you can run the following query to determine the disk space used by the current database, in megabytes:

    SELECT pg_database_size(current_database())/1024/1024 AS mb;

Schema information

Fivetran tries to replicate the exact schema and tables from your PostgreSQL source database to your destination.

NOTE: We do not sync foreign tables.

Fivetran-generated columns

Fivetran adds the following columns to every table that is added to your destination:

- _fivetran_synced (UTC timestamp) keeps track of when each row was last successfully synced.
- _fivetran_id (string) is the hash of the non-Fivetran values of each row. It's a unique ID that Fivetran uses to identify rows in tables that do not have a primary key.
- (WAL only) _fivetran_deleted (boolean) marks rows that were deleted in the source database.

We add these columns to give you insight into the state of your data and the progress of your data syncs.

Type transformation and mapping

As we extract your data, we match PostgreSQL data types to types that Fivetran supports. If we don't support a certain data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns of data types that we don't accept or transform.

The following table illustrates how we transform your PostgreSQL data types into Fivetran supported types:

PostgreSQL Type | Fivetran Type | Fivetran Supported | Notes
BIGINT / BIGSERIAL | BIGDECIMAL | True |
BIT | BOOLEAN | True |
BOOLEAN | BOOLEAN | True |
BYTEA | BINARY | True |
CHARACTER VARYING | TEXT | True |
CHARACTER | TEXT | True |
CIDR | TEXT | True |
CITEXT | TEXT | True |
DATE | DATE | True |
DOUBLE PRECISION | DOUBLE | True |
GEOGRAPHY | JSON | True | For details, refer to the PostGIS Geography section
GEOMETRY | JSON | True | For details, refer to the PostGIS Geometry section
HSTORE | JSON | True |
INTEGER / SERIAL | INTEGER | True |
INTERVAL | DOUBLE | True |
JSON | JSON | True |
JSONB | JSON | True |
MACADDR | TEXT | True |
MONEY | DECIMAL | True |
NUMERIC / DECIMAL | DECIMAL | True |
POINT | (JSON, DOUBLE, DOUBLE) | True | Treated like PostGIS Geometry POINT. For details, refer to the PostGIS Geometry section
REAL | REAL | True |
SMALLINT / SMALLSERIAL | SMALLINT | True |
TEXT | TEXT | True |
TIME WITH TIME ZONE | TEXT | True |
TIME WITHOUT TIME ZONE | TEXT | True |
TIMESTAMP WITH TIME ZONE | TIMESTAMP | True |
TIMESTAMP WITHOUT TIME ZONE | TIMESTAMP_NTZ | True |
TSRANGE | (JSON, TIMESTAMP, TIMESTAMP) | True | For backward compatibility, it has two additional columns with _begin and _end suffixes
TSTZRANGE | (JSON, TIMESTAMP, TIMESTAMP) | True | For backward compatibility, it has two additional columns with _begin and _end suffixes
UUID | TEXT | True |
BIT VARYING | | False | Not yet implemented
BOXES | | False |
CIRCLES | | False |
DATERANGE | | True | Unparsable values (such as 10000-01-01) are synced to the destination as null values.
ENUM | | True |
INET | | True |
INT4RANGE | | False |
INT8RANGE | | False |
LINE SEGMENTS | | False |
LINE | | False | Not yet implemented
NUMRANGE | | False |
OID | | False |
PATHS | | False |
PG_LSN | | False |
POLYGONS | | False |
REGCLASS | | False |
REGCONFIG | | False |
REGDICTIONARY | | False |
REGNAMESPACE | | False |
REGOPER | | False |
REGOPERATOR | | False |
REGPROC | | False |
REGPROCEDURE | | False |
REGROLE | | False |
REGTYPE | | False |
TSQUERY | | False |
TSVECTOR | | False |
XML | | False |

Note: The transformation of TSRANGE and TSTZRANGE data types to JSON is a beta feature and is disabled by default. To enable it, please reach out to support.

If we are missing an important type that you need, please reach out to support.

In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.

Unparsable values

When we encounter an unparsable value of one of the following data types, we substitute it with a default value. Which default value we use depends on whether the unparsable value is in a primary key column or a non-primary key column:

PostgreSQL Type | Primary Key Value | Non-Primary Key Value
TIMESTAMP WITH TIME ZONE | 1970-01-01T00:00:00Z | null
TIMESTAMP WITHOUT TIME ZONE | 1970-01-01T00:00 | null
TSRANGE * | 1970-01-01T00:00:00Z | null
TSTZRANGE * | 1970-01-01T00:00 | null

* If we are unable to parse either the start or end value in your range, we substitute that value with the default value. If we are unable to parse both values, we replace both values with default values.

PostGIS Geography data types

The following table lists Fivetran-supported PostGIS Geography data types. They are transformed according to the GeoJson specification and stored in the destination as JSON types. The GeoJson specification does not support SRID, so Fivetran ignores this data type.

PostGIS Type | Fivetran Type | Notes
POINT | (JSON, DOUBLE, DOUBLE) | For backward compatibility, it has two additional columns with _long and _lat suffixes
LINESTRING | JSON |
POLYGON | JSON |
MULTIPOINT | JSON |
MULTILINESTRING | JSON |
MULTIPOLYGON | JSON |
GEOMETRYCOLLECTION | JSON |

PostGIS Geometry data types

The following table lists Fivetran-supported PostGIS Geometry data types. They are transformed according to the GeoJson specification and stored in the destination as JSON types.

PostGIS Type | Fivetran Type | Notes
POINT | (JSON, DOUBLE, DOUBLE) | For backward compatibility, it has two additional columns with _x and _y suffixes
LINESTRING | JSON |
POLYGON | JSON |
MULTIPOINT | JSON |
MULTILINESTRING | JSON |
MULTIPOLYGON | JSON |
GEOMETRYCOLLECTION | JSON |
CIRCULARSTRING | JSON |
COMPOUNDCURVE | JSON |
POLYHEDRALSURFACE | JSON |
CURVEPOLYGON | JSON |
TIN | JSON |
TRIANGLE | JSON |

Note: The transformation of PostGIS data types to JSON is a beta feature and is disabled by default. To enable it, please reach out to support.

Excluding source data

If you don't want to sync all the data from your source database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and un-check the objects you would like to omit from syncing. For more information, see our Column Blocking documentation.

Alternatively, you can restrict the Fivetran user's access to certain tables or columns in your source database.

How to allow only a subset of tables:

To grant access only to some tables in a schema, first make sure that the Fivetran user has access to the schema itself:

    GRANT USAGE ON SCHEMA "some_schema" TO fivetran;

Next, remove any previously granted permissions to all tables in that schema:

    ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" REVOKE SELECT ON TABLES FROM fivetran;
    REVOKE SELECT ON ALL TABLES IN SCHEMA "some_schema" FROM fivetran;

Repeat the first command below for each table you wish to include. The second command grants access to every existing table in another schema:

    GRANT SELECT ON "some_schema"."some_table" TO fivetran;
    GRANT SELECT ON ALL TABLES IN SCHEMA "other_schema" TO fivetran; /* all tables in schema */

Any tables created in the future will be excluded from the Fivetran user by default. To include them, run:

    ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" GRANT SELECT ON TABLES TO fivetran;

There is no single statement that grants access to all tables except a chosen subset, so you need to individually grant access to each table you do want.

How to allow only a subset of columns:

To grant access only to some columns in a table, first remove any previously granted permission to all columns in the table:

    REVOKE SELECT ON "some_schema"."some_table" FROM fivetran;

Then grant permission to only specific columns (for example, some_column and other_column):

    GRANT SELECT (xmin, "some_column", "other_column") ON "some_schema"."some_table" TO fivetran;

Any new columns added to that table in the future will be excluded from access by default. To include them, re-run the command above with the new column included.

To grant access to all columns except one, you must individually grant access to all other columns. If you are using XMIN as your change data capture method, you must also grant permissions to the hidden system column xmin. We need access to the xmin column to perform our incremental updates.

You can automate this process by scripting in your favorite language. Here is an example of a fixed set of SQL commands being executed via a Bash script:

    #!/bin/bash

    # Fill in the values without quotes or a space after the equals sign

    host=      # ex: 10.10.135.135
    port=      # ex: 30054
    user=      # ex: user
    password=  # ex: asdf235235asfsdf212
    db=        # ex: mydb

    # List all of your SQL commands wrapped in single quotes, for example

    sql=(
    'GRANT USAGE ON SCHEMA "some_schema" TO fivetran;'
    'REVOKE SELECT ON ALL TABLES IN SCHEMA "some_schema" FROM fivetran;'
    'GRANT SELECT ON "some_schema"."some_table" TO fivetran;'
    )

    # psql reads the password from the PGPASSWORD environment variable
    for stmt in "${sql[@]}"
    do
      PGPASSWORD="$password" psql --host="$host" --port="$port" --username="$user" --dbname="$db" -c "$stmt"
    done

Initial sync

When Fivetran connects to a new database, we first copy all rows from every table in every schema for which we have SELECT permissions (except for those that you excluded on your Fivetran dashboard) and add Fivetran-generated columns. We copy rows by performing a SELECT statement on each table. For large tables, we copy a limited number of rows at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.

Updating data

Once the initial sync is complete, Fivetran performs incremental updates of any new or modified data from your source database. We use either logical replication or the XMIN system column to perform incremental updates.

Logical replication

Logical replication is based on logical decoding of the PostgreSQL write-ahead log (WAL). We recommend this method for incremental updates because it:

- Minimizes processing overhead on your PostgreSQL server
- Replicates row deletions for tables with primary keys

However, there are reasons why you might not want to or be able to use logical replication:

- Logical replication requires minimum version 9.4.15+, 9.5.10+, 9.6.6+, or 10.1+ (prior minor versions have bugs)
- In Generic PostgreSQL:
  - May need significant extra storage space (even if log storage is only temporary)
  - Requires a server reboot to activate logging
- In Amazon RDS PostgreSQL:
  - Only supported on RDS master instances, so you cannot enable it on a read replica
  - May need significant extra storage space (even if log storage is only temporary)
  - Requires a master instance reboot to activate logging
- In Azure PostgreSQL:
  - Does not support logical replication
- In Heroku Postgres:
  - Does not support logical replication
- In Google Cloud PostgreSQL:
  - Does not support logical replication

There are no limitations to using logical replication for Amazon Aurora PostgreSQL.

Note: Fivetran does not support logical replication if you use the "swap and drop" method of replicating data. In the "swap and drop" method, you create a temporary table and load in data from your current table. You then drop the current table and rename the temporary table so it has the same name as your current table.

If logical replication is not an option for you, Fivetran can fall back on using the XMIN method.

XMIN system column

The hidden PostgreSQL system column xmin can be used to select only the new or changed rows since the last update. The XMIN method has the following disadvantages:

- Cannot replicate row deletions, so there is no way to tell which rows in the destination are no longer present in the source database. XMIN can't track row deletions because it relies on a hidden system column in PostgreSQL tables that is effectively a "last_modified" column. When a row is deleted, it doesn't appear as being recently modified because it no longer exists.
- Requires a full table scan to detect updated rows, which can slow down updates and cause significant processing overhead on your PostgreSQL server

Therefore, we only use the XMIN method if logical replication is not an option for you.

Tables with a primary key

We merge changes to tables with primary keys into the corresponding tables in your destination:

- An INSERT in the source table generates a new row in the destination.
- An UPDATE in the source table updates the data in the corresponding row in the destination.
- (WAL only) A DELETE in the source table updates the corresponding row in the destination with _fivetran_deleted = TRUE.

Tables without a primary key

When we import tables without primary keys, we page over values of unique key columns, as well as any columns that might belong to unique indexes. Therefore, Fivetran can only import a table without a primary key when at least one column of that table has a unique key constraint and/or is part of a unique index.

We handle changes to tables without a primary key in the following ways:

- An INSERT in the source generates a new row in the destination.
- An UPDATE in the source generates a new row in the destination and leaves the old version of that row in the destination untouched. As a result, one record in your source database may have several corresponding rows in your destination.

Fivetran cannot recognize deleted rows in tables without primary keys. For more information, see the Deleted Rows section.

Identify tables with primary keys

To find out which of your tables have (or don't have) primary keys, run this query:

    SELECT
      table_schema,
      table_name,
      (table_schema, table_name) IN (
        SELECT
          _schema.nspname AS table_schema,
          _table.relname AS table_name
        FROM pg_catalog.pg_constraint c
        LEFT JOIN pg_catalog.pg_class _table ON c.conrelid = _table.oid
        LEFT JOIN pg_catalog.pg_namespace _schema ON _table.relnamespace = _schema.oid
        WHERE c.contype = 'p'
      ) AS has_primary_key
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ('information_schema', 'looker_scratch') AND
      NOT table_schema ~ '^pg_';

The third column in the query results, has_primary_key, is a boolean value that shows whether or not each table has a primary key.

Deleted Rows

Logical replication does not allow us to recognize deleted rows in tables without primary keys. For more information, see our logical replication documentation.

The XMIN update mechanism does not allow us to recognize deleted rows at all. (For more information, see our XMIN documentation.) Rows that are deleted in the source database are still present in the destination, because there is no way for us to identify them. If you want to recognize deleted rows, use one of these strategies to capture deletions in the source database:

- Add a trigger to each table to log the primary keys of deleted rows in a separate table. Once this table has been replicated to the destination, you can use it to generate a view that excludes the deleted rows.
- Instead of removing rows, add an is_deleted column to each table and change the business logic to set this column to TRUE when you delete a row. To avoid a build-up of obsolete rows in the source database, you can delete them after a certain period. Make sure you leave enough time for Fivetran to replicate the deleted rows to the destination before you remove them (we recommend seven days).

We do not support deleting data with the TRUNCATE command in incremental updates. However, you can do a full re-sync to capture TRUNCATE deletions.

Historical re-sync scenarios

If you want to migrate to another service provider, we will need to do a full re-sync of your data because the new service provider will not retain the same change tracking data as your original PostgreSQL database.

Switching between the sync strategies, from XMIN to WAL or the other way around, will require you to trigger a full re-sync of the connector because the change tracking works in different ways for these strategies.
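As a rough illustration of the maximum throughput figure listed under Supported configurations (5.0 MBps), the Bash sketch below computes a best-case lower bound on how long a given volume of data would take to sync. The function name min_sync_seconds and its parameters are hypothetical, not part of any Fivetran tool, and real syncs can be slower for the reasons given in the Replication speeds section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: best-case seconds needed to move SIZE_MB megabytes.
# Usage: min_sync_seconds SIZE_MB [MBPS]
# MBPS defaults to 5, the documented maximum throughput (assumed to be a whole number here).
min_sync_seconds() {
  local size_mb=$1 mbps=${2:-5}
  # Integer ceiling division: round partial seconds up.
  echo $(( (size_mb + mbps - 1) / mbps ))
}

min_sync_seconds 1024   # a 1 GB database at 5 MBps
```

The size in megabytes can come from the disk usage query shown under Replication speeds.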
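The column-level grant described under Excluding source data has to be re-issued whenever the set of allowed columns changes, so it may help to generate the statement rather than type it. The following is a minimal Bash sketch; column_grant is a hypothetical helper name, the fivetran user name follows the examples in this document, and the function assumes identifiers contain no double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a column-level GRANT for the fivetran user.
# Usage: column_grant SCHEMA TABLE COLUMN [COLUMN...]
column_grant() {
  local schema=$1 table=$2
  shift 2
  # xmin is always listed first because the XMIN update method needs it.
  local cols="xmin"
  local col
  for col in "$@"; do
    cols+=", \"$col\""
  done
  printf 'GRANT SELECT (%s) ON "%s"."%s" TO fivetran;\n' "$cols" "$schema" "$table"
}

column_grant some_schema some_table some_column other_column
```

The printed statement can be reviewed and then passed to psql with -c, in the same way as the statements in the Bash script above.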
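The minimum versions listed in the Logical replication section (9.4.15+, 9.5.10+, 9.6.6+, or 10.1+) can be checked mechanically before choosing an update method. The Bash sketch below is one way to do that; supports_logical_replication is a hypothetical function name, and it assumes a plain dotted version string such as the one returned by SHOW server_version, without any distribution suffix. It checks only the minimum version, not the service-specific restrictions listed in that section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the given PostgreSQL version
# meets the documented minimum for logical replication.
# Usage: supports_logical_replication 9.6.6
supports_logical_replication() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  minor=${minor:-0}
  patch=${patch:-0}
  if (( major >= 11 )); then return 0; fi
  # From version 10 on, the second number is the minor release.
  if (( major == 10 )); then (( minor >= 1 )); return; fi
  if (( major == 9 )); then
    case $minor in
      4) (( patch >= 15 )); return ;;
      5) (( patch >= 10 )); return ;;
      6) (( patch >= 6 )); return ;;
    esac
  fi
  return 1
}

if supports_logical_replication "9.6.6"; then
  echo "version meets the logical replication minimum"
else
  echo "use XMIN"
fi
```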
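The first strategy in the Deleted Rows section, logging the primary keys of deleted rows with a trigger, could be sketched as follows. This Bash function only prints the SQL so that it can be reviewed before being run. All object names (delete_log_sql, deleted_rows_log, the log_deleted_ and _log_delete naming patterns) are hypothetical, the table is assumed to have a single-column primary key, and CREATE TABLE IF NOT EXISTS assumes PostgreSQL 9.1 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints SQL that logs deleted primary keys for one table.
# Usage: delete_log_sql TABLE PK_COLUMN
delete_log_sql() {
  local table=$1 pk=$2
  # \$ keeps the dollar-quote tags literal inside the unquoted here-document.
  cat <<EOF
CREATE TABLE IF NOT EXISTS deleted_rows_log (
  log_id     bigserial PRIMARY KEY,
  table_name text NOT NULL,
  row_pk     text NOT NULL,
  deleted_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_deleted_${table}() RETURNS trigger AS \$body\$
BEGIN
  INSERT INTO deleted_rows_log (table_name, row_pk)
  VALUES (TG_TABLE_NAME, OLD.${pk}::text);
  RETURN OLD;
END;
\$body\$ LANGUAGE plpgsql;

CREATE TRIGGER ${table}_log_delete
AFTER DELETE ON ${table}
FOR EACH ROW EXECUTE PROCEDURE log_deleted_${table}();
EOF
}

delete_log_sql orders id
```

The log table is given its own primary key so that it can itself be synced as a table with a primary key. Once it has been replicated, a view in the destination can exclude rows whose key appears in it.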

Come join us for the LA Snowflake BUILD Event on Wednesday December 11th at Santa Monica Brew Works.
