For those of you working with Snowflake and using Snowpipe for CDC and data pipelines, here's how the same thing can be accomplished through Fivetran.
Fivetran uses a process called Change Data Capture (CDC) to replicate data from source databases to Snowflake. CDC is a process of tracking changes to data in a database and then replicating those changes to another destination.
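To make the idea concrete, here is a minimal Python sketch of applying a sequence of captured changes to a replica. The event shape (`op`, `key`, `row`) and the `apply_changes` helper are illustrative assumptions, not Fivetran's actual change format or API.

```python
from typing import Any

# Hypothetical change-event shape: op is "insert", "update", or "delete".
ChangeEvent = dict[str, Any]


def apply_changes(replica: dict[int, dict], events: list[ChangeEvent]) -> dict[int, dict]:
    """Apply captured changes, in order, to a replica keyed by primary key."""
    for event in events:
        if event["op"] in ("insert", "update"):
            # Merge so that a partial update keeps the columns it did not touch.
            replica[event["key"]] = {**replica.get(event["key"], {}), **event["row"]}
        elif event["op"] == "delete":
            replica.pop(event["key"], None)
        else:
            raise ValueError(f"unknown operation: {event['op']}")
    return replica


events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "city": "Taipei"}},
    {"op": "update", "key": 1, "row": {"city": "Paris"}},
    {"op": "delete", "key": 2, "row": {}},
]
replica = apply_changes({}, events)
print(replica)
```

Only the changed rows travel to the destination; the replica is never rebuilt from a full copy of the source.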
Fivetran supports two types of CDC when replicating to Snowflake:
Log-based CDC: This type of CDC involves reading the database's transaction logs to identify changes to data.
Trigger-based CDC: This type of CDC involves using database triggers to capture changes to data.
Fivetran uses log-based CDC for most source databases. Trigger-based CDC is used for only a limited number of them.
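As a rough illustration of the log-based approach, the sketch below models a transaction log as an ordered list and a reader that remembers its last position. The `LogRecord` fields and the `read_new_records` function are hypothetical names chosen for this example; real databases expose their logs through engine-specific interfaces.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int    # log sequence number: the record's position in the log
    table: str
    op: str     # "insert", "update", or "delete"
    row: dict


def read_new_records(log: list[LogRecord], last_lsn: int) -> tuple[list[LogRecord], int]:
    """Return the records written after last_lsn and the position to resume from."""
    new = [record for record in log if record.lsn > last_lsn]
    next_lsn = max((record.lsn for record in new), default=last_lsn)
    return new, next_lsn


transaction_log = [
    LogRecord(101, "orders", "insert", {"id": 1, "total": 25}),
    LogRecord(102, "orders", "update", {"id": 1, "total": 30}),
]

# First sync: read everything written so far.
first_batch, position = read_new_records(transaction_log, last_lsn=0)

# The source keeps writing; the next sync picks up only what is new.
transaction_log.append(LogRecord(103, "orders", "delete", {"id": 1}))
second_batch, position = read_new_records(transaction_log, last_lsn=position)
print(len(first_batch), len(second_batch), position)
```

Because the reader stores only a position, the source tables themselves are not queried to find out what changed.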
Once Fivetran has identified changes to data, it replicates those changes to Snowflake using a process called Snowflake Teleport Sync. Snowflake Teleport Sync is a high-performance data replication technology that can handle large volumes of data quickly and efficiently.
Here is a more detailed overview of how Fivetran replicates data from source databases to Snowflake using log-based CDC:
Fivetran's replication process is designed to be efficient and reliable, and to handle large volumes of data and complex data sets. Fivetran also offers features that make it easier to manage and monitor replication, such as real-time monitoring and alerts.
Fivetran supports trigger-based CDC for the following source databases:
Fivetran uses log-based CDC for all other source databases. Here is a more detailed overview of how Fivetran uses trigger-based CDC to replicate data from source databases to Snowflake:
Trigger-based CDC is a good option for source databases that do not have native CDC functionality or for source databases where you need more control over the replication process.
Here are some of the benefits of using trigger-based CDC:
More control: Trigger-based CDC gives you more control over the replication process. You can specify which tables to replicate, which columns to replicate, and how often to replicate the data.
More reliable: Trigger-based CDC does not depend on the database transaction logs, so it keeps working when those logs are unavailable, truncated, or corrupted.
More efficient: For small changes to data, trigger-based CDC can be efficient because each trigger records only the rows that changed, and no log reader has to scan the transaction log. The trade-off is that every write on the source also has to execute the trigger.
Trigger-based CDC is a good option for businesses that need a reliable and efficient way to replicate data from source databases to Snowflake.
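The mechanism behind trigger-based CDC can be sketched with SQLite, which ships with Python. This is only an illustration of the general technique: the `customers` and `customers_changes` tables and the trigger names are made up for the example, and the triggers a replication tool installs on a real source database will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change table that the triggers write to.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT,
    id INTEGER,
    name TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('insert', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('update', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('delete', OLD.id, OLD.name);
END;
""")

# Ordinary writes on the source table; the triggers record each one.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

# A replication process would read this table in change_id order.
changes = conn.execute(
    "SELECT op, id, name FROM customers_changes ORDER BY change_id"
).fetchall()
print(changes)
```

Each write to `customers` causes a second write to the change table. That is why trigger-based capture needs no access to transaction logs, and also why it adds load to the source.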
Snowflake Teleport Sync is a high-performance data replication technology that can be used to replicate data from source databases to Snowflake. It is designed to be fast, scalable, and reliable.
Snowflake Teleport Sync uses a variety of techniques to achieve high performance, including:
Parallel processing: Snowflake Teleport Sync can process multiple data streams in parallel, which can significantly improve performance.
Batching: Snowflake Teleport Sync batches changes to data before sending them to Snowflake. This can help to improve performance by reducing the number of network requests.
Compression: Snowflake Teleport Sync compresses data before sending it to Snowflake. This can help to reduce network bandwidth usage and improve performance.
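The three techniques above can be sketched together in a few lines of Python. This is a generic illustration, not the product's implementation: the batch size, the newline-delimited JSON format, and the helper names `batches` and `compress_batch` are all assumptions made for the example.

```python
import gzip
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def batches(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Batching: group records so each network request carries many rows."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def compress_batch(batch: list[dict]) -> bytes:
    """Compression: serialize a batch as newline-delimited JSON and gzip it."""
    payload = "\n".join(json.dumps(record) for record in batch).encode("utf-8")
    return gzip.compress(payload)


records = [{"id": i, "status": "active"} for i in range(1000)]

# Parallel processing: compress several batches at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    payloads = list(pool.map(compress_batch, batches(records, size=250)))

raw_size = sum(len(json.dumps(record)) + 1 for record in records)
compressed_size = sum(len(payload) for payload in payloads)
print(len(payloads), raw_size, compressed_size)
```

Here 1,000 rows become four requests instead of 1,000, and each request body is smaller than the uncompressed rows it contains.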
Snowflake Teleport Sync also uses Snowflake stages, streams, and tasks to manage the replication process.
Stages: Stages are storage locations, either internal to Snowflake or in external cloud storage, where data files are held before they are loaded into tables. Snowflake Teleport Sync uses stages to store data that is being replicated from source databases.
Streams: Streams are Snowflake objects that record the changes (inserts, updates, and deletes) made to a table, so that downstream processes can consume only what changed. Snowflake Teleport Sync uses streams to track changes as data is replicated from source databases.
Tasks: Tasks are scheduled units of work, typically SQL statements, that Snowflake executes. Snowflake Teleport Sync uses tasks for jobs such as loading data from stages into tables and transforming data.
Snowflake Teleport Sync does not use Snowpipe. Snowpipe is Snowflake's continuous ingestion service, which loads data from files in small batches as soon as they arrive in a stage. Snowflake Teleport Sync is designed to replicate data from source databases to Snowflake, regardless of whether the data is streaming or batch.
To update target tables in Snowflake, Snowflake Teleport Sync uses the following steps:
Snowflake Teleport Sync loads the data from the source database into a stage in Snowflake.
Snowflake Teleport Sync transforms the data, if necessary.
Snowflake Teleport Sync loads the data from the stage into the target table in Snowflake.
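The three steps above can be sketched in plain Python, with lists standing in for the stage and the target table. The function names and the specific transformation (trimming names and casting amounts) are assumptions chosen for illustration.

```python
def extract_to_stage(source_rows: list[dict]) -> list[dict]:
    """Step 1: copy source rows into a staging area (here, just a list)."""
    return [dict(row) for row in source_rows]


def transform(staged: list[dict]) -> list[dict]:
    """Step 2: an example transformation; trim names and cast amounts to float."""
    return [
        {"id": row["id"], "name": row["name"].strip(), "amount": float(row["amount"])}
        for row in staged
    ]


def load(staged: list[dict], target: list[dict]) -> None:
    """Step 3: append the staged rows to the target table."""
    target.extend(staged)


source = [
    {"id": 1, "name": " Ada ", "amount": "10.50"},
    {"id": 2, "name": "Lin", "amount": "3"},
]
target_table: list[dict] = []
load(transform(extract_to_stage(source)), target_table)
print(target_table)
```

Keeping the stage separate from the target means a failed transformation leaves the target table untouched.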
Snowflake Teleport Sync can also be used to update existing data in target tables. To do this, Snowflake Teleport Sync uses the following steps:
Snowflake Teleport Sync identifies the rows in the target table that need to be updated.
Snowflake Teleport Sync loads the updated data into a stage in Snowflake.
Snowflake Teleport Sync merges the data from the stage into the target table in Snowflake.
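In Snowflake, a merge step like this is typically expressed with a `MERGE` statement. The sketch below models the same matched / not-matched logic in plain Python so it can be run anywhere. The `merge_into` helper and the sample rows are hypothetical.

```python
def merge_into(target: dict[int, dict], staged: list[dict], key: str = "id") -> tuple[int, int]:
    """Upsert staged rows into target by key; return (updated, inserted) counts."""
    updated = inserted = 0
    for row in staged:
        if row[key] in target:
            # Matched: overwrite only the columns present in the staged row.
            target[row[key]].update(row)
            updated += 1
        else:
            # Not matched: insert the staged row as a new target row.
            target[row[key]] = dict(row)
            inserted += 1
    return updated, inserted


target = {1: {"id": 1, "name": "Ada", "city": "London"}}
staged = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "name": "Lin", "city": "Taipei"},
]
counts = merge_into(target, staged)
print(counts, target)
```

Rows that already exist are updated in place and new rows are inserted, so running the same staged batch twice does not create duplicates.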
Snowflake Teleport Sync is a powerful and flexible data replication technology that can be used to replicate data from source databases to Snowflake in a variety of ways. It is a good choice for businesses that need a fast, scalable, and reliable way to replicate data to Snowflake.
Here are some of the benefits of using Snowflake Teleport Sync:
Fast: Snowflake Teleport Sync is designed to be fast. It can replicate large volumes of data quickly and efficiently.
Scalable: Snowflake Teleport Sync is scalable. It is designed to handle large volumes of data and complex data sets.
Reliable: Snowflake Teleport Sync is reliable. It is designed to handle large volumes of data with minimal downtime.
Flexible: Snowflake Teleport Sync is flexible. It can be used to replicate data from a variety of source databases to Snowflake in a variety of ways.
Snowflake Teleport Sync supports a variety of data transformations, including:
Snowflake Teleport Sync also supports a number of pre-defined functions that can be used to perform common data transformations. For example, there are functions for converting currencies, calculating percentages, and finding the minimum or maximum value in a set of data.
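To give a feel for the kinds of calculations mentioned here, the following Python sketch computes percentages, a currency conversion, and minimum and maximum values over a small data set. The figures and the fixed exchange rate are made up for the example; this is not the syntax of any pre-defined function in the tool.

```python
# Hypothetical sales amounts in USD, keyed by region.
amounts = {"north": 120.0, "south": 60.0, "west": 20.0}

# Percentages: each region's share of the total.
total = sum(amounts.values())
share_pct = {region: round(100 * value / total, 1) for region, value in amounts.items()}

# Currency conversion with an assumed fixed rate, for illustration only.
usd_to_eur = 0.9
amounts_eur = {region: round(value * usd_to_eur, 2) for region, value in amounts.items()}

# Minimum and maximum values in the set.
smallest, largest = min(amounts.values()), max(amounts.values())
print(share_pct, amounts_eur, smallest, largest)
```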
You can use Snowflake Teleport Sync to transform data in the following ways:
Transform data before it is loaded into Snowflake: You can use Snowflake Teleport Sync to transform data before it is loaded into Snowflake. This can be useful for cleaning data, converting data to the correct data type, and performing other data transformations.
Transform data as it is being loaded into Snowflake: You can use Snowflake Teleport Sync to transform data as it is being loaded into Snowflake. This can be useful for performing real-time data transformations.
Transform data after it has been loaded into Snowflake: You can use Snowflake Teleport Sync to transform data after it has been loaded into Snowflake. This can be useful for updating existing data or performing complex data transformations.
Snowflake Teleport Sync is a powerful and flexible data transformation tool that can be used to perform a variety of data transformations on data that is being replicated to Snowflake.
I notice gaps in data transformation capabilities in many of the tools I'm evaluating, so I'm sharing additional details on the transformation capabilities in case you're planning to use this in your data pipeline.
Type casting: Snowflake Teleport Sync can cast data from one data type to another. This can be useful for converting data to the correct data type for the Snowflake table or for performing other data transformations. For example, you can cast a string to a number or a date to a datetime. You can also cast data from one Snowflake data type to another, such as from VARCHAR to NUMBER or from TIMESTAMP_NTZ to TIMESTAMP_LTZ.
Null handling: Snowflake Teleport Sync can handle null values in a variety of ways. You can choose to ignore null values, replace them with a default value, or keep them as NULL. You can also evaluate conditional expressions based on null values. For example, you can skip loading a row of data if it contains a null value in a specific column.
String manipulation: Snowflake Teleport Sync can perform a variety of string manipulation operations, such as:
You can use string manipulation operations to clean data, prepare data for other data transformations, or create new data columns.
Date and time manipulation: Snowflake Teleport Sync can perform a variety of date and time manipulation operations, such as:
You can use date and time manipulation operations to clean data, prepare data for other data transformations, or create new data columns.
Mathematical operations: Snowflake Teleport Sync can perform a variety of mathematical operations, such as:
You can use mathematical operations to perform calculations on data, such as calculating percentages, calculating averages, and finding the minimum or maximum value in a set of data.
Logical operations: Snowflake Teleport Sync can perform a variety of logical operations, such as:
You can use logical operations to build conditional expressions on data, such as skipping a row of data if it does not meet certain criteria or creating a new data column based on the results of a logical expression.
Conditional expressions: Snowflake Teleport Sync allows you to evaluate conditional expressions and perform different actions based on the result. For example, you can skip loading a row of data if it does not meet certain criteria or create a new data column based on the results of a conditional expression. Conditional expressions can be used to perform complex data transformations and to clean data.
Pre-defined functions: Snowflake Teleport Sync also supports a number of pre-defined functions that can be used to perform common data transformations. For example, there are functions for:
You can use pre-defined functions to perform common data transformations quickly and easily.
Transforming data before, during, and after loading: Snowflake Teleport Sync can be used to transform data before, during, and after it is loaded into Snowflake. This gives you flexibility to choose the best time to perform data transformations based on your specific needs. For example, you can transform data before it is loaded into Snowflake to clean the data and convert it to the correct data type. You can also transform data as it is being loaded into Snowflake to perform real-time data transformations. Additionally, you can transform data after it has been loaded into Snowflake to update existing data or to perform complex data transformations.
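As a small worked example of the type casting, null handling, and conditional logic described above, here is a Python sketch that cleans rows before they are loaded. The column names, the `"UNKNOWN"` default, and the `cast_row` helper are assumptions for illustration and do not reflect any particular tool's configuration.

```python
from datetime import date, datetime, time
from typing import Optional


def cast_row(raw: dict) -> Optional[dict]:
    """Cast types, replace nulls with defaults, and skip rows missing a required column."""
    if raw.get("id") is None:
        return None  # conditional expression: skip rows with a null id

    signup_date = date.fromisoformat(raw["signup"])
    country = raw.get("country")
    return {
        "id": int(raw["id"]),                               # string -> number
        "signup": datetime.combine(signup_date, time.min),  # date -> datetime (midnight)
        "country": "UNKNOWN" if country is None else country,  # null -> default value
    }


raw_rows = [
    {"id": "42", "signup": "2023-08-25", "country": None},
    {"id": None, "signup": "2023-08-01", "country": "US"},
]
clean_rows = [row for row in map(cast_row, raw_rows) if row is not None]
print(clean_rows)
```

The first row is cast and its null country replaced with the default; the second row is dropped because its required `id` is null.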
Sash Barige
Aug/25/2023
Sources: Fivetran Documentation including https://fivetran.com/docs/databases/snowflake;
Snowflake Documentation including https://docs.snowflake.com/en/user-guide/data-load-considerations-stage