Sunday, February 7, 2010

Using CDC and MERGE

Note: This method is applicable only for data transfer cases where data source is a table from SQL Server 2008 server.
Data flow tasks are mostly used to keep the destination table updated same as source table. The common approach to implement this process as following:
1.     Load all the data from source into a temporary table.
2.     Truncate the destination table
3.     Load the data from temporary table to destination table.


The biggest problem of this approach is transfer of full data from source to destination and for large amount of data this is a major performance concern.
SQL Server 2008 introduced two concets that we can use to optimize this process. One is CDC that is applicable at source and another is MERGE that is applicable at destination of data transfer.
CDC (Change Data Capture): This is an easy and better method to log and track all changes for a table. To start using CDC we have to set enable it at two levels: database and table. To enable CDC on a database execute the sys.sp_cdc_enable_db stored procedure. In a CDC enabled database any table can be configured to enable CDC by using sys.sp_cdc_enable_table <table_name> stored procedure. That’s all to capture all changes of a table. All the changes for a table are stored in a automatically cgenerated table named cdc.<schema_name>_<table_name>_CT
image

The above screenshot show that when CDC is enabled for table dbo.TestCDC then a system table cdc.dbo_TestCDC_CT is automatically created to store all changes.
Now in SSIS instead of fetching all data from source table we can use the CDC table to get only the changes. For example:
SELECT * FROM cdc.dbo_TestCDC_CT WHERE __$operation IN (1,2,4)
The value 1, 2, and 4 for column __$operation refers to the type of record change Deleted, Inserted and Updated respectively.
MERGE: This is another new feature introduced in SQL Server 2008 that enables us to accomplish multiple INSERT, UPDATE, and DELETE operations in a single statement.Prior to SQL Server 2008, this process required both a Lookup transformation and multiple OLE DB Command transformations. The Lookup transformation performed a row-by-row lookup to determine whether each row was new or changed. The OLE DB Command transformations then performed the necessary INSERT, UPDATE, and DELETE operations. In SQL Server 2008, a single MERGE statement can replace both the Lookup transformation and the corresponding OLE DB Command transformations.
To use the MERGE statement in a package, follow these steps:
1.     Create a Data Flow task that loads all changed data using CDC, transforms, and saves the source data to a temporary or staging table.
2.     Create an Execute SQL task that contains the MERGE statement.
3.     Connect the Data Flow task to the Execute SQL task, and use the data in the staging table as the input for the MERGE statement.


No comments: