Sunday, February 7, 2010

Replace the Sort Data Flow item with ORDER BY

For some data operations like Merge and Merge Join transformations require sorted inputs. SSIS provide a task “Sort” to sort the input recordset. As we know sorting is a blocking process and degrade the package performance. The better solution is to create the sorted recordset using ORDER BY clause at data source task. This would improve the performance because of two reasons. First it split the load of sorting on the source connection server. If the source server is different than SSIS server than it would help in distribute the processing load on both servers. Second it can use the indexes of source tables to get the sorted recordset. Now when we have pre-sorted source data, we can provide a hint for downstream components that the data is sorted. To provide a hint that the data is sorted, we have to do the following tasks:

  • Set the IsSorted property on the output of an upstream data flow component to True.

  • Specify the sort key columns on which the data is sorted.

No comments: