Wednesday, February 24, 2010

Find who dropped, created or altered a table, SP or any other object

If any database object is dropped, created or altered accidentally, sometimes you may need information about:

Who dropped, created or altered the object? (Especially when the culprit does not accept the mistake.)

When was the object dropped, created or altered? The drop time of a table is especially required for point-in-time recovery from database backups.

These changes are not recorded in the SQL Server error log, but they are recorded in the default trace. If you have not disabled the default trace and start looking into the issue soon after the change occurred, you can get this information. The path of the trace files is “C:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Log\”. The folder MSSQL10.SQLEXPRESS may be different according to your SQL Server instance name. Here you will find up to 5 profiler trace files named “log_1”, “log_2”, etc. The file number increases over time (in the following example I used “log_16.trc”). Find the file that was most recently modified and use the T-SQL query below to open it:

SELECT ObjectName, DatabaseName, StartTime, NTDomainName, HostName, NTUserName, LoginName
FROM fn_trace_gettable('C:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Log\log_16.trc', default)
WHERE ObjectName IS NOT NULL

You would get the output as shown below:

[screenshot: trace output listing the object name, database, time and login details]
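If you are not sure where the default trace files are located, the active path can also be read from the sys.traces catalog view (available in SQL Server 2005 and later); a minimal query:

SELECT path
FROM sys.traces
WHERE is_default = 1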

Search the contents of SSIS packages

SSIS packages that are deployed to SQL Server or the SSIS package store are stored in the msdb..sysdtspackages90 table. Not only SSIS packages but also maintenance plans are stored in this table.

Now suppose you are asked to list all SSIS packages and maintenance plans:

That update the dbo.Sales table (to check the dependency on a table).

That use a script file located in the “D:\FTP\” folder (to check the dependency on a file).

That fetch data from database SourceDB or use login UserA (to check the dependency on a source or user).

If you have no more than a dozen packages, you can check them by opening each one in BIDS, but if there are hundreds of packages this approach is very troublesome.

The better approach is to first list all packages in which the particular table, connection or login is used. That shortens the list of packages that you need to open for more details. For this purpose you can query the msdb..sysdtspackages90 table as below:

 

SELECT [name]
FROM msdb..sysdtspackages90
WHERE CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), packagedata))
      LIKE '%dbo.Sales%' --change the literal for your search
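Note: msdb..sysdtspackages90 is the SQL Server 2005 table. If your packages are deployed to a SQL Server 2008 instance, the equivalent table is msdb.dbo.sysssispackages, so the same search becomes:

SELECT [name]
FROM msdb.dbo.sysssispackages
WHERE CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), packagedata))
      LIKE '%dbo.Sales%'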

Tuesday, February 23, 2010

Some lesser-known facts about the Update (U) lock

You may have read the following statement type to lock type mapping and found it easy to understand and logical too:

SELECT – Shared (S)

UPDATE – Update (U)

INSERT & DELETE – Exclusive (X)

 

I was clear about shared (S) and exclusive (X) locks but not about the update (U) lock. Some of my questions about the update lock were the following:

Is it used only while updating a record, or while deleting a record too?

As an update (U) lock is converted to an exclusive (X) lock, how long does the update (U) lock persist on a row?

 

To answer these questions I did some analysis, and I have summarized my conclusions in the following points:

1.       The update lock is used not only with UPDATE statements but also with DELETE statements.

2.       The update lock is used only if a transaction wants to update or delete a record that is already locked (S) by another transaction; otherwise a direct X lock is used. In the READ UNCOMMITTED isolation level no shared (S) locks are taken, so an update (U) lock can never be found in GRANT status (in sp_lock output); we can only find it in WAIT status.

3.       The update lock is a “waiting exclusive lock”, so we can never find an exclusive (X) lock in WAIT status (in sp_lock output), because in that case the update (U) lock is used.
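If you want to reproduce these observations yourself, here is a minimal two-session sketch (dbo.tblTest is a made-up table with a row where Col1 = 1; run each block in a separate query window):

-- Session 1: take and hold a shared (S) lock on the row
BEGIN TRAN
SELECT * FROM dbo.tblTest WITH (REPEATABLEREAD) WHERE Col1 = 1
-- keep this transaction open

-- Session 2: try to modify the same row; the request will block
BEGIN TRAN
UPDATE dbo.tblTest SET Col2 = 99 WHERE Col1 = 1

-- Session 3: inspect the lock modes and statuses (sp_lock gives a similar picture)
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()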

Please let me know your analysis and review.

Sunday, February 21, 2010

How to start and stop a Windows service from SSIS

To start or stop a Windows service from an SSIS package, add a Script Task and use the code below in the Design Script page. Replace ComputerName and ServiceName accordingly.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.ServiceProcess

Public Class ScriptMain
    Public Sub Main()
        Dim controller As New ServiceController()
        controller.MachineName = "ComputerName"
        controller.ServiceName = "ServiceName"

        'Code below is to stop the Windows service.
        If controller.Status = ServiceControllerStatus.Running _
           Or controller.Status = ServiceControllerStatus.Paused Then
            controller.Stop()
            controller.Refresh()
        End If

        'Code below is to start the Windows service. Note: if both blocks
        'run back to back, a service stopped above will be started again,
        'so keep only the block you need.
        If controller.Status = ServiceControllerStatus.Stopped _
           Or controller.Status = ServiceControllerStatus.StopPending Then
            controller.Start()
            controller.Refresh()
        End If

        'Report success so the task completes.
        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class

The use of Synonyms

We know that a synonym is an alternate name for an object. If you want to know more about synonyms, please visit my blog: http://sqlreality.com/blog/ms-sql-server-2008/synonyms-in-sql-server/
Here I will share some scenarios where synonyms help in writing efficient code.
1.       In most cases the development, testing and production servers are different. Suppose that in the testing environment your stored procedures use a table from a linked server TestServer, so the table’s four-part name is TestServer.Sales.dbo.Employee. For UAT you need to use the dbo.Employee table from another server, say UATServer, and in production the server would be ProdServer.
So during development you would use TestServer.Sales.dbo.Employee, during UAT you would have to change the table name to UATServer.Sales.dbo.Employee, and while moving into production it would change again to ProdServer.Sales.dbo.Employee. Replacing the name repeatedly is a waste of effort and increases the possibility of errors.
Here a synonym can solve the issue to a great extent. You just create a synonym as below:
CREATE SYNONYM Employee FOR TestServer.Sales.dbo.Employee
Then, in your stored procedures, views and functions, use the synonym Employee in place of the four-part table name. After this, when you need to test the code in UAT, just change the synonym to point to the new server as below:
DROP SYNONYM Employee
GO
CREATE SYNONYM Employee FOR UATServer.Sales.dbo.Employee
In the same way the synonym can be changed to point to the production server. This way the long and error-prone process of finding and replacing table names is eliminated with a few lines of code.
NOTE: There is no ALTER SYNONYM statement to change the base object. You will have to drop and re-create the synonym.
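Because of this, it is handy to guard the drop so the re-point script can be re-run safely; a minimal sketch using the example names above (assuming the synonym was created in the dbo schema):

IF OBJECT_ID('dbo.Employee', 'SN') IS NOT NULL
    DROP SYNONYM dbo.Employee
GO
CREATE SYNONYM dbo.Employee FOR UATServer.Sales.dbo.Employee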
So if there is any possibility that the server, database, schema or object name will change at a later time, a synonym can be used to simplify the code change process.
2.       Repetitive use of four-part names in T-SQL statements makes the code complex and harder to write and read. For example, suppose you use the table TestServer.Sales.dbo.Employee as below:
SELECT TOP 5 Emp.EmpName, Mng.EmpName AS MngName
FROM TestServer.Sales.dbo.Employee Emp INNER JOIN
     (SELECT EmpId, EmpName FROM TestServer.Sales.dbo.Employee) Mng
ON Emp.MngID = Mng.EmpID

If you create a synonym as below:

CREATE SYNONYM Employee FOR TestServer.Sales.dbo.Employee

The statement above can then be replaced with the one below:

SELECT TOP 5 Emp.EmpName, Mng.EmpName AS MngName
FROM Employee Emp INNER JOIN
     (SELECT EmpId, EmpName FROM Employee) Mng
ON Emp.MngID = Mng.EmpID


Here you can see the difference in ease of writing and reading the statement.

Synonyms in SQL Server

Wikipedia says that synonyms are different words with identical or very similar meanings; for example, student and pupil are synonyms, as are buy and purchase.
The concept of a synonym in SQL Server is not different at all: a SQL Server synonym is an alternate name for a database object. For example, you can define an alternate name usp_Get_Invoices for a stored procedure Sales.dbo.usp_Get_Current_Month_Invoices.
A few things we should know about synonyms are the following:
1.       A synonym can be used to refer to an object on the current server or on a remote server.
2.       A synonym can be created for the following types of objects:
a.       Tables: user-defined tables, including local and global temporary tables
b.       Stored procedures: SQL, CLR and extended stored procedures
c.       Functions: SQL and CLR functions
d.       Views
3.       A synonym cannot be the base object for another synonym.
4.       Dropping a synonym does not affect the base object.
5.       Synonyms are not schema-bound to the base object, and binding remains deferred until run time. In other words, the base object’s existence, validity and permissions are checked only at run time (see the quick test below).
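A quick test of this deferred binding, with a deliberately missing base object (the names are made up):

CREATE SYNONYM dbo.syn_Missing FOR dbo.DoesNotExist   -- succeeds: existence is not checked
GO
SELECT * FROM dbo.syn_Missing                         -- fails only now: Invalid object name 'dbo.DoesNotExist'
GO
DROP SYNONYM dbo.syn_Missing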
6.    Syntax to create a synonym is following:
       CREATE SYNONYM [schema_Name.]Synonym_Name FOR base_Object
7.    Syntax to drop a synonym is following:
       DROP SYNONYM [schema_Name.]Synonym_Name     
8.       A synonym can be used in DML statements. Let us test this with the following code:
CREATE TABLE dbo.tblTest (Col1 INT, Col2 INT)
GO
INSERT INTO dbo.tblTest VALUES(1,10)
GO
CREATE SYNONYM syn_Table FOR dbo.tblTest
GO
SELECT * FROM syn_Table                -- returns 1 record
GO
INSERT INTO syn_Table VALUES (1,1)
SELECT * FROM syn_Table                -- returns 2 records

9.       A synonym cannot be used in DDL statements. If we execute the statement below:

ALTER TABLE syn_Table ADD Col3 INT
the following error occurs:
Msg 4909, Level 16, State 1, Line 1
Cannot alter 'syn_Table' because it is not a table.

10.       Synonyms are like views in permission checking and ownership chaining. For example, if a user TestUser does not have execute permission on a stored procedure dbo.proc_TestSynonym but has execute permission on dbo.syn_Proc, a synonym of this stored procedure, then the user can execute dbo.syn_Proc successfully. Test this using the script below:

CREATE PROCEDURE dbo.proc_TestSynonym
AS
SELECT * FROM dbo.tblTest        --change the table name
GO
CREATE LOGIN TestUser WITH PASSWORD = 'TestUser', CHECK_POLICY = OFF
GO
USE Test    --DB name
GO
CREATE USER TestUser
GO
DENY EXECUTE ON dbo.proc_TestSynonym TO TestUser 
GO 
CREATE SYNONYM dbo.syn_Proc FOR dbo.proc_TestSynonym 
GO 
GRANT EXECUTE ON dbo.syn_Proc TO TestUser 
GO 
EXEC AS USER = 'TestUser' 
GO 
PRINT '1' 
EXEC dbo.proc_TestSynonym --fails with a permission-denied error
GO 
PRINT '2' 
EXEC dbo.syn_Proc    --successfully executes 
GO 
REVERT 
GO 
 
Use the script below to undo the changes performed by the script above:

DROP SYNONYM dbo.syn_Proc
GO
DROP PROCEDURE dbo.proc_TestSynonym
GO
DROP USER TestUser
GO
DROP LOGIN TestUser

Sunday, February 7, 2010

Tips to improve performance of Data Flow tasks

SSIS does not have any performance evaluation or monitoring tool, so it is hard to correctly identify the culprit part of a poorly performing package. A proactive approach is therefore better than a reactive one, and we should choose the best approach while creating a package. The following are a few methods that a developer should know in advance to decide the best approach.

1.     Replace the Sort Data Flow item with ORDER BY

2.     Optimize the OLE DB destination using Fast Load

3.     Use CDC and MERGE

4.     Configure the Lookup cache modes

5.     Avoid asynchronous transformations as much as possible

Avoid asynchronous transformations as much as possible

The term is a bit confusing if you know asynchronous processing from Service Broker or application interfaces, because on those platforms asynchronous processing is implemented for parallel and faster execution. In SSIS, however, asynchronous means that input records are not processed individually: the output is derived from the full input recordset or a part of it. For example, in a Sort operation the first output record can be determined only after checking all input records.

 

One of the most important strengths of SSIS is its buffer-oriented architecture, which efficiently loads and manipulates datasets in memory. The benefit of this in-memory processing is that you do not need to physically copy and stage data at each step of the data integration; rather, the data flow engine manipulates data as it is transferred from source to destination. But with an asynchronous transformation SSIS loses this in-memory processing strength, because an asynchronous transformation uses extra memory: its output recordset is buffered in memory separate from the input recordset. All blocking transformations (Aggregate and Sort) and partially blocking transformations (Merge, Merge Join and Union All) are asynchronous.

 

[diagram: data flow layouts of the two design alternatives]

Design Alternative 1: In this design, a Script Component generates 100,000,000 rows that first pass through a lookup. If the lookup fails because the source value is not found, an error record is sent to a Derived Column transformation where a default value is assigned to the error record. After the error processing is complete, the error rows are combined with the original data set before loading all rows into the destination. With two execution trees in this scenario, the biggest performance bottleneck is the extra copy of the data in memory created for the partially blocking Union All transformation.

Design Alternative 2: Like Design Alternative 1, this design uses the same Script Component to generate 100,000,000 rows that pass through a lookup. Instead of handling lookup failures as error records, all lookup failures are ignored; a Derived Column transformation is then used to assign values to the columns that have NULL values for the looked-up column. The performance of this solution is approximately 21% faster than Design Alternative 1: with one execution tree, the operations are consolidated and the overhead of copying data into a new buffer is avoided.

Configure the Lookup cache modes

Cache mode specifies how data is cached in the Lookup task. We can specify one of the following three cache modes for a Lookup task:
1.     Full cache
2.     Partial cache
3.     No cache


In full cache mode, all data from the lookup table is loaded into the local cache in advance. In partial cache mode, the lookup table is queried for every new value, and the fetched record is cached for further use. In no cache mode, the lookup table is queried for each record, and the fetched record is not cached.
Let us understand the hit and load statistics with an example.
Suppose we have two tables, Invoice and Customer. The Invoice table has a CustomerID column that is a foreign key referencing the CustomerID of the Customer table. Suppose:
Number of records in Invoice table = 100,000
Number of records in Customer table = 10,000
Count of distinct CustomerID in Invoice table = 2,000
Now following is the hit-load statistics in different cache modes:
Full cache:
Number of query hit to Customer table = 1
Number of records fetched from Customer table = 10,000
Partial cache:
Number of query hit to Customer table = 2,000
Number of records fetched from Customer table = 2,000
No cache:
Number of query hit to Customer table = 100,000
Number of records fetched from Customer table = 100,000

Using CDC and MERGE

Note: This method is applicable only for data transfer cases where the data source is a table on a SQL Server 2008 server.
Data Flow tasks are mostly used to keep a destination table identical to a source table. The common approach to implement this process is the following:
1.     Load all the data from the source into a temporary table.
2.     Truncate the destination table.
3.     Load the data from the temporary table into the destination table.


The biggest problem with this approach is the transfer of the full data set from source to destination; for a large amount of data this is a major performance concern.
SQL Server 2008 introduced two concepts that we can use to optimize this process: CDC, which is applicable at the source, and MERGE, which is applicable at the destination of the data transfer.
CDC (Change Data Capture): This is an easy and better method to log and track all changes to a table. To start using CDC we have to enable it at two levels: database and table. To enable CDC on a database, execute the sys.sp_cdc_enable_db stored procedure. In a CDC-enabled database, any table can then be configured for CDC by using the sys.sp_cdc_enable_table stored procedure. That is all it takes to capture all changes to a table. The changes are stored in an automatically generated table named cdc.<schema_name>_<table_name>_CT.
[screenshot: the system table cdc.dbo_TestCDC_CT listed under System Tables]

The screenshot above shows that when CDC is enabled for the table dbo.TestCDC, a system table cdc.dbo_TestCDC_CT is automatically created to store all changes.
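For reference, a minimal enable script for this example (the database name Test is assumed; note also that the CDC capture jobs require SQL Server Agent to be running):

USE Test
GO
EXEC sys.sp_cdc_enable_db
GO
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'TestCDC',
     @role_name     = NULL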
Now in SSIS, instead of fetching all data from the source table, we can query the CDC table to get only the changes. For example:
SELECT * FROM cdc.dbo_TestCDC_CT WHERE __$operation IN (1,2,4)
The values 1, 2 and 4 in the __$operation column mark deleted, inserted and updated (post-update image) records respectively; the value 3 marks the pre-update image of an updated row.
MERGE: This is another new feature introduced in SQL Server 2008 that enables us to accomplish multiple INSERT, UPDATE and DELETE operations in a single statement. Prior to SQL Server 2008, this process required both a Lookup transformation and multiple OLE DB Command transformations: the Lookup transformation performed a row-by-row lookup to determine whether each row was new or changed, and the OLE DB Command transformations then performed the necessary INSERT, UPDATE and DELETE operations. In SQL Server 2008, a single MERGE statement can replace both the Lookup transformation and the corresponding OLE DB Command transformations.
To use the MERGE statement in a package, follow these steps:
1.     Create a Data Flow task that loads all changed data using CDC, transforms, and saves the source data to a temporary or staging table.
2.     Create an Execute SQL task that contains the MERGE statement (a sketch is shown after these steps).
3.     Connect the Data Flow task to the Execute SQL task, and use the data in the staging table as the input for the MERGE statement.
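A minimal sketch of the MERGE statement for step 2 (dbo.Destination, dbo.Staging and the columns are placeholders):

MERGE dbo.Destination AS tgt
USING dbo.Staging AS src
    ON tgt.ID = src.ID
WHEN MATCHED THEN
    UPDATE SET tgt.Col1 = src.Col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Col1) VALUES (src.ID, src.Col1)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;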


Optimize OLEDB destination using Fast Load

When you insert data into your target SQL Server database, use minimally logged operations if possible. When data is inserted in fully logged mode, the log grows quickly because each row entering the table also goes into the log. Minimal logging is enabled through the Fast Load options in the OLE DB Destination Editor.
Therefore, when designing Integration Services packages, consider the following:

  • Try to perform your data flows in bulk mode instead of row by row. Bulk mode minimizes the number of entries that are added to the log file. This reduction improves the underlying disk I/O for other inserts and minimizes the bottleneck created by writing to the log.

  • Set a value for Maximum insert commit size. This option decides the size of the data-loading transaction by specifying the batch size that the OLE DB destination tries to commit during fast load operations. The default value of 2,147,483,647 indicates that all data is committed in a single batch after all rows have been processed. With the default value, the transaction log of the database can grow up to the limit of the disk size and cause a full-transaction-log error. A bigger problem may arise if you have implemented a transaction and the task fails after a large amount of data has been loaded: the whole load will be rolled back, which may decrease server performance significantly. So set Maximum insert commit size to a moderate value like 10,000.

 

Replace the Sort Data Flow item with ORDER BY

Some data operations, like the Merge and Merge Join transformations, require sorted inputs. SSIS provides a Sort task to sort the input recordset, but as we know, sorting is a blocking operation and degrades package performance. The better solution is to create the sorted recordset using an ORDER BY clause in the data source task. This improves performance for two reasons. First, it shifts the sorting load to the source connection server; if the source server is different from the SSIS server, it helps distribute the processing load across both servers. Second, the source can use the indexes of the source tables to return the sorted recordset. Once we have pre-sorted source data, we can give downstream components a hint that the data is sorted. To provide that hint, we have to do the following tasks:

  • Set the IsSorted property on the output of an upstream data flow component to True.

  • Specify the sort key columns on which the data is sorted.
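For example, instead of a plain SELECT followed by a Sort transformation, let the source query do the ordering (table and column names are placeholders):

SELECT EmpID, EmpName
FROM dbo.Employee
ORDER BY EmpID

Then, on the source adapter's output, set IsSorted to True and set SortKeyPosition = 1 on the EmpID output column.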

SSIS Checkpoint

  • Checkpoints enable a failed SSIS package to be restarted from the task at which execution stopped.

  • Checkpoints are used to avoid repeating the download or upload of large files, or to avoid repeating tasks that consume system resources heavily.

  • Checkpoints are enabled by setting the package’s SaveCheckpoints property to True in the SSIS package properties.

  • Once checkpoints are enabled, you also need to tell the SSIS package where to write the checkpoint data. To do this, supply a file name in the CheckpointFileName property.

  • In addition, the way SSIS treats a running package when there is an existing checkpoint file is controlled by the CheckpointUsage property. The available options for this property are Never, Always and IfExists.

 

Let us understand the behavior of checkpoints with an example:

Create a new package in an SSIS project, change the SaveCheckpoints property to True, and set the CheckpointFileName property to “C:\SSIS_checkpoint.xml”. Add three Execute SQL Tasks, setting FailPackageOnFailure to True on each so a checkpoint is recorded when a task fails. The statements and control flow are as below:

Task A: Select 1/1

Task B: Select 1/0 (which will introduce an error)

Task C: Select 1/4

Now execute the SSIS package. It will fail at Task B, as below:

[screenshot: Task A succeeded, Task B failed]

Now update the SQL statement of Task B to Select 1/2 and re-execute the package. This time the package will start at Task B, as below:

[screenshot: execution resumed at Task B]

Comparison between T-SQL and SSIS expressions

An expression is a combination of identifiers, literals, functions and operators that returns a single data value. We are very familiar with functions and operators in T-SQL, but writing an expression in SSIS is not exactly the same as in T-SQL. It is not so different as to worry about, though: if we know the differences between the functions and operators of T-SQL and SSIS, we can save ourselves the time of learning SSIS expressions separately.

 

Operators: Besides the following five, all other operators are the same in T-SQL and SSIS.

 

 

Logical AND: T-SQL uses AND, SSIS uses &&

Logical OR: T-SQL uses OR, SSIS uses ||

Logical NOT: T-SQL uses NOT, SSIS uses !

Equal: T-SQL uses =, SSIS uses ==

Conditional: T-SQL uses IF boolean_expression expression1 ELSE expression2, SSIS uses boolean_expression ? expression1 : expression2
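For example, a rule that T-SQL would express with IF/ELSE can be written as a single SSIS expression (the variable names are made up):

@[User::Amount] > 0 && @[User::IsActive] == 1 ? "Valid" : "Invalid"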

 

 

Functions: In SSIS, functions of the following four categories are supported.

·         Mathematical functions: perform calculations based on numeric input values provided as parameters and return numeric values. Only the following mathematical functions are supported in SSIS; their functionality is the same in SSIS and T-SQL:

ABS, CEILING, EXP, FLOOR, LN, LOG, POWER, ROUND, SIGN, SQRT, SQUARE

 

 

·         String functions: perform operations on string or hexadecimal input values and return a string or numeric value. Two functions differ by name:

 

Search for a string inside another string: T-SQL uses CHARINDEX, SSIS uses FINDSTRING

Remove leading and trailing spaces: T-SQL uses LTRIM and RTRIM, SSIS uses TRIM

 

The following other string functions are supported in SSIS; their functionality is the same as in T-SQL:

HEX, LEN, LOWER, LTRIM, REPLACE, REPLICATE, REVERSE, RIGHT, RTRIM, SUBSTRING, UPPER

 

 

·         Date and time functions: perform operations on date and time values and return string, numeric, or date and time values. Only one T-SQL date function, DATENAME, is not supported by SSIS; all other functions are the same in T-SQL and SSIS.

·         System functions.

 

A few other differences are the following:

 

 

Data type conversion: T-SQL uses CAST and CONVERT; SSIS uses the cast operator (DATA_TYPE) Expression

Behavior of NULL: in T-SQL, a NULL check (IS NULL) determines whether or not a given expression is NULL; in SSIS, the NULL(DT_type) function returns a null value of the requested data type
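For example, the T-SQL conversion CONVERT(VARCHAR(10), Price) becomes the SSIS cast (DT_STR, 10, 1252)Price, where 10 is the length and 1252 is the code page (an assumption; use the code page appropriate to your data). And where T-SQL would write ISNULL(Price, 0), the SSIS equivalent is the expression ISNULL(Price) ? 0 : Price.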