Friday, March 5, 2010

Best practices to write complex JOINs on multiple tables

Always use ANSI standard joins: Tables can be joined in two join standards: ANSI and Transact-SQL. Syntax for these two standards is following:
--Transact-SQL standard:
SELECT tableA.colX, tableB.colY
FROM tableA, tableB
WHERE tableA.col1= tableB.col1 

--ANSI standard:

SELECT tableA.colX, tableB.colY
FROM tableA INNER JOIN tableB
ON tableA.col1= tableB.col1 

In earlier version of SQL Server, Transact-SQL standard had *= and =* operator for LEFT and RIGHT OUTER joins. But now these operators have been deprecated. So always use the ANSI standard for unanimity and clarity. Find the easier to understand query out of following two same queries:

--Query 1: 
SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA, tableB
LEFT OUTER JOIN tableC
ON tableA.col2 = tableC.col2
WHERE tableA.col1= tableB.col1 

--Query 2: 
SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA INNER JOIN tableB
ON tableA.col1= tableB.col1 
LEFT OUTER JOIN tableC
ON tableA.col2 = tableC.col2

I find the second query easier because it tells the relation between joining tables at the time of joining and uses same syntax to join all tables.

Do not mix LEFT and RIGHT OUTER JOIN: If we are joining three tables with LEFT or RIGHT OUTER joins than we can write in any one of following three ways:

--Query 1: Using left outer join 

SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA LEFT OUTER JOIN tableB
ON tableA.col1= tableB.col1 
LEFT OUTER JOIN tableC
ON tableB.col2 = tableC.col2

--Query 2: Using left and right outer join<?xml:namespace prefix = o />

SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableB RIGHT OUTER JOIN tableA
ON tableA.col1= tableB.col1 
LEFT OUTER JOIN tableC
ON tableB.col2 = tableC.col2

--Query 3: Using right outer join<?xml:namespace prefix = o />

SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableC RIGHT OUTER JOIN tableB
ON tableB.col2 = tableC.col2
RIGHT OUTER JOIN tableA
ON tableA.col1= tableB.col1 

Here you would observe that Query 2 is most complex and Query 1 is easiest to understand. We can always convert RIGHT OUTER JOIN into LEFT OUTER JOIN or vice versa by just changing the sequence of tables. So why not we write the queries in easiest way.


Write INNER joins first and then LEFT OUTER joins: If a table is joined with INNER and LEFT OUTER joins than always write the INNER joins first and then LEFT OUTER joins. Find the easier to understand query out of following two same queries:

--Query 1: 
SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA LEFT OUTER JOIN tableB
ON tableA.col1= tableB.col1 
INNER JOIN tableC
ON tableA.col2 = tableC.col2

--Query 2: 
SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA INNER JOIN tableC
ON tableA.col2 = tableC.col2
LEFT OUTER JOIN tableB
ON tableA.col1= tableB.col1 


I find second query easier because it defines the essential result-set first and then joins with other tables for additional data.


Keep the ON clause adjacent to appropriate JOIN: Find the easier to understand query out of following two same queries:

--Query 1: 
SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA INNER JOIN tableB
LEFT OUTER JOIN tableC
ON tableA.col1= tableB.col1 
ON tableA.col2 = tableC.col2

--Query 2: 
SELECT tableA.colX, tableB.colY, tableC.colZ
FROM tableA INNER JOIN tableB
ON tableA.col1= tableB.col1 
LEFT OUTER JOIN tableC
ON tableA.col2 = tableC.col2

I find second query easier because it tells the relation within joining tables at the time of joining.


Use database diagram to identify the relation between tables:
If I am to write a query to know the Salesperson, product and the total sold quantity of that product, I have to identify the relation between Salesperson, Product and there sold quantity. Following is the screenshot of database diagram of required tables in AdventureWorks database:


clip_image002

Now when we know the relation, we can write a query to join these tables:

SELECT Emp.LoginID, Prod.Name, SUM (SOD.OrderQty) AS TotalQuantitySold
FROM HumanResources.Employee Emp 
LEFT OUTER JOIN Sales.SalesOrderHeader SOH
ON Emp.EmployeeID = SOH.SalesPersonID
INNER JOIN Sales.SalesOrderDetail SOD
ON SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN Production.Product Prod
ON SOD.ProductID = Prod.ProductID
GROUP BY Emp.LoginID, Prod.Name
ORDER BY Emp.LoginID, Prod.Name

The screenshot of query result is following:


clip_image002[5]

1 comment:

Vlad said...

Hi it is all clear in your example where tables linked sequentially.

My question is – how (in what order) tables must be included in FROM clause in common case.

Information for this is order in which tables were added + list of links between tables. Links could be created in completely different order than tables order.
How using this information I can create correct FROM clause?

I tried play with SQL Server 2005 tolls to find out algorithm but sometime it is not clear.

Do you have any explanation or documentation about in which order tables must be included in FROM clause.

Thanks,
Vlad