- “Where possible ALWAYS have you smallest tables joined to first.”
- “Where possible ALWAYS have you largest tables joined to first”
So already you might start to doubt the validity of the rule :-) GOOD
I’ve created 5 tables all with the same structure.
1: CREATE TABLE <Table Name>
2: (
3: RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
4: SomeInt INT NULL,
5: SomeString CHAR(2) NULL,
6: SomeCSV VARCHAR(80) NULL,
7: SomeNumber MONEY NULL,
8: SomeDate DATETIME NULL
9: )
I’ve added different amounts of data to each table as follows:
--Table1 100 rows
--Table2 1000 rows
--Table3 10000 rows
--Table4 100000 rows
--Table5 1000000 rows
I then created 6 queries all with various join orders:
1: --Query1
2: SELECT m.*
3: FROM Table5 m
4: INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
5: INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
6: INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
7: INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
8:
9: --Query2
10: SELECT m.*
11: FROM Table5 m
12: INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
13: INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
14: INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
15: INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
16:
17: --Query3
18: SELECT m.*
19: FROM Table5 m
20: INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
21: INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
22: INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
23: INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
24:
25: --Query4
26: SELECT m.*
27: FROM Table1 m
28: INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
29: INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
30: INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
31: INNER JOIN Table5 a5 ON m.RowNum = a5.RowNum
32:
33: --Query5
34: SELECT m.*
35: FROM Table5 m
36: INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
37: INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
38: INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
39: INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
40:
41: --Query6
42: SELECT m.*
43: FROM Table3 m
44: INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
45: INNER JOIN Table5 a5 ON m.RowNum = a5.RowNum
46: INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
47: INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
Each Query returns 100 rows as expected and using :
1: SET STATISTICS TIME ON
I get the same result for each query: (I ran each query over 10times):
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(100 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.
For some people this might seem strange as they may have been taught that changing the order of your joins can dramatically affect your result.
Now let’s look at the execution plans:
|
|
|
|
|
|
OK so this is all very strange isn’t it!!!
If you run Queries 1 -3 together you get this:
Query 1: Query cost (relative to the batch) 55%
Query 2: Query cost (relative to the batch) 13%
Query 3: Query cost (relative to the batch) 33%
If you run Queries 4 -6 together you get this:
Query 4: Query cost (relative to the batch) 33%
Query 5: Query cost (relative to the batch) 33%
Query 6: Query cost (relative to the batch) 33%
The reason the above percentages are different and why the percentages in the Query Plans for each Clustered Index Seek/Scan are different for each query is mainly due to the statistics on the tables. After running the following statement:
1: UPDATE STATISTICS Table1
2: UPDATE STATISTICS Table2
3: UPDATE STATISTICS Table3
4: UPDATE STATISTICS Table4
5: UPDATE STATISTICS Table5
The Query Costs relative to the batch for each query became equal and the percentages for each operator in the different Execution plans became the same.
More importantly we notice that the Execution plans themselves are very much the same for each query.
At this stage I would like to point you to Grant Fritchey’s blog about a recent article Execution Plan Estimated Operator Cost for more information on the strange percentages in execution plans.
The most important thing to remember here is to understand what you are writing. Test what people tell you if you not sure, on top of this understand what an execution plan is showing you. Understand the plan don’t blindly believe the percentages …






7 comments:
Hey Christopher -
Great post. Big pet peeve of mine is generalisms/old adages without testing for yourself. Nice write up and great looking blog so far.
I'll be adding you to my reader, glad I stumbled upon your site.
Hi Chris,
There are a number of problems with your analysis.
First, as you can see in the execution plans, though you specified a variety of join orders in your queries the optimizer always joined from the smallest table (table 1) out to the biggest one.
Second, if you really want to see the effects of varying the join order in your queries, repeat your queries and use the FORCE ORDER hint. I am sure you will find that there are big differences, especially with more realistic data.
Dan Tow has written an excellent book on tuning SQL queries that goes in depth on the importance of getting the best join order. I highly recommend it.
http://www.amazon.com/SQL-Tuning-Dan-Tow/dp/0596005733/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1238523899&sr=8-1
Best,
Nick
So the punchline is that the optimizer will typically know what join order is best (as it did with your sample queries), but not always. That's when you need to manually tune your query.
That is a good point Nick.
But as you stated he Optimizer "will" use the correct order making our life as coders easier.
However as I mentioned in my post make sure you understand what the optimizer is doing, if you don't then you right you might not realise that the optimizer has taken the wrong path.
I should have mentioned that I meant code order and not execution plan order. As you have indicated the execution is always the same hinting at the preferred order of the JOINS
thanks for the feedback
Wow, that's crazy man. They should really try to do something to fix that.
Do you even speak English? Seriously, wall of text crits me for 99999k.
Chrissy M: Die in a fire!
Chris: I love the depth of your analysis, nice blog :P
Post a Comment