Thursday, 11 June 2009

Random Numbers

This is going to be a very short post, but I keep seeing it pop up in forums so thought I would put my two cents in.

When trying to generate a random number say between 1 and 5, I’m noticing a lot of people using the follow notation:

   1: CAST(5 * RAND() + 1 as INT)

This method will work if you are looking to obtain a single random number. If however you are selecting from a table and you want to generate a random number for each row the above notation will produce the same number for ever row.

Ok so you might say that all you need to do is change the seed of the RAND() function to get a different value for each row. True however the best solution for this I’m seen so far is a follows:

   1: CAST(5 * RAND(CHECKSUM(NEWID())) + 1 as INT)

Now this does work but seems to be over kill. So…

I came across a method a few years ago on how to generate “true” random numbers in T-SQL and it looks something like this:

   1: ABS(CHECKSUM(NEWID())) % 5 + 1

Now you will note that when using this against a set of data each row will get a random number. Below is an example using all the notations as to help illustrate my point:

   1: SELECT TOP 10
   2:   ABS(CHECKSUM(NEWID())) % 5 + 1 as [True Random],
   3:   CAST(5 * RAND() + 1 as INT) as [RAND Random],
   4:   CAST(5 * RAND(CHECKSUM(NEWID())) + 1 as INT)
   5: FROM sysObjects

 

Please let me know if you have any other solutions, I’d be interested to get peoples options on this?

Thursday, 26 March 2009

The order of INNER JOIN’s

I want to address a topic that I think a lot of people have been taught in the past and still live by today. I’ve heard different variations of this “rule” in the years that I have been programming and they go something like this.

  • “Where possible ALWAYS have you smallest tables joined to first.”
  • “Where possible ALWAYS have you largest tables joined to first”

So already you might start to doubt the validity of the rule :-) GOOD

I’ve created 5 tables all with the same structure.

   1: CREATE TABLE <Table Name>
   2: (
   3:     RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
   4:     SomeInt INT NULL,
   5:     SomeString CHAR(2) NULL,
   6:     SomeCSV    VARCHAR(80) NULL,
   7:     SomeNumber MONEY NULL,
   8:     SomeDate DATETIME NULL
   9: )

I’ve added different amounts of data to each table as follows:

--Table1 100 rows
--Table2 1000 rows
--Table3 10000 rows
--Table4 100000 rows
--Table5 1000000 rows

I then created 6 queries all with various join orders:

   1: --Query1
   2: SELECT m.*
   3: FROM Table5 m
   4:     INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
   5:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
   6:     INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
   7:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
   8:  
   9: --Query2
  10: SELECT m.*
  11: FROM Table5 m
  12:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
  13:     INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
  14:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
  15:     INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
  16:  
  17: --Query3
  18: SELECT m.*
  19: FROM Table5 m
  20:     INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
  21:     INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
  22:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
  23:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
  24:  
  25: --Query4
  26: SELECT m.*
  27: FROM Table1 m 
  28:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
  29:     INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
  30:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
  31:     INNER JOIN Table5 a5 ON m.RowNum = a5.RowNum
  32:  
  33: --Query5
  34: SELECT m.*
  35: FROM Table5 m 
  36:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
  37:     INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum
  38:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum
  39:     INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
  40:  
  41: --Query6
  42: SELECT m.*
  43: FROM Table3 m 
  44:     INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum
  45:     INNER JOIN Table5 a5 ON m.RowNum = a5.RowNum
  46:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum
  47:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

Each Query returns 100 rows as expected and using :

   1: SET STATISTICS TIME ON

I get the same result for each query: (I ran each query over 10times):

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

(100 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 3 ms.

For some people this might seem strange as they may have been taught that changing the order of your joins can dramatically affect your result.

Now let’s look at the execution plans:

OK so this is all very strange isn’t it!!!

If you run Queries 1 -3 together you get this:

Query 1: Query cost (relative to the batch) 55% 

Query 2: Query cost (relative to the batch) 13%

Query 3: Query cost (relative to the batch) 33%


If you run Queries 4 -6 together you get this:

Query 4: Query cost (relative to the batch) 33% 

Query 5: Query cost (relative to the batch) 33%

Query 6: Query cost (relative to the batch) 33%


The reason the above percentages are different and why the percentages in the Query Plans for each Clustered Index Seek/Scan are different for each query is mainly due to the statistics on the tables. After running the following statement:

   1: UPDATE STATISTICS Table1
   2: UPDATE STATISTICS Table2
   3: UPDATE STATISTICS Table3
   4: UPDATE STATISTICS Table4
   5: UPDATE STATISTICS Table5

The Query Costs relative to the batch for each query became equal and the percentages for each operator in the different Execution plans became the same.

More importantly we notice that the Execution plans themselves are very much the same for each query.

At this stage I would like to point you to Grant Fritchey’s blog about a recent article Execution Plan Estimated Operator Cost for more information on the strange percentages in execution plans.

The most important thing to remember here is to understand what you are writing. Test what people tell you if you not sure, on top of this understand what an execution plan is showing you. Understand the plan don’t blindly believe the percentages …

Tuesday, 24 March 2009

SSMS – Manage your Templates

I’ve noticed that some people are struggling with the setting up of their existing templates in SSMS 2005. This posting with deal with two issues.

  • Locating the file path of your templates.
  • Tidying up the default templates.

Locating the file path of your templates.

There are two locations for templates when working with SSMS 2005:

Local Store:

C:\Documents and Settings\[USERNAME]\Application Data\Microsoft\Microsoft SQL Server\90\Tools\Shell\Templates\Sql

Replace [USERNAME] with you windows login name. This location is where you templates are stored when created in SSMS.

Main Store:

C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\sqlworkbenchprojectitems\Sql

This is where the default file template folders and scripts are stored. All files and folders in this location are copied to your local store when you open SSMS.

VISTA Store ONLY:(Just added this after some research on vista)

C:\Users\[USERNAME]\AppData\Roaming\Microsoft\Microsoft SQL Server\90\Tools\Shell\Templates\Sql

It seems that if you running vista the files in your main store need to in here as well.

Tidying up the default templates.

Step 1:

Create a folder called "2 – Default Templates” in the Main Store. Copy all the folders in the Main store into you new folder and you Main store should look like this: (If you using vista make sure this folder is also in your VISTA Store)

main store

Step 2:

Delete all the folders in your local store (excluding your personal templates). Create a folder called “1 – My Template”, then copy all your templates into this new folder. Your Local Store folder should now look like this:

local store

Step 3:

Close down SSMS. The next time you open SSMS your template structure should look like this:

Templates Image

(ctrl+Alt+T will open your templates)

I hope this helps you organise your templates and makes life easier for you.

Friday, 13 March 2009

Welcome

Hi All,

This is my first post on my first blog.

For now I will be adding reader links to some other developer’s blogs/sites that I think everyone could learn a lot from.

I hope to be adding some of my knowledge and experience to this blog very soon.

I would also like to start by saying that some of the code tips and snippets that I will be posting here may sometimes be borrowed from other people, where this is the cause I will do my best to acknowledge those people for their good work.

Thanks for taking the time to visit my blog.