SQL 4 life

Thursday, 11 June 2009

Random Numbers

This is going to be a very short post, but I keep seeing it pop up in forums so thought I would put my two cents in.

When trying to generate a random number say between 1 and 5, I’m noticing a lot of people using the follow notation:

        1: CAST(5 * RAND() + 1 as INT)

This method will work if you are looking to obtain a single random number. If however you are selecting from a table and you want to generate a random number for each row the above notation will produce the same number for ever row.

Ok so you might say that all you need to do is change the seed of the RAND() function to get a different value for each row. True however the best solution for this I’m seen so far is a follows:

       1: CAST(5 * RAND(CHECKSUM(NEWID())) + 1 as INT)

Now this does work but seems to be over kill. So…

I came across a method a few years ago on how to generate “true” random numbers in T-SQL and it looks something like this:

       1: ABS(CHECKSUM(NEWID())) % 5 + 1

Now you will note that when using this against a set of data each row will get a random number. Below is an example using all the notations as to help illustrate my point:

       1: SELECT TOP 10

       2:   ABS(CHECKSUM(NEWID())) % 5 + 1 as [True Random],

       3:   CAST(5 * RAND() + 1 as INT) as [RAND Random],

       4:   CAST(5 * RAND(CHECKSUM(NEWID())) + 1 as INT)

       5: FROM sysObjects

Please let me know if you have any other solutions, I’d be interested to get peoples options on this?

Thursday, 26 March 2009

The order of INNER JOIN’s

I want to address a topic that I think a lot of people have been taught in the past and still live by today. I’ve heard different variations of this “rule” in the years that I have been programming and they go something like this.

“Where possible ALWAYS have you smallest tables joined to first.”
“Where possible ALWAYS have you largest tables joined to first”

So already you might start to doubt the validity of the rule :-) GOOD

I’ve created 5 tables all with the same structure.

        1: CREATE TABLE <Table Name>

       2: (

       3:     RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED

       4:     SomeInt INT NULL,

       5:     SomeString CHAR(2) NULL,

       6:     SomeCSV    VARCHAR(80) NULL,

       7:     SomeNumber MONEY NULL,

       8:     SomeDate DATETIME NULL

       9: )

I’ve added different amounts of data to each table as follows:

    --Table1 100 rows

    --Table2 1000 rows

    --Table3 10000 rows

    --Table4 100000 rows

    --Table5 1000000 rows

I then created 6 queries all with various join orders:

       1: --Query1

       2: SELECT m.*

       3: FROM Table5 m

       4:     INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum

       5:     INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

       6:     INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum

       7:     INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum

       8:  

       9: --Query2

SELECT m.*

FROM Table5 m

    INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum

    INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum

    INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

    INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum

  

--Query3

SELECT m.*

FROM Table5 m

    INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum

    INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum

    INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum

    INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

  

--Query4

SELECT m.*

FROM Table1 m 

    INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

    INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum

    INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum

    INNER JOIN Table5 a5 ON m.RowNum = a5.RowNum

  

--Query5

SELECT m.*

FROM Table5 m 

    INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum

    INNER JOIN Table3 a3 ON m.RowNum = a3.RowNum

    INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

    INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum

  

--Query6

SELECT m.*

FROM Table3 m 

    INNER JOIN Table1 a1 ON m.RowNum = a1.RowNum

    INNER JOIN Table5 a5 ON m.RowNum = a5.RowNum

    INNER JOIN Table4 a4 ON m.RowNum = a4.RowNum

    INNER JOIN Table2 a2 ON m.RowNum = a2.RowNum

Each Query returns 100 rows as expected and using :

       1: SET STATISTICS TIME ON

I get the same result for each query: (I ran each query over 10times):

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

(100 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 3 ms.

For some people this might seem strange as they may have been taught that changing the order of your joins can dramatically affect your result.

Now let’s look at the execution plans:

OK so this is all very strange isn’t it!!!

If you run Queries 1 -3 together you get this:

Query 1: Query cost (relative to the batch) 55% 

Query 2: Query cost (relative to the batch) 13% 

Query 3: Query cost (relative to the batch) 33%

If you run Queries 4 -6 together you get this:

Query 4: Query cost (relative to the batch) 33% 

Query 5: Query cost (relative to the batch) 33% 

Query 6: Query cost (relative to the batch) 33%

The reason the above percentages are different and why the percentages in the Query Plans for each Clustered Index Seek/Scan are different for each query is mainly due to the statistics on the tables. After running the following statement:

       1: UPDATE STATISTICS Table1

       2: UPDATE STATISTICS Table2

       3: UPDATE STATISTICS Table3

       4: UPDATE STATISTICS Table4

       5: UPDATE STATISTICS Table5

The Query Costs relative to the batch for each query became equal and the percentages for each operator in the different Execution plans became the same.

More importantly we notice that the Execution plans themselves are very much the same for each query.

At this stage I would like to point you to Grant Fritchey’s blog about a recent article Execution Plan Estimated Operator Cost for more information on the strange percentages in execution plans.

The most important thing to remember here is to understand what you are writing. Test what people tell you if you not sure, on top of this understand what an execution plan is showing you. Understand the plan don’t blindly believe the percentages …

Tuesday, 24 March 2009

SSMS – Manage your Templates

I’ve noticed that some people are struggling with the setting up of their existing templates in SSMS 2005. This posting with deal with two issues.

Locating the file path of your templates.
Tidying up the default templates.

Locating the file path of your templates.

There are two locations for templates when working with SSMS 2005:

Local Store:

C:\Documents and Settings\[USERNAME]\Application Data\Microsoft\Microsoft SQL Server\90\Tools\Shell\Templates\Sql

Replace [USERNAME] with you windows login name. This location is where you templates are stored when created in SSMS.

Main Store:

C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\sqlworkbenchprojectitems\Sql

This is where the default file template folders and scripts are stored. All files and folders in this location are copied to your local store when you open SSMS.

VISTA Store ONLY:(Just added this after some research on vista)

C:\Users\[USERNAME]\AppData\Roaming\Microsoft\Microsoft SQL Server\90\Tools\Shell\Templates\Sql

It seems that if you running vista the files in your main store need to in here as well.

Tidying up the default templates.

Step 1:

Create a folder called "2 – Default Templates” in the Main Store. Copy all the folders in the Main store into you new folder and you Main store should look like this: (If you using vista make sure this folder is also in your VISTA Store)

Step 2:

Delete all the folders in your local store (excluding your personal templates). Create a folder called “1 – My Template”, then copy all your templates into this new folder. Your Local Store folder should now look like this:

Step 3:

Close down SSMS. The next time you open SSMS your template structure should look like this:

(ctrl+Alt+T will open your templates)

I hope this helps you organise your templates and makes life easier for you.

Friday, 13 March 2009

Welcome

Hi All,

This is my first post on my first blog.

For now I will be adding reader links to some other developer’s blogs/sites that I think everyone could learn a lot from.

I hope to be adding some of my knowledge and experience to this blog very soon.

I would also like to start by saying that some of the code tips and snippets that I will be posting here may sometimes be borrowed from other people, where this is the cause I will do my best to acknowledge those people for their good work.

Thanks for taking the time to visit my blog.

SQL 4 life

Thursday, 11 June 2009

Random Numbers

Thursday, 26 March 2009

The order of INNER JOIN’s

Tuesday, 24 March 2009

SSMS – Manage your Templates

Locating the file path of your templates.

Tidying up the default templates.

Friday, 13 March 2009

Welcome

Useful SQL Server Sites

Useful Blogs

My Blog History

About Me

SQL 4 life

Thursday, 11 June 2009

Random Numbers

Thursday, 26 March 2009

The order of INNER JOIN’s

Tuesday, 24 March 2009

SSMS – Manage your Templates

Locating the file path of your templates.

Tidying up the default templates.

Friday, 13 March 2009

Welcome

Useful SQL Server Sites

Useful Blogs

My Blog History

Subscribe To SQL-4-Life

About Me