Tuesday, March 27, 2012

Advice on indexes

I have a table (detail) with 4 columns and over 30 million rows .
Say the columns are called ColA, ColB, ColC, ColD

ColA and ColB form a foreign key as this table is a 'detail' table for another master table in the database (master). The master table uses these two columns as its primary key.
To speed up joins between the two tables I have created an index over ColA and ColB

I frequently need to join the two tables together and so most of my queries are of the form:

SELECT master.*, detail.ColC, detail.ColD FROM master JOIN detail
ON master.ColA = detail.ColA and master.ColB = detail.ColB
WHERE master.ColA = @.valA
ORDER BY master.ColB, detail.ColC

This could return up to 60000 rows for a specific value of @.valA.
This is usually a sub-query that feeds directly into another query or a temporary table.

However, I also quite often need to filter the details further by ColC so I have queries like

SELECT master.*, detail.ColC, detail.ColD FROM master JOIN detail

ON master.ColA = detail.ColA and master.ColB = detail.ColB

WHERE master.ColA = @.valA AND detail.ColC = @.valC

ORDER BY master.ColB

The results set from this could be just a few hundred rows depending on @.valC

I clearly have a requirement to have an index across ColA, and ColB, presumably as the clustered index.

My question is how to speed up queries that involve ColC.
Do I create another index across ColA, ColB and ColC or just ColC by itself?
Will an index just on ColC make use of the the clustered index on the other columns?
Should I make the clustered index cover ColA, ColB and ColC instead ? etc ..

I am using SQL Express and am approaching the 4GB limit and this table and another one similar to it account for 96% of the database size.
The four columns in the detail table are all integer types and so don't take up much space per row. I am worried that a wide index will significantly add to the storage space per row and therefore significantly reduce the amount of data I can store in the database.

Any advice would be appreciated.

You defintely do not ever want to create one index that is a subset of another as you asked, ie. do not create two indexes on

ColA, ColB

ColA, ColB, ColC

The one on ColA, ColB, ColC does everything that the first one does, so it is all you need.

In your case it sounds like a clustered index on ColA, ColB, ColC should be all you need. Leaving out one column won't make much difference to the size. The size will be affected by the fillfactor and the fragmentation level of the index, so you should rebuild the index from time to time (how often depends on the nature of your updates) with a fillfactor of 100%, although a high fillfactor will increase the possibility that some updates will be slower.

|||Thanks for that.

I have done as you have said and created an index across all three columns and I am happy with both the speed of the queries and the size of the database.sql

No comments:

Post a Comment