Some SQL Server 2005 / 2008 performance considerations.

Version : 2.9
Date : 17/08/2010
By : Albert van der Sel
Type of doc : It's just a few notes on SQL Server performance in simple words. It's no more than just "entry level".
For who : For anyone who likes a short orientation on the subject. But for experienced DBA's, it's too simple.


0. Preface
Chapter 1. How to "quickly" determine if you have Disk I/O problems.
- 1.1 Use of the "fn_virtualfilestat" function
- 1.2 Using "System Monitor"/"Performance Monitor" Disk counters with respect to Disk IO
- 1.3 Information from the "sys.dm_os_wait_stats" view, with respect to IO and Disk IO
Chapter 2. Some remarks on Virtual Machines and a large, active, SQL Server environment.
Chapter 3. Some remarks about a 64 bit OS and 64 bit SQL Server edition.
Chapter 4. Indexes and Index tuning.
- 4.1 Short recap on indexes
- 4.2 Dynamic management views and functions
- 4.3 Move a hotspot index to another Filegroup
- 4.4 Quick check on the effectiveness of your indexes
- 4.5 How to Reorganize and Rebuild Indexes
- 4.6 Remarks on Statistics
Chapter 5. How to determine if you have a CPU problem .
- 5.1 Using "System Monitor"/"Performance Monitor"
- 5.2 Using the Dynamic Management Views and Functions
Chapter 6. Allocation Unit and Partition Alignment.
Chapter 7. Placement of objects on Filegroups.
- 7.1 The "traditional" non-partitioning approach
- 7.2 Partitioning tables and indexes
Chapter 8. Other remarks on several subjects (like tempdb, datatypes etc..)
Chapter 9. Some real world cases, describing somewhat more complex performance problems
Chapter 10. A few links to recommended technical articles

0. Preface

This is a short note, introducing some basic considerations on SQL Server performance.

If you are an experienced SQL Server DBA, this note is probably not for you (but you are certainly invited to read on!).

Instead, if you just know you way around in SQL Server, and you just want to know some of the more important
stuff on performance, then you are at the right place.
But.., remember, this note is surely no more than a lightweight discussion.

And a very important remark should be made right at the start: we are not going to discuss "Query Design and Tuning"

There are many things which are very important to improve SQL server performance. Like for example, to place
Table data and Indexes on different Disk Volumes (different "spindles"), there are memory considerations,
there are "smart" choiches on the datatypes, considerations if you need partitioning etc.. etc..
One very paramount subject in performance considerations (at all Database Engines like Oracle, SQL Server etc..),
is "Query Design & Tuning". Now, this note will not address that subject.

What is meant by Query Design and Tuning ?

Ultimately, Users or applications, send SQL Statements to the database engine.

For the sake of argument, let's consider some complex SQL statement. It could consist of, say,
20 or so nested joins of possibly a mix of small tables, large tables, and possibly very large tables.

Now, there are many roads that lead to the city of Rome, and some of them are easy to travel,
while others are difficult and lengthy. The same is true for how someone designs a query and how it will be executed.
It could be a very smart one, using the best and fastest accesspaths, or it might be so terribly ill-designed,
that the database Server just whished someone pressed the shutdown button.

So, query design and tuning, is extremely important ! Yet, we don't mentioned it in this note.
Ofcourse, one element in Query Tuning will be addressed, and that is determining the best use of indexes.
But, the subject on how to best actually write Queries, is not. And believe me, it is a very important subject.
If some application uses horribly bad designed SQL Statements, it's really very hard to fight that.

Note: you might not even be in control on Query Design at all. Suppose you use a "third-party" application (which is very likely).
Then it just works the way those developers created it. But even then you can "capture" the SQL statements, for example using
a 'tracing tool' (the profiler), or by using some smart queries on the Dynamic Management views and functions.
Then, in principle, you can hold the results to those developers, along with recommendations.
But, yes, that might be a bit of an "optimistic" view on matters.
On the other hand, if you have "internal" developers, you could create code for them in "stored procedures" (works fast),
or make recommendations for their query design

Could this note then has something to say at all?
Definitely! There are so many considerations on SQL Server peformance.

Below, you will find about 8 considerations, in 8 chapters, which are generally viewed to be "relevant"....

Chapter 1. How to "quickly" determine if you have Disk I/O problems.

Many learned people will tell us, that a good performance measurement, should always consider cpu, the disksubsystems,
memory, the network subsystem, and all relevant specific SQL server counters.
That is true. But it does not mean that you can't focus for a while on just one particular subsystem only.
If we do that, the only thing we should beware of, is that we should not jump to conclusions right away. Ok, that's fair enough.

Still, the following gives us very important clues on the status/statistics of Disk IO.
  1. A few "dynamic management views and functions" in SQL Server, can show you quickly whether Disk IO is good or bad, or something in between.
    The cause of poor Disk IO, could originate from a Database design which is not OK (like not having seperated tables and indexes),
    or, indeed the disksubsystem cannot keep up with the demands.
  2. Also, a number of counters of "System Monitor" (or Performance Monitor) in NT, will show you quickly the same thing.

1.1 The "remarkable" fn_virtualfilestats function

One remarkable function, is the fn_virtualfilestats function. You can use it from the "Query Window" (or Query Analyzer)
which you can start from "SQL Server Management Studio".
It returns I/O statistics for database files, including log files. To use it, it can be as easy as this statement:

SELECT * FROM fn_virtualfilestats(null, null)

The function takes two parameters. If you leave them as "null", you will see IO statistics on all files from all databases.
That ofcourse, could already be good enough. Below, you see an example of the output.

Before we discuss this very interresting output, here is some more info on the function itself. If you want to use parameters (for a shorter list), then you should know that the first parameter is the "database id" (dbid), and the second one is the "file id" (fileId).
This way, you can "focus in" to a particular database and/or particular file.

It's easy to get a list of database names and database id's using:

select * from sys.databases

Likewise, it's easy to get a list of filenames and file id's for a particular database, using:

use database_name_of_your_interest
select * from sys.database_files

Now let's turn our attention to the output. As you can see, you find a number of very interesting columns in the resultset.
Here are a few important ones:

BytesRead: Number of bytes read issued on the file.
BytesWritten: Number of bytes written made on the file.

Note that you can easily identify "hotspot files" from those values...(!)

IoStallReadMS: Total amount of time, in milliseconds, that users waited for the read I/Os to complete on the file.
IoStallWriteMS: Total time, in milliseconds, that users waited for the write I/Os to complete on the file.

The last two values, thus represent the "stalls" on reads and writes. Obviously, they should take on small values.
If you see high values, It does not automatically means that you have a "bad" Disksubsystem design.
Ofcourse, that could be true, but also other configurations can contribute to the effect.
For the latter, you might think of large tables and large non-clustered indexes which are stored in the same file,
and thus a lot of contention takes place, which can result in the "stalls" observed.

So, the output of the "fn_virtualfilestats" function, cannot lead to exact conclusions right away.
  1. You can identify files which are "hotspots", thus which have a large number of reads and/or writes.
    This could mean that you must move a (few) "active" table(s) or index(es) to a new file on another disk.
  2. You identify files where "stalls" takes place. This could mean that the Disk IO design is not good enough,
    but that fact is not proven yet. Just as above, here too it could could mean that you must move a (few) "active" table(s) or index(es)
    to a new file on another disk.
But if you have neatly seperated active indexes and tables, and there still are many stalls,
then, it might be a clue that Disk IO is not optimally setup. But, we do not jump to conclusions yet!

In our example, we have seen a Disk IO problem. But it could either be caused by a bad database design, or possibly by not optimally
configured disks.

You see? We already have learned a lot. But we need more information, so let's see what the next section brings us.

1.2 Using "System Monitor"/"Performance Monitor" Disk counters with respect to Disk IO

System Monitor (or Performance Monitor as many people still call it), is a well-know NT performance measurement utility.
You can do "real-time" measurements (viewing real-time graphs), or you can log the findings to a file.
If you want to start it, just open a command window and enter the perfmoncommand.

With NT system monitoring tools, you will encounter the following naming structure;
- "objects" are representations of (real) components like processor, physicalDisk etc..
- a "counter", of an object, is a measurable metric that is exposed by that object. An object usually has many counters.
- "instance", for example, you might have an object like a "PhysicalDisk", but.. you might have multiple disks on your system!
So, in this example, you might pick a particular disk (like E:), or choose all of them (mostly designated by "_Total")

So you might have as an object, a "processor", which exposes several counters like "%User Time", "%Priviledge Time", or
"%Processor Time" (which is User + Priviledge).

Using this graphical tool (performance monitor/system monitor), it's quite easy to select multiple objects, and per object,
select the counters that you are interrested in.
This chapter deals with the subject on how you can discover Disk I/O problems, so we will focus our attention to that subject.

If you want to add counters, just navigate to the "graph" section and right-click. Then you will see a menu to add counters.
There objects (each with many counters) which describe your system and OS, like processor, memory etc..
And, if your Server has SQL Server installed, there are many objects from SQL Server as well.

Standard system Disk counters:

First let's take a look at the standard system counters, related to the Disk subsystem.
Note: maybe you want to check with your sysadmin whether disk counters are "activated" (it's very likely that it is).

As objects, you can choose "PhysicalDisk" and "LogicalDisk". Logical Disks are structures like "partitions", like an E: drive.
In this discussion, it does not matter which one you choose, as long as it is the disk/partition you want log, that is,
it should contain databasefile(s), or the transactionlog file(s).

So, for example you might choose "G:", because this "disk" contains an active Index file.
Or, you might choose the "_Total" object of the LogicalDisk object, which means you measure all disks at the same time.
And if you have a dedicated SQL Server machine, then that's not a bad choiche.

Now, pay special attention to the following counters (from the PhysicalDisk object):

% Disk Time
Avg. Disk Queue Length

In the figure on the left, you see a very "exaggerated" example of a perfmon graph. Note the various counters diplayed. The "yellow" line is "%Disk Time", while the "green" line represents the "Avg. Disk Queue Length".

The values shown here, are a bit high. Who cares, it's just an example.

The "PhysicalDisk: % Disk Time" counter monitors the percentage of time that the disk is busy with read/write activity.

If the "PhysicalDisk: % Disk Time" counter is constantly high, (more than 80,90 percent), we might see a problem.

Even of more interrest, is the "PhysicalDisk: Avg. Disk Queue Length" counter, to see how many system requests are waiting
for disk access. If you see that taking values like 2, 5, 3, 4, 10, 1 etc.., there is likely no Disk IO problem.

If you want to see that expressed in a rule: the number of oustanding request should be no more than 2 x (the number of spindles).
So, suppose you see constantly high values like 20, 30 or higher, you might agree we probably have a serious Disk IO problem.

Also note from the above figure, that I took the counters of the "PhysicalDisk" object, over all "disk instances" (shown as "_Total").

As always, however compelling the "evidence" seems to be, never jump to conclusions right away. We still need more information.

Now, what about the specific SQL Server counters? Sure, and we will see them in Chapter 6!

1.3 Information from the "sys.dm_os_wait_stats" view, with respect to IO and Disk IO

Since version 2005, SQL Server has a very extended dictionary with "system views" and "system functions", for use for the DBA.
The views contains a wealth of statistics on sessions, locks, transactions, system metrics like latches, Disk statistics etc.. etc..
At last, we have reached the level of Oracle DBA's, which for many many years queried their hundreds of v$ and DBA_ views.

There are so many interresting system views in SQL Server, that we want to view them all right now! But, this chapter is
dealing on Disk IO, so one Dynamic Management view stands out: "sys.dm_os_wait_stats".
This view is specifically for gathering "waits" on a large number of events.
(Note: even the function of section 1.1 uses it)
But we must remember that the values in "sys.dm_os_wait_stats", are "cumulative", that is, information is added all the time.
The counters can be reset using the SQL statement DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR).

You could already use "select * from sys.dm_os_wait_stats", but that query gives use too much (unfocused) information.

In the figure on the left, you see a better query variant on sys.dm_os_wait_stat. We specifically leave out fields which we are not interrested in (at the moment) because we want to focus on events related to Disk IO.
That's why we use the clause "where waittype not in (..)".

And, we use the clause "top 10 *" because we are interrested in the "top 10 IO metrics".

Note: below you will find the query in clear text).

Also note that you can write nummerous interresting query variants on sys.dm_os_wait_stats.

Let's now focus on the output of this specific query. In this specific example, the query was run on a system which has quite a Disk IO problem. A very important column to pay attention to is "wait_time_ms", which shows us the time in ms, which processes had to wait for, for that specific "wait_type" (shown in the first column).

What could set us on the track of possible Disk IO problems, is the "wait_time_ms" related to the
"Pageiolatch_%" wait_types.
"Pageiolatch_sh means that a session is waiting for some page to be brought into memory, from database files on disk.
That a small wait is involved is understandable. However, if the Pageiolatch_% waittimes are too high, it could be an indicator of a Disk IO problem.
More generally, this wait_type, represent all sorts of memory-to-disk transfers, so it could also be an indication that you are low
on cache buffer memory.

IO problems can also be be a considerable factor, if ASYNC_IO_COMPLETION and IO_COMPLETION show high wait times.

The values are cumulative ofcourse, and you might consider the above DBCC command, to start over.
What are reasonable values of the Pageiolatch_% values?
The query that is shown in the figure, actually tells you. If it's listed high in the top 10, we have an indicator of a Disk IO problem.

Again, don't jump to conclusions from this information alone. However, if we take all the information from
sections 1.1, 1.2 and 1.3 together, and they provide a consistent view, then it is reasonable to assume that Disk IO
is indeed a problem.
So, we did what was promised in the title of this chapter: How to "quickly" determine if you have Disk I/O problems.

However, we did not make clear whether the IO problem arose from a bad database design (seperation tables and indexes etc..),
or that indeed the disksubsystem is not good enough for the SQL Server environment.

As another example to be carefull before "you blame something", consider this:
Maybe there are so many users active, which was not anticipated before, so that the system is not
scaled properly. There are just so many competing sessions, that everything "overloads".
Ofcourse we can investigate that too.
It's just a remark "to be carefull" with conclusions.

Note: here is the query (for easy copy/paste purposes)

select top 10 *
from sys.dm_os_wait_stats
where wait_type not in
order by wait_time_ms desc

Chapter 2. Some remarks on Virtual Machines and a large, active, SQL Server environment.

In case of a very large, very "active" database, better not virtualize your SQL environment. Well, that's my opinion.

Please read the following with a healthy dosis of skeptism. It's just that on this subject, nobody can tell
you "the absolute truth".

In today's Datacentres, many physical Servers are present, where each physical machine is running many "Virtual Machines (VM)".
Almost always, some "virtualization" product is used by which the Admin is able to create VM's on that Host.
Those VM's, each get their share of memory, virtual cpu's, network access, and disk access.
For example, "VMware ESX" is very popular. One physical Server, installed with ESX, can run a mix of all sorts of Linux and Windows VM's

It does not have to be a problem. But fact is, that many physical Hosts gets "crowded" with competing VM's for resources.
This is the way many "businesses" work: new projects get alive, which all need their own test- development- and production machines.
In many cases, it's hard for a sysadmin to keep all the demands and resources in balance.

Secondly, when the sysadmins and SAN administrators create the infrastructure, they may be unaware of the specific purposes
of the individual VM's that will be installed. Some VM's will work as File Servers, Mail Servers, Domain Controllers, Application Servers,
and.... SQL Server machines.

Now, you might end up with a Win2K3 Server, with just a C:, D: and E: "drive", and you might know nothing
of the specifics of storage.
This is not the way that you will have fun with large, and/or active, databases.
Because, always try to sperate tabledata, indexes, transaction logfiles, (and tempdb) on their own diskvolumes,
that is, seperate "spindels".
Otherwise you might encounter very serious Disk IO problems.

While it is quite easy "to get the evidence" of IO problems (see for example section 1), to "repair" the situation
often is quite difficult.

Don't get me wrong: for smaller SQL Server database, you probably will not have any difficulties at all.
And besides, ESX has all sorts of very neat tricks, like online moving a VM to another host, and much more.

But for a very large, very active database, take care. You might consider a dedicated physical machine.

By the way, the general consensus "in the IT community" just seem to support the above view.
(please note the word "seem" here).
Anyway, at occasions that I talk with Microsoft engineers on this subject, they always say that it's indeed "Not Done".

Note: I certainly like virtualization. If you are on good terms with the sysadmins en storage people, most of the time
things will work out OK. For example, Oracle on AIX lpars, or HPUX vpars, or DB2 on Z lpars etc... etc..,
works (in general) OK.
I must say that I am a bit dissapointed by a couple of bad experiences with really large SQL Server environments
on VMWare. I always seem to have problems with Windows Virtualization and large databases. For me, it does not work well.
I always have Disk IO problems and/or cpu utilizaton problems and/or Memory problems and/or network problems,
where Disk IO is a prominently number "1" cause of low performance.
So, it's certainly not an "absolute truth". I only tell you to be carefull, and try to create good test scenario's.

If you must use a VM (for example, because of a very strict company policy), sit down with the sysadmin
and storage admin, and make your resource requirements very clear (and make them your friends as well).

Chapter 3. In case of a large, very "active" database, always choose a 64 bit OS and 64 bit SQL Server edition.

This is ofcourse quite obvious. Even if we just only consider the memory that a 64 bit
SQL Server can use, compared to a 32 bit edition, the difference is astonishing.

On 32 bit systems, you are limited to 4GB of memory. You could put the socalled AWE feature at work, will would allow
for a maximum of 64GB cache. But that is not computational memory, it's just for caching.

A 64 bit SQL Server, on the other hand, can directly address 1,024 gigabytes of physical memory,
so the amount of data SQL Server can access for cache and computational memory, is thus much, much larger.
It will pay off in almost all actions that SQL Server perfoms.

Ofcourse, you need 64bit hardware to implement the 64 bit OS and 64 bit SQL Server editions.

So, unless you are restricted by hardware or licensing issues, you cannot really speak of a choice here:
For larger and/or very active databases, you should always go for the 64 bit environment.

True, this is all very trivial and obvious. Sorry for that.

Chapter 4. Indexes and Index tuning.

You surely know what indexes are, and why they need to exist, certainly for larger tables.

4.1 Short recap on indexes

Here are a few facts (just take them for granted), which will be explained later in more depth.
For the "traditional" relational SQL Server tables, two types of indexes exist: clustered index and non-clustered index.

  1. A table without a "clustered" index, is called a "heap". Essentially, the table rows are then not ordered by some "key"
    (this key is one or more columns of the table).
    Now, this table could have one or more "non-clustered indexes", but it's still a heap.

  2. If a table has a clustered index, ONLY THEN the rows of the table are ordered by the key of that clustered index. So, actually really
    the rows in the pages are sorted by that key. Thus, a table can only have ONE clustered index, because you can physically "sort"
    the table rows in one way only (and not by another key at the same time: that's not possible).
    Preferrably, the key of the clustered index is a table column of a nummeric datatype, that increases with each row added.
    But, a character based dataype is often used too, like for example a "social security number".

  3. A table may have one or more "non clustered" indexes. A non clustered index, will not order the physical rows
    of a table (like a clustered index does). So, a table might have one clustered index (only one), and multple non-clustered indexes.

  4. A non-clustered index is just an external object, looking like a "mini representation" of the table. It has (leaf)pages containing rows,
    each of which has the choosen key of the table (one or more columns of that table), and a pointer to the corresponding
    tablepage which ofcourse contain all the data. That is one reason why an index speeds up searches in tables.
    Such an index looks logically like a tree: it has a root page containing pointers to an intermediate level with pages,
    where those pages point to the leaf level of pages. Those pages then, contains the actual keys and pointers to the table pages.
    When a query uses such an index, the tree is traversed, and the table row is then quicly found.

  5. When a table has no indexes, a full table scan is done, meaning starting at the first page, all the way down,
    unitl the sought after row is found. That's why, with larger tables, indexes are indispensible.

  6. A clustered index is somewhat "special" (if you happen to be familiar with Oracle, it looks like a IOT).
    What we have said in point 4, largely applies to a clustered index as well.
    Except that in this case, the leaf level are the table pages themselves (!).

    At the moment a clustered index on a table is created, the table rows get sorted (on the choosen key of the index).
    When that's done, the index leaf level, just is the same as the table pages.

    To put it in simple words: A clustered Index is the ordered table itself.
    Note: it's almost true, except for the rootlevel and any optional intermediate level.

  7. Only when a clustered index is defined for a table, the table is not a "heap" anymore. The tablerows get sorted
    on the choosen key.
    By the way, if a "Primary Key" is defined for a table, SQL Server will per default create an associated
    unique (clustered) index. You know probably that a Primary Key column in a table, must have all unique values.
    So, each row in the table is unique with respect to that particular column (or set of columns).
    The above rule, is actually enforced by the unique index. A few lines above, I told you that a clustered index
    will be created. In practice, you almost always will do that. But actually, a "unique" index is already sufficient.

    Don't worry now. There are just two index types (clustered and non-clustered). But an index can be defined as unique, if the choosen key, only takes on unique values. So, even a non-clustered index can be defined as unique.

  8. If a primary key is defined for a table, then you automatically have your (one and only) clustered index as well.
    But you may add aditional non-clustered indexes. If you are certain, that a relevant number of queries
    have in the "where clause", colums that are not in the clustered index key, you probably consider creating
    one or more (a well balanced number) of non-clustered index(es).
    It's a bit of an art. You surely can have a situation where the addition of a non-clustered index helps tremendously.
    But if you have too many (useless) indexes, it works counter productive.
    You know, any Insert, Delete, and Update SQL statement, alters a row (or more rows) in the table, and all associated
    indexes must be updated as well, to reflect the new situation.

All objects in SQL Server have an "Object ID" (object_id). This holds for tables, indexes and other objects.

Once indexes are defined on a table, those indexes have an "index id" (index_id or ind_id) as well.

You know about the "close" relationship between tables and indexes. For example, the leaflevel of pages of a
clustered index, just are the table pages.
There are a quite a few dynamic management views and functions that show the properties of indexes and tables.
If a table has indexes, the following index_id's hold for the table and it's indexes

A: table has no clustered index

index_id=0 :index id of the heap (the table itself)
index_id>1 :index id's of the non-clustered indexes (like 1, 2, 3 etc..)

A: table has a clustered index

index_id=1 :index id of the clustered index (the table itself)
index_id>1 :index id's of the non-clustered indexes

Note: in some older views (from former versions) the "index_id" is listed as "indid".

4.2 Dynamic management views and functions:

The following views and functions will help us a lot in determining many properties of indexes, like the number of rows in the index,
the number of pages, fragmentation level, if it might be a hotspot, or contrary, is not uses much etc.. etc..

This is what you likely want to know in a quick and easy manner:

=> If an index is a "hotspot", then that's good because it's evidently used much.
But, if you have multiple "hotspots" in the same file, then you might consider moving an hotspot to another file.

=> Also, since indexes "mutate" as well, you want to know how fragmented they are, and that might trigger you to
reorganize or even rebuild an index (maintenance of an index).

=> If an index is (almost) not used at all, you might want to remove it.

=>You also want to easily see how many rows an index has. The larger indexes are probably the more important ones,
and if they are fragmented, you can create lists of indexes on which you want to do maintenance.
Besides, then you also know which rebuilds of which indexes will hit performance most during a rebuild.

The following views an functions are important:

  1. sysindexes (view): An old friend from former SQL server versions. It's still available in 2005/2008.
    Contains one row for each index (or heap) with many important properties. Often it is joined with "sysobjects", in order
    to get the associated tablenames as well.

  2. sys.dm_db_index_operational_stats (function): Since 2005 and up. Contains one row for each index,
    and aggregates the number of leaf and non-leaf inserts, updates, deletes along with page latch statistics.

  3. sys.dm_db_index_physical_stats (function): Shows per index the fragmentation and allocation information.

  4. sys.dm_db_index_usage_stats (view): Shows per index, statistics of how frequently an index is accessed,
    as well as how many times it is accessed.

  5. sys.dm_db_missing_index_details (view) contains records of possible indexes that the optimizer might
    be able to take advantage of, but that not exist within the database. So, they might be candidates to create.

4.2.1 The "sysindexes" VIEW::

Let's take a look to a usefull query, using this "old" systemview.
Let's try the following. It shows all tables with all their indexes, with their index_id's and number of rows.
The list is sorted by the number of rows, listing the largest first.


substring(,1,40) AS TABLENAME,
substring(,1,40) AS INDEXNAME,, sysindexes.indid, sysindexes.groupid,sysindexes.rows
FROM sysobjects, sysindexes
ORDER BY sysindexes.rows desc

In the figure below, you can see some example output:

Note: In the actual real output, you will see the systemviews as well.

Actually, it's a neat list. You can see all tables with their associated indexes.
The indexes which are a clustered index, have an "indid" of "1", like "pkbob_price"
Remember that the (leafpages of) clustered indexes are actually the tablepages themselves.

Also note that the table called "VALUE" is a "heap", with an indid of "0". It does not have a clustered index.
Because this table is a heap, it is listed as "null" at the indexname column. But the table has 5 non-clustered indexes,
and thats why you see the tablename repeated.

It's also nice to have a list of the number of rows of the indexes.

So, this query shows you the tablename, with all index names associated with that table,
as well as the object_id of the table, the index_id of those indexes, and the number of rows in all heaps,
clustered indexes and non-clustered indexes.

4.2.2 The "sys.dm_db_index_operational_stats" FUNCTION::

A function is not the same as a view, ofcourse. But, you use them not in a much different way as a view.
In both case you make statements like "SELECT .. FROM [view|function ].
Only, a function often expects "parameters" like in "functionname(parameters), where the parameters might be
an "object_id", "index_id", or a "database_id" etc..

- If you specify all parameters, it usually means you need information of one object only.
- In many cases, parameters may take on "NULL", which means you get information of more objects, or even all objects.

Let's see how we can use the "sys.dm_db_index_operational_stats" function.
A short description of this function is: Contains one row for each index,
and aggregates the number of leaf and non-leaf inserts, updates, deletes, along with page latch statistics.

That's very interesting, because here you are able to extract information whether an index is much used or not.
Syntax of the function:

sys.dm_db_index_operational_stats (
{ database_id | NULL | 0 | DEFAULT }
, { object_id | NULL | 0 | DEFAULT }
, { index_id | 0 | NULL | -1 | DEFAULT }
, { partition_number | NULL | 0 | DEFAULT }

Here is an example query:

SELECT database_id, object_name(object_id), index_id, leaf_insert_count, leaf_delete_count,
nonleaf_delete_count, nonleaf_update_count FROM -- many more columns could have been chosen
sys.dm_db_index_operational_stats(DB_ID('eximius_production_data'), OBJECT_iD('VALUE'),NULL,NULL)

Meaning that from the 'eximius_production_data' database, you want operational stats of all indexes
associated to the 'VALUE' table.

Now, suppose your production database has a database_id of '5' (easy to find with "select database_id, name from sys.databases")
Now suppose you want operational stats of all indexes of all tables, then use this really nice query:

SELECT database_id, object_name(object_id), index_id, leaf_insert_count, leaf_delete_count,
nonleaf_delete_count, nonleaf_update_count FROM
sys.dm_db_index_operational_stats(5, NULL,NULL,NULL)

- Ofcourse, you can extend the query with a clause like "ORDER BY leaf_insert_count" etc..
- Also make sure you try "SELECT * FROM sys.dm_db_index_operational_stats(5, NULL,NULL,NULL)", just to see what
other columns of information can be extracted.

In the above two example queries, we focused on leaf_insert_count, leaf_delete_count etc..
But this function will show you much information on "locks" and "contention" as well !
To find out if a certain index is "used much, or not", you can compare the "count" columns of the indexes.
But there are many "wait" columns too, giving clues to if a certain index is a "hotspot".
For your database, try the function again, but this time also include the following columns:

  1. row_lock_count: Cumulative number of row locks requested.
  2. row_lock_wait_count: Cumulative number of times the Database Engine waited on a row lock.
  3. row_lock_wait_in_ms: Total number of milliseconds the Database Engine waited on a row lock.
  4. page_lock_wait_in_ms: Total number of milliseconds the Database Engine waited on a page lock.

Then make lists of of all indexes, and compare the values among those indexes. This will give
you a good idea about hotspots. If you see multiple indexes with longer waits, it might be an indication
to move one or two to another filegroup.
4.2.3 The "sys.dm_db_index_usage_stats" VIEW:

You should use this view complementary to the function in 4.2.2.
The function from 4.2.2, shows you "waits" and leaf_insert_count, leaf_delete_count etc..
But with this view, will see differentation between user_scans, user_lookups, user_updates, last_user_seek,last_user_scan,last_user_lookup,last_user_update etc..
So, it's really easy to find out if an index is usefull or not. If the "last_" columns only show old date/times,
then obviously, the index is not used. It may even be a candidate to delete it.
But not so fast. Maybe the "statistics" are so lousy, that the optimizer does not consider it.
But if you are sure the "statistics" are up to date, and the "user_" and "last_" show small or old values,
then consider deletion of that index.

Note that this view registers statistics of all indexes of all databases. So, probably you want to use
a "WHERE" clause that specifies your database (id) of interest.
Also, the query might take so time to run in a very large database, so first experiment a bit on testsystems.

Since its a view, use "SELECT * FROM sys.dm_db_index_usage_stats WHERE database_id=".
From that, browse the columns that are interesting, and adjust your query accordingly, possibly with clauses like
"user_updates>n", or "ORDER BY user_updates" etc..

You really need to play with this view a bit, to appreciate the information you can get.

4.2.4 The "sys.dm_db_index_physical_stats" FUNCTION:

You notice the "physical_stats" in the name of this function? Indeed, this function will reveal to us
how index pages are filled with rows, and what their average fragmentation level is.

4.3 How to move a hotspot index to another Filegroup:

  1. Make sure you have scripted your objects, that is, have the create scripts of all objects.

  2. Moving large indexes, cost a lot of performance and will take time. With the Enterprise Edition, theoretically
    you can do it "ONLINE", but, for example, you don't want to move a 700 million row index while the users are busy.

Personally, I think that moving objects like tables and indexes to other tablespaces, is quite easy in Oracle.
Ofcouse, the methods exist in SQL Server too, but here it's somewhat more elaborate.

Anyway, if you move an index from one filegroup to another filegroup, there might be a mean rattlesnake in the grass.
Essentially, moving an existing index, means dropping it and then recreating it.
Now, if this index supports some Constraint, like a Primary key, or PK-FK relations, you must enable them again!
It is easy to forget this, because moving an index "looks" like a single action. It is, as long there are no
constraints involved.
Although, trying to explicitly dropping an index (with the DROP INDEX statement) that supports a Primary key, will be denied
by the Database Engine. But, other statements (like ALTER INDEX) will work.

In reality, before you do anything, make sure you script the database. That means that SQL Server will script
all create statements with respect to tables, indexes, constraints, triggers, etc.. etc..
This way, you just have an ascii file with all create statements, and if you need it, you can easily retrieve
for example, an index create statement etc.. without that you need to "remember" which columns were involved. It's easy to do that: just browse around a bit in SQL Server Management Studio (SSMS).
(Just rightclick your database -> choose Tasks -> choose Generate Scripts)

Note: Many professional systems, have some sort of repository, describing all objects, including the TSQL create statements.

Now, how to move an table and or an index to another filegroup?

Let's review in a few simple examples, how indexes are created on an existing table.

Lets make a simple example table:

CREATE TABLE dbo.Customers
Cust_id int NOT NULL,
Cust_name varchar(20) NOT NULL,
Address varchar(30),
City varchar(20),
Country varchar(20)
ON FG_DATA -- The filegroup FG_DATA for storage of tables

- Clustered:


- non-clustered:

ON FG_INDEX -- The filegroup FG_INDEX for storage of non-clustered indexes

Or the Index Create statement in a slightly generalized form:

CREATE [CLUSTERED | NONCLUSTERED] INDEX ON TABLENAME(column1, column2,...) [ON filegroup_name]

Note that I did not use the "ON FG_INDEX" clause with the clustered index create statement.
I could have done that, but that would imply that I "moved" the table to that filegroup.
Again, the table (in simple words) is actually the same as the clustered index.

So, we can immediately conclude the following:

1. How to "move" a table with a clustered index to another filegroup:

DROP the clustered index. Create the Clustered Index again with Filegroup clause pointing to the right filegroup.

2. How to "move" a clustered index to another filegroup:

Ofcourse, it's the same as above.

You cannot move indexes supporting a unique or primary key constraint, using a DROP statement.
As we will see later on, to move these indexes, we must use the CREATE INDEX statement with the (DROP_EXISTING=ON) option.

3. How to "move" a non-clustered index to another filegroup:

DROP the non-clustered index. Create the non-Clustered Index again with Filegroup clause pointing to the right filegroup.

The points 2 and 3 are great, if you would have the original CREATE statements of those indexes.
But there exists another method too. This is by using the "CREATE INDEX .. (DROP_EXISTING=ON).. ON [filegroup_name]" statement

4. Alternative method for "moving" a clustered or non-clustered index to another filegroup:

As a generic example for using the CREATE INDEX .. (DROP_EXISTING=ON) statement, take a look at the following syntax:

ON TABLE_NAME(column1, column2,...)
ON [filegroup_name]

If the index enforces a PRIMARY KEY or UNIQUE constraint and the index definition (which columns) is not altered in any way,
the index is dropped and re-created, and will preserve the existing constraint.
If you want a list of Primary, Unique, and Foreign key constraints, you might want to play with the following queries:


SELECT substring(object_name(constid), 1, 40) AS FK,
substring(object_name(fkeyid), 1, 40) AS "Referencing Table",
substring(object_name(rkeyid), 1, 40) AS "Referenced Table"
FROM sysreferences
ORDER BY object_name(rkeyid)

4.4 Quick check on the effectiveness of your indexes:

In section 4.2.3, we have touched the "sys.dm_db_index_usage_stats" Dynamic Management View.
This view shows us many interresting facts like "user_scans", "user_lookups" etc.., so it gives us a good idea
on the usage, or the effectivity, of an index.

- In general, if an index has a relatively high number of "reads", compared to the "writes", that is a clue
that the index is used much. So, it's probaby an effective index.
So, if the "total Reads" > "total Writes", it's a good index.

- Also, if an index has a relatively high number of "writes", compared to the "reads", that is a clue
that the index is NOT used much. So, it's probably an ineffective index.
If there are little reads, but many writes, the system is busy updating the index, without that users are using it.
You know that writes to an index means updating it. Reading an index means that a query is using it.
So, if the "total Writes" > "total Reads", the index only spills performance, and it's not a good index.

Here is a handy query that produces a list of the indexes in a Database, including the Reads from, and writes to, the indexes.

SELECT OBJECT_NAME(s.[object_id]) AS [Table Name], AS [Index Name], i.index_id,
user_updates AS [Total Writes], user_seeks + user_scans + user_lookups AS [Total Reads],
user_updates - (user_seeks + user_scans + user_lookups) AS [Difference]
FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)
ON s.[object_id] = i.[object_id]
AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
AND s.database_id = DB_ID()
AND user_updates > (user_seeks + user_scans + user_lookups)
AND i.index_id > 1 -- not the heap or clustered indexes
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC;

Note how this query only uses the "sys.dm_db_index_usage_stats" Dynamic Management View.

4.5 How to Reorganize and Rebuild Indexes:

4.5.1 How to detect Index Fragmentation:

Especially, the dynamic management function "sys.dm_db_index_physical_stats" can be of help in determining
if an index is too much fragmented or not.

The function need a number of parameters, which might be left as "null".
The more parameters you specify (as not being "null"), the more specific the output will be.

In particular, there are three very usefull ways to use the function:

(1) Get the details (a list) of the fragmentation levels of all indexes in a certain Database.
(2) Get the details of the fragmentation levels of all indexes of just one specific Table.
(3) Get the details of the fragmentation level of just one specific index of just one specific Table.

Here are a few examples:

SELECT index_id, avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('SALES'), OBJECT_ID('dbo.EMPLOYEE'),null,null,'DETAILED')

SELECT SUBSTRING(OBJECT_NAME(object_id),1,30) AS NAME, index_id, SUBSTRING(index_type_desc,1,20) AS TYPE,
avg_fragmentation_in_percent, avg_fragment_size_in_pages page_count
FROM sys.dm_db_index_physical_stats(DB_ID(N'SALES'), NULL, NULL, NULL , 'DETAILED')

The first query, shows the fragmentation (column "avg_fragmentation_in_percent") of the indexes of the EMPLOYEE table only.

The second query, shows fragmentation information of all indexes (of all Tables) that exist in the SALES database.

Note that the column "avg_fragmentation_in_percent" in the output, shows you the relevant information.
In section 4.2.4, you can find a real example, with example output, of using this function.

According to Microsoft documents, you should Rebuild indexes when the average fragmentation is higher than, say, 25 to 30%.
You might consider Reorganizing indexes when the average fragmentation is between, say, 15 to 25%.

For the smaller indexes, don't expect high improvements from rebuilding or reorganizing indexes.
For larger indexes, the improvement can be very significant.

4.5.2 How to Reorganize and Rebuild Indexes:

- Reorganizing, defragments only the leaf level of clustered and nonclustered indexes on tables. The same pages are used again.
Since a fill factor can be specified, it's likely that empty pages will result by this "compaction". These are removed, and thus providing additional available disk space.
The fill factor, as the name already implies, specifies "how full" a page should be filled, like for example "70" (70%) or "80" (80%).

- Rebuilding an index actually drops the index and re-creates a new one. Since this action means completely rebuilding a new index,
all levels (leaf level, intermediate levels, root level) are recreated again, and all fragmentation is removed.
In this process, you reclaim disk space, since all pages are build again using the specified "fill factor" setting.

Usually, rebuilding an index is more resource intensive than reorganizing an index.
Reorganizing is automatically done "online", thus while sessions may access the table and index.
Rebuilding can be done online or offline. If done offline, locks will block sessions for the time the index is rebuild.
With the Enterprise Edition, using the "ONLINE=ON" clause, you can rebuild indexes online.
Still, with very large indexes, it's really best to rebuild them during the times of least activity in the Database.

Here are a few examples on how to rebuild or reorganize indexes:

First a warning. You can explicitly DROP an index, and CREATE it again, assuming you had the original CREATE statement.
In most cases, the Database will not execute the statement if the index supports (or "is") a Primary Key or Unique constraint.
But you still must be carefull, in how far the index supports any constraint at all. You have to investigate that first.
So, it's best to NOT to "DROP" and "CREATE" an index, unless you know the details of your constraints.

But, when using the "ALTER INDEX .. REBUILD" and "DBCC DBREINDEX()" commands, you are pretty save.

Thus, you can use two types of commands to Rebuild an Index: the "ALTER INDEX" statement and the "DBCC DBREINDEX()" statement.
The DBCC command is more SQL Server 7/2000 "style", but it's still valid in 2005 and 2008.
Just take a look at the following examples. How to deal with all indexes of all tables in a database, is the subject of the next section.

The examples below are very simplistic. You need to read Books Online (BOL), or search the internet, to find all clauses and options
that you can use with the ALTER INDEX and DBCC REINDEX commands.

ALTER INDEX IDX_Employee_EMPID ON Employee REBUILD -- only rebuild the index "IDX_Employee_EMPID"

ALTER INDEX ALL ON Employee REBUILD -- rebuild all indexes of the EMPLOYEE table

ALTER INDEX IDX_Employee_EMPID ON Employee REBUILD WITH (FILLFACTOR = 80) -- only rebuild the index "IDX_Employee_EMPID" with FILLFACTOR=80

So, "ALTER INDEX ALL Table_Name REBUILD WITH (FILLFACTOR = n)" takes care of all indexes of a certain table.

DBCC DBREINDEX(EMPLOYEE,'',80) -- rebuild all indexes of the EMPLOYEE table with a FILLFACTOR=80

So, "DBCC DBREINDEX(Table_Name,'',Fillfactor)", takes care of all indexes of a certain table.


Maybe this is a difficult question. It's certainly a very interresting question.
If needed, search Books Online (BOL) and/or the Internet for answers.

Suppose a table has a clustered index, and several non-clustered indexes.
What happens to the non-clustered indexes, if you rebuild the clustered index?

4.5.3 How to "dynamically" generate index rebuild statements of all indexes in a Database:

If you want to "dynamically" generate the rebuild statements for all usertables in a Database,
we can use a looping construct, called a cursor.

-- A loop generating DBCC statements:

set nocount on
DECLARE @TableName varchar(255)

SELECT table_name FROM information_schema.tables
WHERE table_type = 'base table'

OPEN TableCursor

FETCH NEXT FROM TableCursor INTO @TableName
SELECT 'DBCC DBREINDEX('+@TableName+','+''''',80)' -- asssuming a fill factor of 80
FETCH NEXT FROM TableCursor INTO @TableName

CLOSE TableCursor


-- A loop generating ALTER INDEX statements:

set nocount on
DECLARE @TableName varchar(255)

SELECT table_name FROM information_schema.tables
WHERE table_type = 'base table'

OPEN TableCursor

FETCH NEXT FROM TableCursor INTO @TableName
FETCH NEXT FROM TableCursor INTO @TableName

CLOSE TableCursor


Chapter 5. How to determine if you have too low CPU power.

Again, just like in chapter 1, System Monitor (or Performance Monitor), and the dynamic management views (dmv's) can learn us a lot.

It's important to differentiate here, between a "real cpu problem", meaning that for the workload,
you just have too little CPU power. But it can be a bit confusing, because it can also be that you only have an "apparent cpu problem",
because of waits of some sort.

Let's first take a look at what we can discover using Performance Monitor (perfmon).

5.1 Taking a quick look Using Perfmon:

This is going to be a bit of a trivial section, I am afraid.
Some obvious counters are "Processor\%Priviledge Time(_Total)", "Processor\%User Time(_Total)", and "Processor\%Idle Time(_Total)".

One other very important counter, is not found under the "Processor" object. Instead, select the "System" object, and select
the "System\Processor Queue Length" counter.

Please see section 1.2 for a short explanation on "objects", "counters", and "instances".
There are ofcourse many other interresting counters like "%DPC Time" etc.., but these are not that "relevant" for "our quick look".

A short description of the above counters is:

- %Idle Time: Shows the %time that the processor(s) was idle during the sample interval.
- %Priviledge Time: Shows the %time that the processor(s) was running Kernel Mode code during the sample interval.
- %User Time: Shows the %time that the processor(s) was running User Mode code (non kernel) during the sample interval.
- %Processor Time: Is almost the same as (%Priviledge Time + %User Time).
- Processor Queue Length: This indicates the number of threads in the processor queue. It may be considered to be "high",
if on average it's much more than 2 x #cpu's in your system.

With all those counters, some occasional "spikes" are no big deal. Only if you see structural high values (except for "Idle Time"),
on average, or for a considerable amount of time (say, during a "batch"), you might have a cpu botlleneck.

1. One very trivial remark is this: if the "Processor:Idle Time(_Total)" is very high most of the time, you do not have
a CPU pressure at all on your system. Or, what is the same, if %Processor Time is low most of the time,
you do not have a CPU pressure on your system.

2. If on average, you system shows about 40, 50, 60 %Processor time, I would say it's nicely at work.
Because, suppose that the cpu's were idle all the time, then that's not good either. It would be a waste of cpu power.

3. Now, what if the cpu's show a %Processor time which hovers around 90% most of the time? That should certainly get our attention.

One of the most important questions following point 3) is, is it really a CPU bottleneck, or are those observations
also caused by "some deeper technical reason"?

4. If the cpu's show a "%Processor time" which is high most of the time and "Processor Queue Length" is considerably large too,
then you have a strong indication of a structural CPU pressure.

Although it all seems "obvious", do no jump to conclusions yet.
I think that after reading the material in chapter 9, we will appreciate that statement somewhat more.

If you use perfmon and measure the above metrics for a representable amount of time, and you indeed find
high values on average, it's quite likely you have a bottleneck "somewhere". And indeed, it's likely to be the cpu's.
But what to think of this example. Suppose you have a Database Server, and a number of application Servers.
When some large batch starts, many application components on the Application Servers, all starts tasks on the
Database Server at the same time, and all keeps running until the batch finishes. Maybe thats why we see such a high cpu pressure.
It could be true that this is "as designed", and we are too low on cpu power. Or something else is not right.
We always need a helicopter view so to speak, on the system as a whole.

So, it's often too difficult to reach well founded conclusions, using Performance Monitor alone.

That's why you also need other information, and at least take a look at the dynamic management views as well.

5.2 Using some dynamic management views:

Again, the DMV "sys.dm_os_wait_stats" can give us valuable pointers to existence of possible "CPU pressure".
When a session is "ready for some work", it will first enter the "runnable queue". The longer this queue is,
the more we may presume that cpu pressure is actually really true, since a longer queue length implies that
the cpu cannot keep up sufficiently with the demands.
As a session does work, but then have to wait on some "resource" (like pages that must be fetched from disk),
it will be put in a "waiters queue", until conditions have arived that makes it runnable again (put in the runnable queue).

The time waiting in the runnable queue for CPU, is shown as "Signal Waits".
The time waiting for a resource is shown as "Resource Waits".

The following query will show you a grand total of the percentage of Signal Waits and Resource waits,
and thus allows us to compare the two.

If the total of "%signal waits" is highest, it's a reasonable pointer to suspect CPU pressure.
If the total of "%Resource waits is highest, it's a reasonable pointer to suspect overall poor IO, or excessive locking behaviour.

Again, to draw conclusions on this alone, is not a good idea ! Also take a look at section 5.1.

Select signal_wait_time_ms=sum(signal_wait_time_ms)
,'%signal (cpu) waits' = cast(100.0 * sum(signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
,resource_wait_time_ms=sum(wait_time_ms - signal_wait_time_ms)
,'%resource waits'= cast(100.0 * sum(wait_time_ms - signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
From sys.dm_os_wait_stats

select top 20
st.objectid, st.dbid,
total_worker_time/execution_count AS AverageCPUTime,
CASE statement_end_offset
WHEN -1 THEN st.text
ELSE SUBSTRING(st.text,statement_start_offset/2,statement_end_offset/2)
END AS StatementText
sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st

This section is not ready yet. I am thinking on what to present in chapter 9, probably finish that first,
and then return to this section.
So please continue with the other chapters. Thanks !

Chapter 6. Allocation Unit and Partition Alignment.

If you are involved in a new project, designing an architecture for a large and/or very active database,
the design/implementation of storage for your database is very important.

You always should seperate tabledata (or clustered indexes), and non-clustered indexes,
and the files of the transactionlog (and tempdb also), on different filesystems (E:, F:, G: etc..) on different
diskvolumes (actually, different independent spindels).
This is to ensure performance. At the same time, you need a High Availabily solution for your database files,
like some RAID implementations.
I presume that the upper is know to you. In chapter 7, we are going to deal with that subject.

In this chapter however, we mean something different: here we are going to spend a few words on the "allocation unit"
and "partition alignment". This subject is not too exiting, and don't expect miracles from it.

In the following, we will make a certain abstraction. Be warned though: for your specific types of storage, the following
maybe does not apply at all! That's why you need to talk with people who understand your type of storage.

Allocation Unit:

Disk sectors are almost always 512 bytes in size. The allocation unit, or blocksize on disk (not the SQL server blocksize!)
is determined when you format the disk. Here, you often can choose values like 2K, 4K, 16K, 32K, 64K, and sometimes even larger.
Now, the question may arise: is there an optimal allocation unit for SQL Server?
The fundamental pagesize in SQL server is 8K. Secondly, SQL Server often 'read ahead' in 64K 'chunks' (extends).

Two things here:
This question relates to the RAID implementation you have choosen, like RAID5 or RAID10.
In this case, under the hood, the chosen stripesize is of importance, and determines what you should
should take as a suitable allocation unit.

1. If you use such a RAID implementation, first the choosen "stripe size" is important. Many articles suggest you should
use a "n x 64K" stripe size (like 64K, 128K, 256K).

2. After the RAIDn unit is available and viewable as "a disk" in Windows, you partition and format it. Now, what allocation unit should you choose? This is very a hard question to answer.


Microsoft articles seem to suggest that it's always best to choose an 64K allocation unit.
That might be true. Surely, SQL server is their product, so they should know.

Several other articles studied the performance (in general) using 64K and 128K stripe sets, in combination
with 4K, 8K and 64K allocation units. The results varies a lot, so IMHO, we cannot easily answer the question!
It all depends on storage, type of RAID, stripeset choosen, type of database etc..
In most cases, if we are talking about a very ordered Dataware House, with almost only reads and
very neat rebuild tables/indexes, I would also choose 64K allocation unit.

So? I am afraid I was not able to fully answer the question of the "best allocation unit".
Hopefully you agree that's simply not easy to answer unless you know a lot of details.
But generally speaking, I would also say that an allocation unit of 64K seem right, since SQL server
for table and index data 'thinks' in 64K chunks (extents) anyway.

Partition Alignment:

Many articles which discuss "partition alignment", speak of a possible "overall" performance increase of about 15%-30%.
So, if you do this "right", it could have a significant effect.

This work is or should typically be done by a storage admin, or sysadmin, who installs a Server and implements the disksubsystem.

Maybe first you should check with those people, to get the facts (for your specific storage) straight.
In the unlikely case they answer with "huh..?", then you have to provide the neccessary input.

Partition alignment, (or volume alignment, or sector alignment as it is called occasionally) has "something to do"
that the offset of the start of the partition, from the very beginning of the disk, should align with the stripset
that is in effect on your RAID system.
If this not the case, more IO is done than is strictly neccessary.
Now, the point is, the Vendor of your diskarry, or the person who installs it, should know these "magic" numbers.
In practice however, if the information is not available, there are generic setups that seem to work.

If you setup partitions under Win2K8, you will not be "bothered" by the misaligned offset. This system will take care of it.
But, on Win2K3, it could be an issue.

Anyway, if on Win2K3, a 64K offset (128 sectors) is a common value that works on many storage arrays.
Win2K8 uses a 1024K offset, which should work even better.
So, we will take that as our preffered offset.

How to do it yourself? The tools you could use are "diskpar" or "diskpart".
However, DiskPart.exe is the preferred method since it's newer and is included as of Win2K3 sp1.

DISKPART> list disk

shows a list of your disks...

Now, suppose you want to align and then format disk 3:
DISKPART> select disk 3
Disk 3 is now the selected disk.
DISKPART> create partition primary align=1024
DiskPart succeeded in creating the specified partition.
DISKPART> assign letter=F
DiskPart successfully assigned the drive letter or mount point.
DISKPART> format fs=ntfs unit=64K label="SQLINDEX"

Note that in the above example, I choose for align=1024, which is 1MB or 2048 sectors.
It's very likely that this boundary will match most stripe units.

After the formatting is done, you can check the offset using the "wmic" command, like so:

C:\> wmic partition get BlockSize, StartingOffset, Name, Index

There are so many more considerations on storage. For example, a feature like "HBA queue length" is important as well.
So, in general, I would say that DBA's just need storage specialists for large projects.

But at least it's good to know that the right choices on "allocation unit" and "partition alignment" could play
"some" role as well in "overall" performance.

Chapter 7. Placement of objects on Filegroups.

7.1 The "traditional" non-partitioning approach:

If you are not too familiar with the concept of "filegroups", here is a small demo.
In the example below, we are going to create the database "SALES", and instead of just having
only the PRIMARY filegroup (the default)
, we make two additional "logical containers": SALESDATA01 and SALESINDEX01.

A filegroup can contain one or more files (usually more than one, if we have a large database).
The "trick" is, to let the files of a certain filegroup, reside on another filesystem,
than the files of the other filegroups.

If those filesystems correspond to really different diskvolumes, we can achieve parallel IO.
This is so, because when you create a Table or an Index, you can specify (as a clause), on which
filegroup it is supposed to live.
Or, with existing tables and indexes, it's possible to move tables, and non-clustererd indexes, to their own filegroups.

So, filegroups make it possible to seperate tables and indexes, and nothing prevents you from further
seperate large tables (or indexes) on their own filegroups (just create the appropriate number of filegroups).

create database SALES
, size=400MB,
filegrowth= 0MB,
maxsize= 400MB
size= 4000MB,
filegrowth= 100MB,
maxsize= 8000MB
size= 4000MB,
filegrowth= 100MB,
maxsize= 8000MB
size= 3000MB,
filegrowth= 100MB,
maxsize= 8000MB


In the above example, you see that I only have F:, G:, and H: for filegroups for tables and indexes.
It's very important to put the "transactionlog file(s)" seperate from the above filesystems for tables and indexes.
In the above example, the transactionlog resides on I:

Approach 1:

As you have seen from chapter 4, if a table has a clustered index, in effect, the leafpages of that clustered index
are the tablepages theselves.
So, if all (or most) of your tables have a Primary Key (and are enforced by the unique clustered index), then you could follow this approach:

Create nonclustered indexes on a filegroup other than the filegroup of the table (= clustered index).

And repeat that approach for all other relevant large and/or active tables.
Ofcourse, you cannot give every non-clustered index it's own filegroup, so probably you will put a fairly large number
of indexes on "filegroupA", and possibly another fairly large number of indexes on "filegroupB".

This is all actually no more than "common sense".

You can even apply this approach for the largest and most active tables. Even if your database has hundreds, or even thousents
of tables, I am sure, that using the queries from chapter 4, you will discover that only 10, or 20 or maybe 30, really large
and/or active tables are present. Suppose you have found tables A,B,C,D,E,F,G,H to be very large.
Now, nothing prevents you from placing Tables A, B, C, D on filegroup "FG_ABCD", and place the tables E, F, G, H on filegroup "FG_EFGH".

As a small bonus, having the non-clustered indexes on their seperate filegroup(s), you have some level of additional protection.
If the drive(s) containing those filegroup(s) go bad, or something else crashes, you can regenerated the non-clustered indexes again.
Only the tables (or the clustered indexes) contain the "true" data. All other indexes can be re-created, although it could
take quite some time.

Note: always have a recent database script with create statements of all tables, indexes and all other objects !

Approach 2:

Approach number 1, is most appealing to me. But some DBA's do something else. They just mix tables and non-clustered indexes
among a number of filegroups on different diskdrives. That's not a bad idea either.

It's evident, that here, you are most sure that all diskdrives are used at all times.

You only need to create a list of the largest and/or most active tables and indexes, and distribute those object
on several filegroups (which themselves should reside on different "spindels").

If you want to see which clustered and non_clustered indexes reside on which filegroups, you might use this query:

SELECT O.Name AS [Object Name], O.[Type], AS [Index name], I.Index_Id, I.type_desc AS [Index Type], AS [Filegroup Name]
FROM sys.indexes I INNER JOIN sys.filegroups F
ON I.data_space_id = F.data_space_id
INNER JOIN sys.objects O ON I.[object_id] = O.[object_id]

If you do not want to 'see' the systemviews (starting their name often with 'sys') in the output,
then use:

SELECT O.Name AS [Object Name], O.[Type], AS [Index name], I.Index_Id, I.type_desc AS [Index Type], AS [Filegroup Name]
FROM sys.indexes I INNER JOIN sys.filegroups F
ON I.data_space_id = F.data_space_id
INNER JOIN sys.objects O ON I.[object_id] = O.[object_id]
WHERE O.Name not like 'sys%'

7.2 Partitioning large tables and indexes:

Here is some extremely short, and lightweight, information on table and index partitioning.

In this section, we go one step further, by "partitioning" subsets of the same one table, on their own filegroups.

Now let that sink in: we will devide a table, into parts, and put those parts on different filegroups. Essentially,
that's partitioning !

Ideally, you would have a certain column in your table, that lends itself easily to "get partioned". In other words,
that particular column would have values that are easily divided into subsets, like for example "years".
For example, you might have a table with some date column, and you could group records belonging to
to the years "2000", "2001, "2002" etc..

But even if it doesn't go in such a "natural way", you can always force some form of subsetting. For example, you have
a nummeric column, and you just are going to distinquish the following subsets:

1 - 10000000
10000001 - 20000000
20000001 - 30000000

Creating a partitioned Table, or Index, is a 3 step process.

Don't forget that the whole idea behind partitioning a table is this: create Ranges of values, where the rows of the table will
fall into, and make sure that you can store those different record subsets (the different Ranges), onto separate Filegroups.

- We start by defining a Partition Function. This is function that defines the boundaries, or Partition Ranges, that the subsets of rows will use.
- Secondly, we create a Partitioning Scheme, that defines the mappings of the Partition Ranges to FileGroups (individual storage structures)
- Thirdly, we create a Table, using the definitions above.

That's all.

So, a simple example will illustrate this.

1. Suppose we have a certain database, which uses the filegroups FG1, FG2, FG3 and FG4.

Suppose we have a table PARTSAMPLE that we want to partition. It uses the columns ID (datatype INT) and NAME (varchar(20)).
The values that ID can take, are for example 1…1000… 7000… 20000… 30000 etc..

2. Now let's define the "Partition Ranges", or boundaries, that the subsets of rows can take:
We do that by creating a Partition Function, whereas later we are going to "bind" it somehow to the table.

RANGE LEFT FOR VALUES (10000, 20000, 30000)

This function, is an independent object in the database. Later we will use it in the PARTSAMPLE table definition.

The "LEFT" or "RIGHT" in the function definition means if you want the interval to "leftsided" or "rightsided" as in:
10001 - 20000
10000 - 19999

3. Next, we will define the "Partition Scheme", that will relate the Ranges to the Filegroups:

PARTITION samplefunction TO
([FG1], [FG2], [FG3], [FG4])

You see that in the Partition Scheme definition, we relate the FileGroups to the "samplefunction" function, thus thereby
relating the FileGroups to the Ranges (or boundaries).

4. As the last step, we will define our PARTSAMPLE table, using the Partition Scheme defined above.

ON samplescheme(ID)

So, if a record with an ID of 15000 would be inserted, it would be stored on FileGroup "FG2".
Likewise, if a record with an ID of 25000 would be inserted, it would be stored on FileGroup "FG3".

Chapter 8. Other remarks on several subjects.

In this chapter, we will review a couple of other important subjects.

8.1 The right choice of datatypes.

SQL Server knows a huge number of datatypes. If you define a table, you tell SQL Server the columnnames,
and the datatypes (and possibly other attributes as well).
For example, we have the datatypes "int" (integer), "decimal(n,m)", "char(n)", "varchar(n)", "datetime" etc..

If you create the tables yourself for some application, or you are able to advise a developer, the following is
quite important.

If you choose "too wide" datatypes, you fill a database block "too quickly". This means that a smaller number of rows
fits in a table page.
This cost performance, because the database needs to read more pages, to get the same imformation.
For example, suppose some table has a "COMMENT" column. Think of the consequence if some developer choose a
a datatype of "char(2000)", meaning a fixed column lenght of 2000 bytes.
Now if the comment is at most just a few words, then that's a true waste.

So, always try to choose a datatype (with a "width") that suits the purpose, but also not "that thight".
Just try to balance it a bit.

The choice a datatypes also infuences the amount of "processor time".
If you think about a very active database, the first thing that comes to mind, is heavy Disk IO.
But certainly not in all cases. High cpu utilization is a very important performance issue too.
Now, dont think that this will only happen at the physics department, where difficult calculations are used.

Don't underestimate financial applications, where quite elaborate calculations may be done on all sorts of assets
and securities etc..
Here too, the choice of the most optimal datatypes is important. It's difficult to say in general which dataypes
are the best. Just depends on the application. Hopefully, the developers are aware of this fact. Not all are, so I have noticed.

8.2 Some remarks on TEMPDB.

TEMPDB is a sort of "scratch workplace" for SQL Server. You might find many temporay tables here, which might be
used for all kinds of sorts, or as a sort of intermediate storage containers, during processing.

A bottleneck in tempdb IO, can impact the overall throughput of your SQL Server.

How active TEMPDB will be, depends on SQL Server itself, and the application.
Usually, the application is the most important factor. But, if for example, SQL Server "versioning" is in use,
then usage on TEMPDB can be very high.

Fortunately, Windows System Monitor (called Performance Monitor in the past) has the neccessary counters
to make a judgement on how active TEMPD actually is.

You can also write your own system queries that gives info on the activity on TEMPDB.

Here are a few examples:

The following will give an idea on allocated pages per session:


SELECT top 10 *
FROM sys.dm_db_session_space_usage
ORDER BY (user_objects_alloc_page_count + internal_objects_alloc_page_count) DESC

The following will give an idea on the number of temporary tables (which name starts with "#")


select name, create_date, modify_date from sys.all_objects
where name like '%#%'

A keypoint is this: if TEMPDB is very active, here too you might coinsider to place the TEMPDB data- and log files
on a fast disk which was not much used before.
Although many of us don't have that luxury, you might consider to place the files at for example a backup dump
disk, which might only be used "occasionally" (once a day, or once an hour, for full- or transaction backups).
Ofcourse, in some cases, that's not a good idea. Only you can determine how matters are in your situation.

A recommendation you can find in many articles, is to let TEMPDB consist of several datafiles.
Some even recommend, that you let the number of files to be equal to the number of cpu cores.

Here is a sample on how you could modify the TEMPDB database:

(NAME = tempdev,
SIZE = 10MB, -- keep this one small

ALTER DATABASE TEMPDB ADD FILE ( -- let's add 4 datafiles
NAME = TempDevA,
FILENAME = 'M:\mssql\data\tempdeva.ndf',
SIZE = 100MB,
NAME = TempDevB,
FILENAME = 'M:\mssql\data\tempdevb.ndf',
SIZE = 100MB,
NAME = TempDevC,
FILENAME = 'M:\mssql\data\tempdevc.ndf',
SIZE = 100MB,
NAME = TempDevD,
FILENAME = 'M:\mssql\data\tempdevd.ndf',
SIZE = 100MB,

8.3 Carefully consider the Max memory limit for SQL Server 64bit.

Suppose you use SQL Server 2005/2008 64. What should be the maximum memory you should assign to SQL Server?
Actually, this looks easy, but it isn't.

Suppose your x64 system has only 16GB of memory. Now, NT (meaning Win2k3, Win2k8 etc..) and all sysadmin programs
(like backup software, anti-virus, monitoring agents etc..) needs memory too, which can be considerable.
Because... you don't want that the system exhibits excessive paging.
There is no good alternative other than to discuss this with your nearest senior sysadmin.
So, in the upper example of a system of only 16GB, I suggest you limit it to 12GB.

Just a few words on NT paging.

Using Performance Monitor counters to "measure" the amount of paging, can sometimes be a bit confusing.
If you see a moderate or high value of "Memory: Pages/sec", it does not neccesarily need all to be caused by paging alone,
or pages from and to cache.
Some applications or tools might "page" to memory mapped files, which will "distort" the view considerably.
Besides that, many causes exist why "Memory: Pages/sec" produce high values. But anyway, it should not be too high ofcourse.

You might have high amount of paging, if you see:
"Memory: Pages/sec" - moderate to high.
"Memory: Available Bytes" - would be low.
"Paging File: % Usage" - would be high.
"Paging File: % Peak Usage" - would be high.
"Memory: Pages Output/Sec" - would be high.

This last counter shows how many virtual memory pages were written to the pagefile to free RAM page frames,
and is therefore a good indicator.

It's true that NT will always page to a certain extent.
Especially, with all monitoring "tools" loaded, NT systems will certainly page somewhat.

As to the size of the pagefile, as a rule of thumb for "low" memory systems, it should be around 1.5xTotal RAM.
Ofcourse, the more memory you have, the less important the size of the pagefile is.
No sysadmin will ever create a pagefile of 128GB, if the system has 64GB RAM.

8.4 Commit limits and remote logging of Batches.

What I have seen occasionally, are two sort of "misconfigurations" that can affect performance in a negative way.

Most applications, will have some sort of batch facility, which is likely to be scheduled after working hours.
The architecture of such batches, is very diverse.

What can be worthwile, is to investigate if some "component" does "logging" (to a logfile) at a remote host.
If in such a case, a sort of "SEND - Acknowledge" scheme is in use, that can slow down the batch significantly.

Also, it can be worthwile if the "metadata" (in configfiles, or in the registry etc..) of the application, uses
some sort of "commit limit". What I mean is this. Such an application generates smaller or larger batches of transactions.
Each of these batches is comitted. But what is the lenght? Is such a batch 1, 10, 1000, 10000 records "wide"? In the extreme, if a "batch" is only 1 "wide", that too can slow down the batch significantly.

Chapter 9. Some real world cases, describing somewhat more complex performance problems.

9.1 Some important (not so trivial) wait_types:

In this note, we have seen a few subjects that might be the origin for performance problems.
As said right from the start of this note, we do not cover Query Design.
Actually, this is one of the most common causes of low performance. Indeed, the impact can be quite severe.

Now suppose you have, what seems to be, a well configured system. So, you have sufficient memory, and Disk IO seems to be fine,
lots of cpu power etc.. etc..

Still you might encounter what appears to be "performance issues".

Here are a few important wait_types to watch for:

CXPACKET: Probably related to "Parallelism".
SOS_SCHEDULER_YIELD: Might be an indicator of CPU pressure.
ASYNC_NETWORK_IO: Might be an indicator of poor Network I/O.
LCK_X, LCK_M_U, & LCK_M_X: Long lasting locks, and/or locks that involve many extents (leading to blocking).
PAGEIOLATCH_X: Buffer I/O latch, maybe due to poor Disk IO, or low on memory.
ASYNC_IO_COMPLETION & IO_COMPLETION: Could also be attributed to poor Disk IO, or general IO issues.
PAGELATCH_X: Buffer latch problems.
WRITELOG & LOGBUFFER: Might be due to poor IO to the Tranaction log disk subsystem.

In the next sections we will some examples of these wait_types.


Problem: you see high "wait times" associated with one or more of the "wait_types" LCK_M_U, CXPACKET, SOS_SCHEDULER_YIELD,
and possibly also DTC. Secondly, you might (but not neccessarily so) find a lot of dealocks in some application log, and/or users complains
that their frontend applications often "stalls".

We know that "under the hood" systems like SQL Server, are incredably complex. But, SQL Server 2005/2008 generates
a lot of tracing or logging "material", which you can "see" using the "dynamic mamagement views (DMV's)".
Although you may not call it "tracing material", the DMV's sure functions that way.

The "keywords" in the title of this section, look pretty impressive. Actually, these are "wait_types".
If you switch back quicly to section 1.3, you find a nice query giving you a top 10 "wait times"
and telling you what "wait types" they are associated with.
Note from the figure in section 1.3, that the "waittimes" due to "wait_type=CXPACKET" are pretty high too.
The wait_types CXPACKET, SOS_SCHEDULER_YIELD, LCK_M_U are in many cases, related to each other.


CXPACKET waits may indicate parallelism problems. This basically means you are running a parallel process and one
or more threads of it, are waiting for others to complete. CXPackets occur when a query has its operations run in parallel,
but not all operations complete at the same time. SQL Server cannot continue to the next SQL statement
because not all operations have completed. This results in waiting defined as CXPacket.
The setting that determines if "parallelism" is "switched on", is "Maximum Degree of Parallelism" or MAXDOP.
If you rightclick the server object in SQL Server Management Studio, you can view and modify all sorts of settings.
In the "Advanced" page, you will find the "Maximum Degree of Parallelism" setting. If you put it to "1"
you have it switched "off", although TSQL statements can override it with the MAXDOP clause.
Ofcourse, different sessions all operate, and stay to operate, in "parallel", but the parallelism of one statement is off.
If the MAXDOP setting was, for example, "8", you might lower the value, and see what the result is.
Unfortunately, tackling CXPACKET waits usually means a lot of testing (trial and error)


Actually, there are many "LCK_M*" wait-types, like:

Lck_M_U is associated to update locks, that might cause other processes to wait, and
Lck_M_X is associated to exclusive locks, which too might cause other processes to wait.

The LCK_M_U wait_type might be found in many cases. It occurs when a task is waiting to acquire an Update lock, which it can't
do at that time, because another process already has aquired a lock of some kind.

-> Possible causes and some guidelines to what to do:

A possible cause to high values of CXPACKET and LCK_M_U might be threads of an application that are frequently waiting on each other.
In this case, it might be due to ineffective "application design".

If you also see considerable "DTC" waits. them that might be a clue that distributed transactions are not
effectively doing their work.
In this case too, it might be due to ineffective "application design".
Knowing the internals of MSDTC and distributed transactions and application design, is a complex world of it's own.

If you have a system with many cpu's, it could even be counterproductive at some batches or statements which are allowed
to use parallelism. It might thus help to reduce the MAXDOP setting, as explained above.
Ofcourse, having many cpu's is very good for overall throughput, and independent sessions are then running
in parallel.
But, with some particular queries, they might use parallelism that might give rise to LCK_M* and CXPACKET waittimes.

Sometimes, using the SNAPSHOT Isolation Level with the option "ALLOW_SNAPSHOT_ISOLATION" set to ON at the database level,
might improve performance.

9.3 Isolation Levels.

Chapter 10: A few good links to technical articless:

-> If you want more depth on SQL Server performance discussions, here is a good Microsoft article:

Microsoft Technet article

-> Great link with good queries:

-> If you want more depth on SQL Server disk storage design considerations, here is a good link:

Microsoft Technet article

-> If you want more depth on the specifics on TEMPDB, here is a good link:

Microsoft Technet article

-> Some general performance papers:

Top 10 SQL Server performance issues in DW
Top 10 SQL Server performance issues in OLTP

Note: There are some other docs from myself too (excel files):

Listing of some SQL Server 2005 facts & structures (for exam 070-431, 5.5 MB)
Listing of some SQL Server 2008 facts & structures (for exam 070-432, 9 MB)

That's it. Hope it was of any help.