Use guid.comb in your database if you need guid keys but don't want to take a big performance hit

COMB Guids have been around since 2002.  In short, WinNT used to use the MAC address of a network card to help generate GUIDs, and they were pseudo-sorted.  Since then, Guid have become more random and not dependent on anything.  Because of this, SQL Server 2000+ incurs a significant performance degradation when you use Guids as primary keys.  The degradation can be up to a factor of 10.  The reason is the random nature of the Guid and the clustered index on the primary key (which seeks to keep them in sequence).

We must remember that a Guid is not a string even thought that's how we see them.  A Guid is a 16-byte data structure.  An int is a 4-byte data structure, so we can simply think of it as 4 times as large.  The kicker is that integer keys are sorted and incremental.  Guids are random.  A clustered index can keep them in order for you, but every insert incurs the job of finding where in the list to insert the new value.  Integer keys just go at the end of the list.

If you are using Guids (and NHibernate), check out the guid.comb key generator.  Read Jimmy Nilsson's article on COMB Guids to understand just why they perform better than normal Guids, and then go through the NHibernate documentation to see how you can use them in your mappings.  Ok, I'll just give you a sample:

<class name="MyClass" table="MyTable" dynamic-update="true">
<id column="Id" type="Guid" name="Id">
<generator class="guid.comb" />
</id>

<property name="Something" type="Int16" not-null="true" />
<property name="SomethingElse" type="Byte" not-null="true" />

</class>

Comments

Paul Hadfield said on 6.03.2009 at 3:26 PM

Hi, I've just come across this article and confused as to why you'd create a clustered index on a GUID - primary keys can, and normally should be non-clustered. A clustered index is only ever useful when you're doing a range scan (i.e. BETWEEN x AND y) - you'd never need to do that on a GUID, therefore you'd never need a clustered index.

Tammie said on 8.13.2009 at 7:42 PM

Actually, this totally depends on what type of application you are developing. For one that is highly transactional (lots of inserts and updates), you should always put the clustered index on the primary key field. Why? Because by putting the clustered index on some other field, you risk sacrificing your performance should that field be one that is not sequential and ever increasing. SQL Server will "reshuffle" the entire table on every insert and update. I've seen this happen, and it is a terrible performance killer. On the other hand, if you have a decision making system (reporting, data wherehouse), of course you should put it on a field that will support the most common queries. Also, clustered indexes are great for ranges, but if you are going to retrieve data by a primary key most often, it is best if the data is physically arranged by the primary key, as opposed to being a separate index that has a pointer to that record.

I've seen many systems use guids as primary keys, especially if there is replication involved and/or the object id needs to have some unique external id other than just a number. Therefore, sequential guids make sense if performance is also a requirement.

Anyway, I had a long discussion with a Microsoft presenter who was a SQL Server MVP at a Tech-Ed once and these were the rules of thumb that he presented.

Jeffrey Palermo said on 8.13.2009 at 8:39 PM

@Tammie,

All that is right and valid. These concerns are exactly what the COMB (sequential) guid is used over regular guids.