Re: pervasiveness of surrogate (also called synthetic) keys

Greg Smith <greg@xxxxxxxxxxxxxxx> · Wed, 04 May 2011 03:25:54 -0400

David Johnston wrote:
> Are there any rules of thumb on the performance of a PK as a function
> of key length?  I like using varchar-based identifiers since I tend to
> query tables directly, and writing WHERE clauses is much easier if you
> can avoid the joins.  I'm likely better off creating views and querying
> those, but I'm still curious about any basic thoughts on having a
> 100+ character primary key.

The shorter the better, but it may not be as bad as you fear.  The way 
B-tree indexes are built, it isn't that expensive to work with a longer 
key so long as the part needed to tell keys apart doesn't average out to 
be that long.  So if you insert "123456666666666666666" and 
"12345777777777777777", that's not going to be much different from 
navigating "123456" and "123457", because by the sixth character you've 
already reached a unique prefix.  But if your entries have a really long 
common prefix, like "111111111111111112" and "111111111111111113", 
that's going to be more expensive to deal with, even though those 
strings are no longer than the first pair.
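
To make the prefix point concrete, here is a rough sketch in Python that 
counts how many characters a plain left-to-right comparison has to look 
at before it can tell two keys apart.  The function name chars_compared 
is made up for this illustration, and the character-by-character loop is 
a simplified model, not what PostgreSQL actually runs: real text 
comparisons go through the database's collation rules, so treat the 
counts as a relative guide only.

```python
def chars_compared(a: str, b: str) -> int:
    """Count characters examined by a naive left-to-right comparison.

    Hypothetical helper for illustration; it models a simple
    character-by-character compare, not PostgreSQL's collation-aware one.
    """
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            # First differing character found; the comparison can stop.
            return examined
    # No difference within the overlapping part: one string is a prefix
    # of the other (or they are equal), so every overlapping character
    # was examined.
    return examined


if __name__ == "__main__":
    pairs = [
        ("123456666666666666666", "12345777777777777777"),
        ("123456", "123457"),
        ("111111111111111112", "111111111111111113"),
    ]
    for a, b in pairs:
        print(f"{a} vs {b}: {chars_compared(a, b)} characters examined")
```

The first two pairs stop at the same point because they diverge at the 
same position, while the third pair has to be walked all the way to its 
last character.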

If your identifiers become unique after only a few characters, it may 
not be so bad.  But if they go many characters before you can 
distinguish between any two entries, you're probably not going to be 
happy with the performance or size of the indexes, relative to simple 
integer keys.

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
