Re: varchar as primary key

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Fri, 4 May 2007 13:23:38 -0400

On 5/3/07, Alexander Staubo <alex@xxxxxxxxxxxxxxx> wrote:
PostgreSQL uses B-trees for its indexes, insertion time is logarithmic
regardless of the type of the key, but strings have a larger overhead
since they involve character comparisons; (i - j) is a lot faster than
strcmp(i, j). If you do go for strings, I would suggest that the
beginning of the key be statistically distributed as widely as
possible; ie., avoid common prefixes.

I think the performance benefits of i - j over strcmp(i,j) are mostly
irrelevant, locale issues aside.  The main reason why integer keys can
be faster is because the index is smaller and puts less pressure on
cache.  This has to stacked up against the fact you are often hitting
the varchar index anyways for sorting and filtering purposes (swapping
a int for text index is only a guaranteed win if you can drop the text
index completely).  So, by using integers from performance perspective
we are mostly trying to prevent a cache miss (during which time a
computer might perform 100k strcmp operations).  If there is also a
varchar index, and it is used for various queries, it may actually be
faster to drop the integer index altogether because it is competing
with cache resources with the integer index.  Unfortunately, this is
more often the case than not in my experience.

As solid state technologies continue to mature and near zero latency
storage systems become widespread, this advantage will lessen as the
penalty for a cache miss becomes much less.

merlin