On 5/3/07, Alexander Staubo <alex@xxxxxxxxxxxxxxx> wrote:
PostgreSQL uses B-trees for its indexes, insertion time is logarithmic regardless of the type of the key, but strings have a larger overhead since they involve character comparisons; (i - j) is a lot faster than strcmp(i, j). If you do go for strings, I would suggest that the beginning of the key be statistically distributed as widely as possible; ie., avoid common prefixes.
I think the performance benefits of i - j over strcmp(i,j) are mostly irrelevant, locale issues aside. The main reason why integer keys can be faster is because the index is smaller and puts less pressure on cache. This has to stacked up against the fact you are often hitting the varchar index anyways for sorting and filtering purposes (swapping a int for text index is only a guaranteed win if you can drop the text index completely). So, by using integers from performance perspective we are mostly trying to prevent a cache miss (during which time a computer might perform 100k strcmp operations). If there is also a varchar index, and it is used for various queries, it may actually be faster to drop the integer index altogether because it is competing with cache resources with the integer index. Unfortunately, this is more often the case than not in my experience. As solid state technologies continue to mature and near zero latency storage systems become widespread, this advantage will lessen as the penalty for a cache miss becomes much less. merlin