Re: Basic Q on superfluous primary keys

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Mon, 16 Apr 2007 09:03:42 -0400

On 4/14/07, Bill Moran <wmoran@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
In response to "Kynn Jones" <kynnjo@xxxxxxxxx>:
> The two situations are semantically identical: each record in table bar
> refers to a record in table foo.  The difference is that in the first
> schema, this referencing is done through an "artificial" serial-integer
> primary key, while in the second schema this reference is done through a
> data field that happens to be unique and not null, so it can serve as
> primary key.

I had a discussion about this recently on the Drupal mailing lists, at the
end of which I promised to do some benchmarking to determine whether or
not text keys really do hurt performance of indexes.  Unfortunately, I
still haven't followed through on that promise -- maybe I'll get to it
tomorrow.

The main reason why integer indexes are faster than natural
counterparts is that the index is smaller and puts less pressure on
cache.  This is however offset by removing joins here and there and
you usually just end up indexing the data anyways.  Performance is
kind of tangential to the argument though -- I've seen databases using
all natural keys and found them to be very clean and performant.

Using surrogate keys is dangerous and can lead to very bad design
habits that are unfortunately so prevalent in the software industry
they are virtually taught in schools.  Many software frameworks assume
you use them and refuse to work without them (avoid!)  While there is
nothing wrong with them in principle (you are exchanging one key for
another as a performance optimization), they make it all too easy to
create denormalized designs and tables with no real identifying
criteria, etc, and the resultant stinky queries to put it all back
together again, (full of unions, self joins, extraneous groups, case
statements, etc).

A good compromise in your designs is to identify your natural key but
use the surrogate if you have valid performance reasons:

CREATE TABLE foo (
 frobnitz_id int unique,
 frobnitz character(varying 100) PRIMARY KEY\
[...]
);

frobnitz_id is of course optional and not necessary in all tables.  It
may be a pain to relate a large table with a four or five part key and
judicious use of surrogates may be justified for performance or even
just to keep your queries smaller:

create table order_line_item_discount
(
 company_name text,
 order_no int,
 line_item_seq_no int,
 discount_code text,
 primary key(company_name, order_no, line_item_seq_no, discount_code)
)

becomes

create table order_line_item_discount
(
 order_line_item_id int,
 discount_code text,
 primary key (order_line_item_id, discount_code)
)

merlin