Re: pervasiveness of surrogate (also called synthetic) keys

Merlin Moncure <mmoncure@xxxxxxxxx> · Thu, 28 Apr 2011 15:39:19 -0500

On Thu, Apr 28, 2011 at 12:29 PM, Jim Irrer <irrer@xxxxxxxxx> wrote:
> A colleague of mine insists that using surrogate keys is the
> common practice by an overwhelming margin in relational databases and
> that they are used in 99 percent of large installations.  I agree that many
> situations benefit from them, but are they really as pervasive
> as he claims?

They are fairly pervasive, and increasingly so, which I find to be
really unfortunate.  Personally I think rote use of surrogate keys is
terrible and leads to bad table designs, especially if you don't
identify the true natural key with, say, a unique constraint.  This
pushes duplicate enforcement out of the database and into application
code, or (even worse) the user.  What everyone misses in the database
is that natural keys force good database design...if you can't
discover one, you probably have a design problem.   There are of
course exceptions, but they are limited.

The main/best case for surrogates is defense against cascading updates
-- obviously if a single record change can turn into 1000's of updates
statements you absolutely want to avoid that if you can.  But in a
typical database this only applies to particular tables.  'One off'
updates are of course no problem.  In a 'infinite performance' world,
this would still be an update, because of locks.

The performance case either way is complex.  Generally, I find natural
keys to be faster all things considered but this is going to be
situationally dependent.  Although integer indexes are tighter and
faster, natural keys give back by skipping entire joins, sorts, etc.
They also tend to have better tuple/page efficiency so that when
searching for a range of records you touch less pages.  Essentially,
you get more traffic routed through a smaller number, albeit bigger,
indexes.  Natural key tables also tend to be 'self documenting' which
is a nice property.

A lot of bad arguments made against natural keys are made, for example:
*) natural keys can change (so what? unless you are worried about cascades)
*) SSN are reused! (SSN is obviously not a complete key if you want to
identify a person)
*) most tables don't have unique natural keys (let's see em)
etc

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general