Re: Database migration and redesign

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Mon, 11 Sep 2006 13:08:10 -0400

On 9/11/06, Brandon Aiken <BAiken@xxxxxxxxxxxxxxx> wrote:
My question relates to primary keys.  The vast majority of tables have a
primary key on a single char or varchar field.  Is it considered better
practice to create a serial type id key to use as the primary key for the
table, and then create a unique index on the char and varchar fields?
Should foreign keys reference the new primary id or the old unique key?
What about compound [primary] keys?

This is a somewhat controversial topic.  I personally am not a big fan
of Ambler or his approach to database design.  This is more or less
the surrogate/natural key debate which is a perennial flamewar in the
database world.

I tend to think natural keys are generally superior but surrogates are
useful in some situations.  Automatic use of ID column in every table
leads to lazy thinking and overcomplicated designs.  However at times
they can really optimize your database.  So I will lay down some
extremes and leave it up to you to find the middle ground.

A trivial case of when not to use a serial key is like this:
create table sex(id serial, sex char(1));
insert into sex(sex) select 'M' union all select 'F';

The reason is obvious, adding a join for no reason whenever you need
to know the sex, albeit a simple one, and expanding the 1 character
type to an int type.

A somewhat more interesting case is:
create table email(id serial, email text);

In this case, while it may seem like a waste to store the full email
in every table that references the email, you are optimizing the join
out in such cases, which can be a big win and since there is no other
properties of the email the email table only serves the purpose of
maintaining relational integrity, iow no duplicates.  In the event the
email changes, we allow RI to cleanup the other tables...an integer
proxy would be (at least to me) an meaningless abstraction of the
email.

A case of when not to use a natural key for relating is a bit more
complex, some times you  just get sick and tired of writing the key
fields over and over, or you have measured and determined the natural
key to be wasetful in terms of index performance.  Another reason is
if the p-key data changes frequently and the RI mechism is too
expensive.  In this case I would advise you to strictly key on the
natural and make a candidate, serial key.

create table foo
(
 nat_key1 text,
 nat_key2 int,
 foo_id serial,
 [...]
 primary key(nat_key1, nat_key2),
 unique(foo_id)  -- candidate unique key
);

This is my middle ground: there highly situational cases where an id
column is a win on pracital reasons.  however, a strong design around
natural keys tends to make you think the problem through much more
carefully and lead to a tighter database.

Also, any suggestions for good DB design books would be appreciated.  I no
longer have any good DB design books, and I don't know what's good anymore.

http://www.amazon.com/exec/obidos/ASIN/0201485559/databasede095-20?creative=327641&camp=14573&adid=07TEH0J3FS9SYN309QMS&link_code=as1

merlin