On 9/11/06, Brandon Aiken <BAiken@xxxxxxxxxxxxxxx> wrote:
My question relates to primary keys. The vast majority of tables have a primary key on a single char or varchar field. Is it considered better practice to create a serial type id key to use as the primary key for the table, and then create a unique index on the char and varchar fields? Should foreign keys reference the new primary id or the old unique key? What about compound [primary] keys?
This is a somewhat controversial topic. I personally am not a big fan of Ambler or his approach to database design. This is more or less the surrogate/natural key debate which is a perennial flamewar in the database world. I tend to think natural keys are generally superior but surrogates are useful in some situations. Automatic use of ID column in every table leads to lazy thinking and overcomplicated designs. However at times they can really optimize your database. So I will lay down some extremes and leave it up to you to find the middle ground. A trivial case of when not to use a serial key is like this: create table sex(id serial, sex char(1)); insert into sex(sex) select 'M' union all select 'F'; The reason is obvious, adding a join for no reason whenever you need to know the sex, albeit a simple one, and expanding the 1 character type to an int type. A somewhat more interesting case is: create table email(id serial, email text); In this case, while it may seem like a waste to store the full email in every table that references the email, you are optimizing the join out in such cases, which can be a big win and since there is no other properties of the email the email table only serves the purpose of maintaining relational integrity, iow no duplicates. In the event the email changes, we allow RI to cleanup the other tables...an integer proxy would be (at least to me) an meaningless abstraction of the email. A case of when not to use a natural key for relating is a bit more complex, some times you just get sick and tired of writing the key fields over and over, or you have measured and determined the natural key to be wasetful in terms of index performance. Another reason is if the p-key data changes frequently and the RI mechism is too expensive. In this case I would advise you to strictly key on the natural and make a candidate, serial key. create table foo ( nat_key1 text, nat_key2 int, foo_id serial, [...] primary key(nat_key1, nat_key2), unique(foo_id) -- candidate unique key ); This is my middle ground: there highly situational cases where an id column is a win on pracital reasons. however, a strong design around natural keys tends to make you think the problem through much more carefully and lead to a tighter database.
Also, any suggestions for good DB design books would be appreciated. I no longer have any good DB design books, and I don't know what's good anymore.
http://www.amazon.com/exec/obidos/ASIN/0201485559/databasede095-20?creative=327641&camp=14573&adid=07TEH0J3FS9SYN309QMS&link_code=as1 merlin