Thanks very much for the thoughtful responses - very enlightening.
- Jim (the topic originator)
Jim Irrer irrer@xxxxxxxxx (734) 647-4409
University of Michigan Hospital Radiation Oncology
519 W. William St. Ann Arbor, MI 48103
On Mon, May 2, 2011 at 11:10 AM, Greg Smith <greg@xxxxxxxxxxxxxxx> wrote:
On 05/01/2011 06:12 PM, Karsten Hilbert wrote:
> Good to know since I'm only a lowly medical doctor not
> having much schooling in database matters beyond this list,
> the PostgreSQL docs, and the Celko book.

This debate exists at all levels of experience, and the only thing that changes as you get more experienced people involved is an increase in anecdotes on each side. The sole time I ever found myself arguing with Joe Celko was over an article he wrote recommending natural keys, using an example from the automotive industry. The problem was that the specific example he gave was flat-out wrong. I was working in automotive MIS at the time, and the thing he said would never change did, in fact, change every year--in only a fraction of a percent of cases, in an extremely subtle way that snuck up on people and wreaked much confusion. That's typical for an early natural key design: you get it working fine in V1.0, only to discover months or years down the road that there's a case you never considered and don't model correctly, and it may take some sort of conversion to fix.
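A minimal sketch of that kind of V1.0 natural-key design (PostgreSQL DDL; the table and column names here are made up for illustration):

    -- Hypothetical V1.0 schema keyed on a value assumed to be stable:
    CREATE TABLE part (
        part_number text PRIMARY KEY,   -- "natural" key, assumed never to change
        description text NOT NULL
    );

    CREATE TABLE order_line (
        order_id    integer NOT NULL,
        part_number text    NOT NULL REFERENCES part (part_number),
        quantity    integer NOT NULL,
        PRIMARY KEY (order_id, part_number)
    );

    -- If part numbers later turn out to change after all, every referencing
    -- table has to be updated along with the parent row; ON UPDATE CASCADE
    -- on the foreign key softens that, but a key whose values shift over time
    -- tends to force exactly the kind of conversion described above.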
The reason why there's a strong preference for surrogate keys is that they always work, and you can avoid ever needing to come up with a better design if you just use them and forget about it. The position Merlin has advocated here, that there should always be a natural key available if you know the data well enough, may be true. But few people are good enough designers to be sure they've made that decision correctly, and the downside of being wrong can be a long, painful conversion process. It's easier for most people to eliminate the possibility of making that mistake by using auto-generated surrogate keys, where the primary problem you'll run into is merely using more space/resources than you might otherwise need. It minimizes the worst case--a mistake made in the model leading to an expensive re-design--by adding overhead that makes the average case more expensive. Software design usually has enough risks that any time you can eliminate one just by throwing some resources at it, that's normally the right thing to do.
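And a minimal sketch of the surrogate-key alternative, reusing the same hypothetical tables (again, names are made up; serial provides the auto-generated key):

    -- Surrogate-key variant: an auto-generated id is the primary key, and the
    -- natural candidate key is still enforced with a UNIQUE constraint.
    CREATE TABLE part (
        part_id     serial PRIMARY KEY,
        part_number text NOT NULL UNIQUE,  -- still a candidate key, just not the PK
        description text NOT NULL
    );

    CREATE TABLE order_line (
        order_id integer NOT NULL,
        part_id  integer NOT NULL REFERENCES part (part_id),
        quantity integer NOT NULL,
        PRIMARY KEY (order_id, part_id)
    );

    -- A part number that changes is now a single-row update; nothing that
    -- references part needs to be touched:
    UPDATE part SET part_number = 'B-200' WHERE part_number = 'A-100';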
--
Greg Smith 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books