On 4/18/07, Dave Dutcher <dave@xxxxxxxxxxxx> wrote:
I think a database with all natural keys is unrealistic. For example if you have a table that refers to people, are you going to use their name as a primary key? Names change all the time due to things like marriage, divorce, or trouble with the law. We have tables with 20 million rows which reference back to a table of people, and if I used the person's name as key, it would be a major pain when somebody's name changes. Even if there is referential integrity, one person might be referred to by 25% of the 20 million rows, so the update would take quite a long time. Also the table will be filled with dead rows and the indexes will likely be bloated. If I want to clean that up, it will take a vacuum full or a cluster which will lock the whole table and run for hours. If I use a surrogate key, I can change their name in one row and be done with it.
That's perfectly reasonable (I mentioned this upthread)...there are a couple of corner cases where RI costs too much Exchanging a surrogate for a natural is a valid performance consideration. Usually, the performance win is marginal at best (and your example suggests possible normalization issues in the child table), sometimes there is no alternative....updating 5 million rows is obviously nasty. That said -- if the cost of update was zero, would you still do it that way? I'm trying to separate performance related issues, which are reasonable and valid depending on the situation, with good design principles. merlin