> -----Original Message----- > From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general- > owner@xxxxxxxxxxxxxx] On Behalf Of Rob Sargent > Sent: Monday, May 02, 2011 7:07 PM > To: Jeff Davis > Cc: Greg Smith; pgsql-general@xxxxxxxxxxxxxx > Subject: Re: pervasiveness of surrogate (also called > synthetic) keys > > > > Jeff Davis wrote: > > On Mon, 2011-05-02 at 11:10 -0400, Greg Smith wrote: > >> The position Merlin > >> has advocated here, that there should always be a natural key > available > >> if you know the data well enough, may be true. But few people are > good > >> enough designers to be sure they've made the decision correctly, and > the > >> downsides of being wrong can be a long, painful conversion process. > >> Easier for most people to just eliminate the possibility of making a > >> mistake by using auto-generated surrogate keys, where the primary > >> problem you'll run into is merely using more space/resources than > you > >> might otherwise need to have. It minimizes the worst-case--mistake > make > >> in the model, expensive re-design--by adding overhead that makes the > >> average case more expensive. > > > > Once you really try to define "natural" and "surrogate" keys, I think > a > > lot of the arguments disappear. I wrote about this a few years back: > > > > http://thoughts.j-davis.com/2007/12/11/terminology-confusion/ > > > > In particular, I think you are falsely assuming that a natural key > must > > be generated from an outside source (or some source outside of your > > control), and is therefore not reliably unique. > > > > You can generate your own keys, and if you hand them out to customers > > and include them on paperwork, they are now a part of the reality > that > > your database models -- and therefore become natural keys. Invoice > > numbers, driver's license numbers, etc., are all natural keys, > because > > they are known about, and used, in reality. Usernames are, too, the > only > > difference is that you let the user choose it. > > > > In contrast, a pointer or a UUID typically does not represent > reality, > > because no humans ever see it and no computer systems outside yours > know > > about it. So, it's merely an implementation detail and should not be > a > > part of the model. > > > > Regards, > > Jeff Davis > > > My wife works (at the sql level) with shall we say "records about > people". Real records, real people. Somewhere around 2 million unique > individuals, several million source records. They don't all have ssn, > they don't all have a drivers license. They don't all have an address, > many have several addresses (especially over time) and separate people > have at one time or another lived at the same address. You would be > surprise how many "bob smith"s where born on the same day. But then > they weren't all born in a hospital etc etc etc. A person may present > on any of a birth record, a death record, a hospital record, a drivers > license, a medical registry, a marriage record and so on. There simply > is no natural key for a human. We won't even worry about the > non-uniqueness of ssn. And please don't get her started on twins. :) > > > I can only imagine that other equally complex entities are just as > slippery when it comes time to pinpoint the natural key. People are sometimes surprised to discover the social security numbers are not unique. There are fraudulent social security numbers: http://www2.nbc4i.com/news/2010/dec/06/2/study-finds-millions-duplicate-social-security-num-ar-316988/ There are numbers given out by the IRS by mistake: http://wnyt.com/article/stories/S1594530.shtml?cat=10115 -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general