Search Postgresql Archives

Re: pervasiveness of surrogate (also called synthetic) keys

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-
> owner@xxxxxxxxxxxxxx] On Behalf Of Rob Sargent
> Sent: Monday, May 02, 2011 7:07 PM
> To: Jeff Davis
> Cc: Greg Smith; pgsql-general@xxxxxxxxxxxxxx
> Subject: Re:  pervasiveness of surrogate (also called
> synthetic) keys
> 
> 
> 
> Jeff Davis wrote:
> > On Mon, 2011-05-02 at 11:10 -0400, Greg Smith wrote:
> >> The position Merlin
> >> has advocated here, that there should always be a natural key
> available
> >> if you know the data well enough, may be true.  But few people are
> good
> >> enough designers to be sure they've made the decision correctly, and
> the
> >> downsides of being wrong can be a long, painful conversion process.
> >> Easier for most people to just eliminate the possibility of making a
> >> mistake by using auto-generated surrogate keys, where the primary
> >> problem you'll run into is merely using more space/resources than
> you
> >> might otherwise need to have.  It minimizes the worst-case--mistake
> make
> >> in the model, expensive re-design--by adding overhead that makes the
> >> average case more expensive.
> >
> > Once you really try to define "natural" and "surrogate" keys, I think
> a
> > lot of the arguments disappear. I wrote about this a few years back:
> >
> > http://thoughts.j-davis.com/2007/12/11/terminology-confusion/
> >
> > In particular, I think you are falsely assuming that a natural key
> must
> > be generated from an outside source (or some source outside of your
> > control), and is therefore not reliably unique.
> >
> > You can generate your own keys, and if you hand them out to customers
> > and include them on paperwork, they are now a part of the reality
> that
> > your database models -- and therefore become natural keys. Invoice
> > numbers, driver's license numbers, etc., are all natural keys,
> because
> > they are known about, and used, in reality. Usernames are, too, the
> only
> > difference is that you let the user choose it.
> >
> > In contrast, a pointer or a UUID typically does not represent
> reality,
> > because no humans ever see it and no computer systems outside yours
> know
> > about it. So, it's merely an implementation detail and should not be
> a
> > part of the model.
> >
> > Regards,
> > 	Jeff Davis
> >
> My wife works (at the sql level) with shall we say "records about
> people".  Real records, real people.  Somewhere around 2 million unique
> individuals, several million source records.  They don't all have ssn,
> they don't all have a drivers license.  They don't all have an address,
> many have several addresses (especially over time) and separate people
> have at one time or another lived at the same address.  You would be
> surprise how many "bob smith"s where born on the same day.  But then
> they weren't all born in a hospital etc etc etc.  A person may present
> on any of a birth record, a death record, a hospital record, a drivers
> license, a medical registry, a marriage record and so on.  There simply
> is no natural key for a human.  We won't even worry about the
> non-uniqueness of ssn. And please don't get her started on twins. :)
> 
> 
> I can only imagine that other equally complex entities are just as
> slippery when it comes time to pinpoint the natural key.

People are sometimes surprised to discover the social security numbers are not unique.

There are fraudulent social security numbers:
http://www2.nbc4i.com/news/2010/dec/06/2/study-finds-millions-duplicate-social-security-num-ar-316988/

There are numbers given out by the IRS by mistake:
http://wnyt.com/article/stories/S1594530.shtml?cat=10115



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux