Re: Using varchar primary keys.

Julian <tempura@xxxxxxxxxxxxxxxx> · Wed, 03 Apr 2013 13:49:25 +1100

On 03/04/13 06:37, Merlin Moncure wrote:
> On Tue, Apr 2, 2013 at 2:01 PM, Samantha Atkins <sjatkins@xxxxxx> wrote:
>> Natural keys are in user data space.  Thus they are not guaranteed invariant and therefore cannot serve as persistent identity.

Can't find Samantha's original post. I agree but done right they can be
(persistent and unique)

> That is true, but irrelevant in most real world cases.  Also, nothing
> is keeping you from using an extra marker if/when you need to provide
> an invariant lookup.
> 
>> Also natural keys have the considerable defect of being of different type and arity per logical entity type.   This means that very nice things like dynamic relationships and persistent collections of references to multiple types of things cannot be built.  It also increases ORM  and data cache complexity.
> 
> OO evangelism.
> 
>> These are considerable weaknesses.

The Google cache is filled with "foo vs bar"
i.e. natural vs surrogate and its amazing what you get, even surrogate
key under wikipedia comes with a "its sources remain unclear because it
lacks inline citations" disclaimer.
I consider it pretty much a non debate.

>> You cannot guess a uuid from say uuid.uuid4().  If you think you can I look forward to seeing your proof.
> 
> I never claimed that.  I said that postgresql random() can be guessed,
> which it can, since it's based on lrand48.
> 
> merlin
> 
> 
Trying to get back on topic with the original post.

I have the iso (?) country code table, I have no problem with the PK
being the char(2) country code.
These natural codes/keys are thought out, debated and eventually decided
by other people.
But I also don't have a problem with adding an integer (serial) column
and making that my PK, although that PK might be pretty useless to the
rest of the world.

So thats that, having to really think it out is probably a good sign
that you should stick to a surrogate unless you are really sure.
(again I don't advocate ON UPDATE CASCADE as a solution should you
change your mind)

As to the whole natural vs surrogate/synthetic key debate, as I
mentioned in an earlier post I use them both. The question is when is it
worthy of a PK.

1)cust_id=123 (surrogate: PK)
vs
2)cust_id=1231 (natural: checkbit such as barcode data etc)
vs
3)cust_id=<natural: uuencoded binary that spits out "123" after being
decoded by the app>

For me, 2) is more likely to be a PK than 3), but it is entirely
possible that neither would be a PK.

Global/Universal unique identifiers, useful with replication and
database partitioning (in my instance)

1)cust_id=<uuid>

vs

2)cust_id=<shard_id>-<something_extra>-<cust_id>

1) will work, but 128bits is alot of random data that could be useful to
the app.
2) cust_id is not as universally unique, but if that was ever a problem
I could also wrap that in a encoded binary with a small change to the
app and no change to the database now it resembles something truly random.

The difference is 2) is more useful and contains "routing" data.
These are all natural and exposed to the world. But the question still
remains are they worthy of being a PK within the database of origin?
So far the answer from me is "doesn't have to be" but everyone else
might think it is, they may even make it a PK.

Regards,
Jules.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general