Re: Normal vs Surrogate Primary Keys...

"rlee0001" <robeddielee@xxxxxxxxxxx> · 1 Oct 2006 19:48:14 -0700

Tom Lane wrote:
> "rlee0001" <robeddielee@xxxxxxxxxxx> writes:
> > ... I know, for example, that by default PostgreSQL assigns every record a
> > small unique identifier called an OID.
>
> Well, actually, that hasn't been the default for some time, and even if
> you turn it on it's not guaranteed unique without additional steps, and
> seeing that it's only 32bits wide it's not usable as a unique ID for
> very large tables.
>
> "unique" and "small" are more or less contradictory in this context.
>
> > ... it seems like this is the sort of thing that even a fully SQL-compliant
> > DBMS could do internally to compensate for the performance issues with
> > using large natural keys in relationships.
>
> What performance issues are those, exactly?  I have seen no data that
> proves that large key size is a bottleneck for reasonable schema designs
> in Postgres.  Maybe that just means that we have more fundamental
> problems to fix :-( ... but there's no point in investing a lot of
> effort on an aspect that isn't the next bottleneck.

Well from what I understand, there are basically two reasons to use
surrogate primary keys: 1) No reliable natural candidate key exists or
2) The natural candidate keys are percieved to be far too large/complex
to use as the primary key. I have yet to hear anyone recommend the use
of surrogate keys in modern databases for any other reason. Obviously
that some entities in practice have no reliable natural keys cannot be
helped and in those cases a surrogate key pretty much has to be used,
enless the data modeler would find it acceptable to use every attribute
in the entity as part of the primary key (in order to ensure uniqueness
of each record overall).

The second argument for the use of surrogate keys is simply that
reliable natural candidate keys are often perceived to be too large to
use as primary keys. The perception seems to be that large primary keys
consume a considerable amount of storage space when used in foreign
keys. For example, if I key "employee" by Last Name, First Name, Date
of Hire and Department, I would need to store copies of all this data
in any entity that relates to an employee (e.g. payroll, benefits and
so on). In addition, if any of these fields change in value, that
update would need to cascade to any related entities, which might be
perceived as a performance issue if there are many related records.

I'm not saying that PostgreSQL specifically has performance problems
but that using large natural keys can hamper performance (both in terms
of storage space required and cascading update time) when a lot of
relationships exist between entities with such large keys.

Personally I hate using surrogate keys except in places where no
reliable natural key exists but find it nessisary in order to improve
the efficiency of foreign keys in the database. So my proposal was
simply to have the DBMS internally create and use an invisible
surrogate key to identify and relate records, but expose the natural
key to the environment. The currently OID implimentation cannot be used
for this as you've already stated but I think a similar implimentation
could work.

I'm just wondering if anyone else would take advantage of the
performance benefit, or perhaps sees a flaw in my logic.

> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
>        message can get through to the mailing list cleanly