Search Postgresql Archives

Re: Using varchar primary keys.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/4/13 at 10:35 AM, Tim Uckun wrote:

Consider the following scenario.

Since I don't consider myself an expert I am talking in basic terms below. Please don't misunderstand this plodding as anything other than me attempting to be as clear as possible.

These comments reflect my understanding of Joe Celko's view as expressed in his books. I fully understand that many respectable SQL masters (respectfully) disagree with Celko in this. My take is to use his approach as a starting point and only deviate when a different approach is obviously much easier for me. (I doubt it's better. :). YMMV.

I have a typical tagging structure. There is a table called tags, there is
a table called taggings. The taggings table acts as a many to many join
table between the taggers and the tags.

The tags table has two fields id and tag. id is a serial data type.

Celko would suggest the table name is the plural of its primary key, so a table named tags has tag as the name of its primary key. And this is a big hint: the sequential index is not needed. And he further (very strongly) suggests that primary keys assume their natural values and not be referenced by arbitrary (sequential) values, i.e., avoid pseudo-keys, as neither SQL nor modern data bases need them.

The taggings has a tag_id field along with the context, tagger_id etc.

So this would instead have two columns named "tag" and "tagger" (each referencing the column of the same name in tables "tags" and "taggers") and whatever is implied by the "etc". if the pair (tag,tagger) is meant to be unique then this is a good candidate for primary key, otherwise a further term(s) needs to be dragged into the composite index (e.g., a timestamp). Once again there is no logical need for a sequential integer to be stored as well.


....

Given that you can still set foreign keys and cascade commands to adjust
child records either way the tables would be properly normalized so I am
wondering what I am gaining by using these serial ID fields.

Is this false economy?

I think so.

If the tags table has both sequential integer and the unique tag value then there is likely storage and I/O associated with two indexes, along with storage of both tag and associated integer. If the list of tags is short enough it might be suitable to use an ENUM to enforce both integrity and brevity.

If the list of tags is longer and/or might need updating then a single column table will allow for this. In general, I think lists of things (e.g., tag values) should be kept within the dB so they can be used for integrity checking, etc. I don't like the idea of externalising this job to the interface application. Also, as you have mentioned, you can elegantly handle table renovation, eg., if a given tag needs its name changed, it can be done via a foreign key constraint with cascade. (Elegant here refers to the syntax, the actual I/O implications may be very ugly depending on the scale of data that needs rewriting. :)

In another situation the tags table might well contain columns such as: tag, tag_full_name, tag_origin, tag_authority_certificate/expiry/whatever. In such an instance the tag is better if it's human readable/meaningful so reports can be useful without always back joining the full name. Sequential integers rarely fulfil this role as implied by the original question.

Regards
Gavan Schneider



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux