On Sun, Nov 18, 2007 at 04:19:01AM +0000, Ron Johnson wrote: > If the PK was synthetic and generated by the engine, then a (buggy) > app could insert duplicate tolls and the system wouldn't utter a > peep. But the customer sure would when he saw the duplicate entries. You'd just need to put a UNIQUE constraint on an appropriate set of columns and the database would complain in all the same places. One big benefit that I've been able to discern with having natural, over synthetic, keys is that you need to pull in every table up the hierarchy when you want to find out something lower down. An example would probably help; say we were recording samples from locations, with these samples being collected in groups on specific visits. I've been using synthetic keys more recently, so I'll start with them: CREATE TABLE locations ( id SERIAL NOT NULL PRIMARY KEY, locntype TEXT ); CREATE TABLE visits ( id SERIAL PRIMARY KEY, locnid INTEGER NOT NULL REFERENCES locations, visitnum INTEGER NOT NULL, UNIQUE (locnid,visitnum), visitdate DATE ); CREATE TABLE samples ( id SERIAL PRIMARY KEY, visitid INTEGER NOT NULL REFERENCES visits, samplenum INTEGER NOT NULL, UNIQUE (visitid,samplenum), barcode TEXT UNIQUE ); Say I wanted to summarise the number of samples per location type, I'd be forced to join onto the visits table; like this: SELECT l.locntype, COUNT(*) FROM locations l, visits v, samples s WHERE l.id = v.locnid AND v.id = s.visitid GROUP BY l.locntype; If we now organise the tables using natural keys (note that the location id is still pretty artificial as it has to map back to something in the real world): CREATE TABLE locations ( id TEXT NOT NULL PRIMARY KEY, locntype TEXT ); CREATE TABLE visits ( locnid TEXT NOT NULL REFERENCES locations, visitnum INTEGER NOT NULL, PRIMARY KEY (locnid,visitnum), visitdate DATE ); CREATE TABLE samples ( locnid TEXT NOT NULL, visitnum INTEGER NOT NULL, samplenum INTEGER NOT NULL, PRIMARY KEY (locnid,visitnum,samplenum), FOREIGN KEY (locnid,visitnum) REFERENCES visits, barcode TEXT UNIQUE ); This has a nice simplifying effect on the above query; I can directly refer to the location for each sample: SELECT l.locntype, COUNT(*) AS samples FROM locations l, samples s WHERE l.id = s.locnid GROUP BY l.locntype; Is that a reasonable example of the different ways of laying things out in the two styles? The main advantage of synthetic keys is that it introduces another layer of abstraction into the database, potentially allowing greater flexibility (of course this can also be a burden). Natural keys allow the above simplification of queries and ease the manual fix-up of the data should something go horribly wrong. It's reasonably easy to fix things if you have a sample referencing a non-existent visit with natural keys, but if you've got synthetic keys you're probably going to have to dump the sample as well. Of course, if you've got yourself into this state you've already failed, but some recourse is nice. In summary; I see various advantages to either approach, but I don't see either as being fundamentally "better". You can express the same constraints in either style (I'm sure someone will think of a good counter example though), it's the context in which it's used that will determine which is more suitable to the task at hand. Using one style exclusively is almost certainly bad, but having a preference for one or the other is probably good as it'll make the database as a whole more cohesive and subsequently ease maintenance. Comments? Sam ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend