On 4/14/07, Bill Moran <wmoran@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
In response to "Kynn Jones" <kynnjo@xxxxxxxxx>: > The two situations are semantically identical: each record in table bar > refers to a record in table foo. The difference is that in the first > schema, this referencing is done through an "artificial" serial-integer > primary key, while in the second schema this reference is done through a > data field that happens to be unique and not null, so it can serve as > primary key. I had a discussion about this recently on the Drupal mailing lists, at the end of which I promised to do some benchmarking to determine whether or not text keys really do hurt performance of indexes. Unfortunately, I still haven't followed through on that promise -- maybe I'll get to it tomorrow.
The main reason why integer indexes are faster than natural counterparts is that the index is smaller and puts less pressure on cache. This is however offset by removing joins here and there and you usually just end up indexing the data anyways. Performance is kind of tangential to the argument though -- I've seen databases using all natural keys and found them to be very clean and performant. Using surrogate keys is dangerous and can lead to very bad design habits that are unfortunately so prevalent in the software industry they are virtually taught in schools. Many software frameworks assume you use them and refuse to work without them (avoid!) While there is nothing wrong with them in principle (you are exchanging one key for another as a performance optimization), they make it all too easy to create denormalized designs and tables with no real identifying criteria, etc, and the resultant stinky queries to put it all back together again, (full of unions, self joins, extraneous groups, case statements, etc). A good compromise in your designs is to identify your natural key but use the surrogate if you have valid performance reasons: CREATE TABLE foo ( frobnitz_id int unique, frobnitz character(varying 100) PRIMARY KEY\ [...] ); frobnitz_id is of course optional and not necessary in all tables. It may be a pain to relate a large table with a four or five part key and judicious use of surrogates may be justified for performance or even just to keep your queries smaller: create table order_line_item_discount ( company_name text, order_no int, line_item_seq_no int, discount_code text, primary key(company_name, order_no, line_item_seq_no, discount_code) ) becomes create table order_line_item_discount ( order_line_item_id int, discount_code text, primary key (order_line_item_id, discount_code) ) merlin