Re: Avoiding surrogate keys

Thom Brown <thombrown@xxxxxxxxx> · Wed, 21 Apr 2010 20:30:44 +0100

On 21 April 2010 20:18, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:

On Wed, Apr 21, 2010 at 3:01 PM, Thom Brown <thombrown@xxxxxxxxx> wrote:

> I think I know what I plan to do, but want to throw this out there to see if

> there are differing points of view.

> I have a mailing list table, and 2 of the columns contain values which have

> to be from a list.  These are country and status.  There are 237 possible

> countries and 3 possible statuses.  Now I know some people would assign a

> sequence ID (surrogate key) to the country and status values, and have them

> looked up in separate tables to get the textual value, but I think I'll

> still have those tables, just without an ID column, so 1 column for both the

> countries and statuses tables.  This means storing the proper value in the

> main table.

> So instead of

> name, email, country, status

> 'mr smith', 'emailaddress@xxxxxxxxxxx', 44, 2

> 'mrs jones', 'me@xxxxxxxxxxxxxxxx', 21, 1

> I'd have

> name, email, country, status

> 'mr smith', 'emailaddress@xxxxxxxxxxx', 'China', 'Registered'

> 'mrs jones', 'me@xxxxxxxxxxxxxxxx', 'Belgium', 'Unconfirmed'

> The values of course would be constrained by foreign key lookup to their

> associated tables.

> Are there any serious downsides to this?  If so, what would you recommend?

Natural keys:

*) force formal relationships into your key design (this is good)

*) Make your database MUCH easier to follow, browse, and understand

*) in particular cases allow you to skip joins

*) will make your indexes fatter (this is not good)

*) can be a pain if your keys are updated frequently

*) can be a major pain if your key changes in structure (adds a field,

or changes in type)

Surrogate keys:

*) Give you faster joins, but more of them (this is a win/loss

depending on circumstances)

*) Tend to encourage lazy/poor designs, since you hide relationships

behind a value

*) Make the tables more difficult to browse and understand

*) Make updates to keys/key structure trivial

I personally use natural keys when I can and surrogates when I have

to.  When I do use a surrogate, I tend to still define the natural key

as primary and simply make a alternate 'unique' constraint for the

surrogate.

merlin

Thanks for the comments guys.  I'm now pretty sure using natural keys is the right approach.  Neither list will ever change type, and if they change, it'll be quite infrequent.  Index size shouldn't really be a problem since we're realistically talking thousands of rows.

What I hate about surrogate keys is having to keep looking stuff up:

INSERT INTO stuff (col_a, col_b)
SELECT 'my_value', table_b.id

FROM table_b
WHERE table_b.real_value = 'PostgreSQL';

As opposed to:

INSERT INTO stuff (col_a, col_b) VALUES ('my_value', 'PostgreSQL');

Just making sure I don't fall into the surrogate key abuse trap. :)

Thom