Re: How to modify ENUM datatypes?

Jeff Davis <pgsql@xxxxxxxxxxx> · Tue, 22 Apr 2008 15:11:36 -0700

On Tue, 2008-04-22 at 16:34 -0500, D. Dante Lorenso wrote:
> I see this might be a 
> problem with storage since you will need to store the TEXT value for 
> every row in the 'mystuff' table instead of just storing the reference 
> to the lookup table as an INTEGER.  Over millions of rows, perhaps this 
> would become a concern?

It does use additional storage to store the full text value, rather than
a fixed-size integer. However, the difference is not much when the
average string length is short.

If you store an integer reference instead, joins are not necessarily
expensive. If the number of distinct values is small (which is the
normal use case for ENUM), I would expect the joins to be quite cheap.
Beware of running into bad plans however, or making the optimizer work
too hard (if you have a lot of other joins, too).

I don't think the performance concerns are major, but worth considering
if you have millions of rows.

> What is the general consensus by the community about this approach?  Is 
> this de-normalization frowned upon, or is there a performance advantage 
> here that warrants the usage?

This is not de-normalization, at all. Normalization is a formal process,
and if this were de-normalization, you could find a specific rule that
is violated by this approach.

Look here:
http://en.wikipedia.org/wiki/Third_normal_form

If you go to higher normal forms, you will not find any violations
there, either. There is nothing about normalization that requires the
use of surrogate keys.

The approach suggested by Scott Marlowe is normalized as well as being
quite natural and simple. I think often this is overlooked as being "too
simple", but it's a quite good design in many cases.

Regards,
	Jeff Davis