On Tue, 2008-04-22 at 16:34 -0500, D. Dante Lorenso wrote: > I see this might be a > problem with storage since you will need to store the TEXT value for > every row in the 'mystuff' table instead of just storing the reference > to the lookup table as an INTEGER. Over millions of rows, perhaps this > would become a concern? It does use additional storage to store the full text value, rather than a fixed-size integer. However, the difference is not much when the average string length is short. If you store an integer reference instead, joins are not necessarily expensive. If the number of distinct values is small (which is the normal use case for ENUM), I would expect the joins to be quite cheap. Beware of running into bad plans however, or making the optimizer work too hard (if you have a lot of other joins, too). I don't think the performance concerns are major, but worth considering if you have millions of rows. > What is the general consensus by the community about this approach? Is > this de-normalization frowned upon, or is there a performance advantage > here that warrants the usage? This is not de-normalization, at all. Normalization is a formal process, and if this were de-normalization, you could find a specific rule that is violated by this approach. Look here: http://en.wikipedia.org/wiki/Third_normal_form If you go to higher normal forms, you will not find any violations there, either. There is nothing about normalization that requires the use of surrogate keys. The approach suggested by Scott Marlowe is normalized as well as being quite natural and simple. I think often this is overlooked as being "too simple", but it's a quite good design in many cases. Regards, Jeff Davis