Re: Varchar vs foreign key vs enumerator - table and index size

Łukasz Walkowski <lukasz.walkowski@xxxxxxxxxx> · Sat, 31 Aug 2013 19:06:01 +0200

Tom,

> If you're starting to be concerned about space, it's definitely time to
> get away from this choice.  Depending on what locale you're using,
> comparing varchar values can be quite an expensive operation, too.

I don't like wasting space and processing power, even if avoiding it takes more work. We use pl_PL.UTF-8 as our locale.

> I think the main "pro" of this approach is that it doesn't use any
> nonstandard SQL features, so you preserve your options to move to some
> other database in the future.  The main "con" is that you'd be buying into
> fairly significant rewriting of your application code, since just about
> every query involving these columns would have to become a join.

Well, I don't expect to move away from PostgreSQL anytime soon; it's simply the best database for me. Rewriting code is one of the things I'm doing right now, but before I touch the database I want to be sure the choices I've made are good.

> FWIW, I'd be inclined to just use integer not smallint.  The space savings
> from smallint is frequently illusory because of alignment considerations
> --- for instance, an index on a single smallint column will *not* be any
> smaller than one on a single int column.  And smallint has some minor
> usage annoyances because it's a second-class citizen in the type promotion
> hierarchy --- you may find yourself needing explicit casts to smallint
> here and there.

OK, that's important information. Thank you.
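The alignment point can be checked with a little arithmetic. Below is a minimal sketch, assuming a 64-bit build (MAXALIGN of 8 bytes), an 8-byte btree index tuple header (6-byte heap TID plus 2-byte t_info) and a 4-byte line pointer per entry. It ignores page headers, fillfactor and the heap side entirely, and the helper names are hypothetical, not PostgreSQL functions.

```python
# Single-column btree index entry size, assuming a 64-bit build.
# Constants are assumed on-disk sizes, not values read from a server.
INDEX_TUPLE_HEADER = 8  # IndexTupleData: 6-byte heap TID + 2-byte t_info
LINE_POINTER = 4        # ItemIdData stored in the page for every entry
MAXALIGN = 8            # index tuples are padded to this boundary


def maxalign(n: int, align: int = MAXALIGN) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def index_entry_bytes(key_width: int) -> int:
    """Bytes one single-column btree entry occupies on a page."""
    return maxalign(INDEX_TUPLE_HEADER + key_width) + LINE_POINTER


for name, width in [("smallint", 2), ("integer", 4), ("bigint", 8)]:
    print(f"{name:8} key {width} B -> {index_entry_bytes(width)} B per entry")
```

Header plus key (10, 12 or 16 bytes) always rounds up to a 16-byte tuple, so all three lines report 20 bytes per entry: under these assumptions smallint buys nothing in a single-column index.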

> 
> Space-wise this is going to be equivalent to the integer-foreign-key
> solution.  It's much nicer from a notational standpoint, though, because
> you don't need joins --- it's likely that you'd need few if any
> application code changes to go this route.  (But I'd advise doing some
> testing to verify that before you take it as a given.)
> 
> You're right though that enums are not a good option if you expect
> frequent changes in the pool of allowed values.  I guess the question
> is how often does that happen, in your application?  Adding a new value
> from time to time isn't much of a problem unless you want to get picky
> about how it sorts relative to existing values.  But you can't ever delete
> an individual enum value, and we don't support renaming them either.
> (Though if you're desperate, I believe a manual UPDATE on the pg_enum
> catalog would work for that.)
> 
> Another thing to think about is whether you have auxiliary data about each
> value that might usefully be stored as additional columns in the small
> tables.  The enum approach doesn't directly handle that, though I suppose
> you could still create small separate tables that use an enum column as
> primary key.
> 
> 			regards, tom lane

So I'll use enums for device type, eventtype and eventsource, since those columns are quite stable. For browser and operating system I'll use separate lookup tables referenced by foreign keys.

Thank you; any additional tips are welcome.

Regards,
Lukasz Walkowski

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance