Re: Package DB Schema v3

Toshio Kuratomi <a.badger@xxxxxxxxx> · Wed, 06 Dec 2006 11:19:29 -0800

On Wed, 2006-12-06 at 09:19 -0500, Elliot Lee wrote:
> On Dec 5, 2006, at 2:32 PM, Toshio Kuratomi wrote:
> >>  status     text not null default 'D'
> >>             check ( status IN ('D','A','M','E','R') )
> >>
> > What's the justification?  Here's my reasoning:
> > [Single letter code]
> > + Harder to misspell
> > + Takes less storage space
> > [Full words]
> 
> You're sticking the status into a column of type 'TEXT', though. TEXT  
> is not even like varchar which defaults to 1024 characters or so of  
> storage. I believe TEXT tells the database 'be ready to store between  
> zero and 4 billion bytes in this column'.
> 

varchar and text have had equivalent performance and storage in
postgreSQL for a long time.  According to the 8.2 docs[1]_, char is
currently no better (and it space pads your entries so there's probably
no reason to use it if you aren't planning on switching databases) (This
change appears to have been part of 7.1 [2]_)

This assumes you store the same values in each column.  The value of
varchar is it allows you to specify a limit which can prevent the user
from using 'supercalifragilisticexpialadocious' as a value.  But in most
of our use cases, allowing room for expansion seems fine.  In this
particular case, we're using a constraint to prevent random values so we
wouldn't be providing any additional limitations on the user's ability
to enter long data.

[1]_
http://www.postgresql.org/docs/8.2/interactive/datatype-character.html
[2]_ http://www.postgresql.org/docs/7.2/static/release-7-1.html

> If you want to cut down on storage space, check the field types and  
> specify maximum sizes. In this case, varchar(1) would be a more  
> appropriate type. Since the database will wind up storing this field  
> using at least 4 bytes of space for alignment purposes, might as well  
> take it to varchar(3) or so to also allow for future expansion (and  
> maybe a bit more readability).
> 
I'm not too concerned with storage space bloat from the status field.
Karel's clarification about db-to-client transfer makes sense.  However,
I'm in the camp that believes the status strings belong in the database
so I think the typical usage will transfer the same amount of data.  (If
the statusTranslations table is cached this might be better, though...
more room for optimization if we have separate tables.)

> To balance fast indexing/comparison, small storage space, and room  
> for expansion, it's often easier to use INTEGER for these type of  
> columns, and then assign meaning to those values elsewhere (either  
> through a separate table that translates values to strings, or by  
> #define-like constants in the source code).
> 
I think we'll do this with INTEGER and a separate table because Jeffrey
Ollie's proposal to allow translations to the status codes makes sense.
I really hate having constants in source code because it hinders the
database's ability to enforce integrity.  The source code of the
application has to stay in sync with the database.  If there's more than
one consumer, each one has to stay in sync.  Enforcing this is a problem
that the database was designed to solve.

> </stuff you never wanted to know about choosing types for DB columns> :)

I thought about this as I modified the schema, really! ;-)

-Toshio
Attachment:
signature.asc

Description: This is a digitally signed message part