On Wed, 2006-12-06 at 09:19 -0500, Elliot Lee wrote: > On Dec 5, 2006, at 2:32 PM, Toshio Kuratomi wrote: > >> status text not null default 'D' > >> check ( status IN ('D','A','M','E','R') ) > >> > > What's the justification? Here's my reasoning: > > [Single letter code] > > + Harder to misspell > > + Takes less storage space > > [Full words] > > You're sticking the status into a column of type 'TEXT', though. TEXT > is not even like varchar which defaults to 1024 characters or so of > storage. I believe TEXT tells the database 'be ready to store between > zero and 4 billion bytes in this column'. > varchar and text have had equivalent performance and storage in postgreSQL for a long time. According to the 8.2 docs[1]_, char is currently no better (and it space pads your entries so there's probably no reason to use it if you aren't planning on switching databases) (This change appears to have been part of 7.1 [2]_) This assumes you store the same values in each column. The value of varchar is it allows you to specify a limit which can prevent the user from using 'supercalifragilisticexpialadocious' as a value. But in most of our use cases, allowing room for expansion seems fine. In this particular case, we're using a constraint to prevent random values so we wouldn't be providing any additional limitations on the user's ability to enter long data. [1]_ http://www.postgresql.org/docs/8.2/interactive/datatype-character.html [2]_ http://www.postgresql.org/docs/7.2/static/release-7-1.html > If you want to cut down on storage space, check the field types and > specify maximum sizes. In this case, varchar(1) would be a more > appropriate type. Since the database will wind up storing this field > using at least 4 bytes of space for alignment purposes, might as well > take it to varchar(3) or so to also allow for future expansion (and > maybe a bit more readability). > I'm not too concerned with storage space bloat from the status field. Karel's clarification about db-to-client transfer makes sense. However, I'm in the camp that believes the status strings belong in the database so I think the typical usage will transfer the same amount of data. (If the statusTranslations table is cached this might be better, though... more room for optimization if we have separate tables.) > To balance fast indexing/comparison, small storage space, and room > for expansion, it's often easier to use INTEGER for these type of > columns, and then assign meaning to those values elsewhere (either > through a separate table that translates values to strings, or by > #define-like constants in the source code). > I think we'll do this with INTEGER and a separate table because Jeffrey Ollie's proposal to allow translations to the status codes makes sense. I really hate having constants in source code because it hinders the database's ability to enforce integrity. The source code of the application has to stay in sync with the database. If there's more than one consumer, each one has to stay in sync. Enforcing this is a problem that the database was designed to solve. > </stuff you never wanted to know about choosing types for DB columns> :) I thought about this as I modified the schema, really! ;-) -Toshio
Attachment:
signature.asc
Description: This is a digitally signed message part