tables with lots of columns - what alternative from performance point of view?

hubert depesz lubaczewski <depesz@xxxxxxxxx> · Wed, 7 Dec 2005 08:42:38 +0100

hi

jus recently there were some thread on postgresql list with people asying : i have 700 columns, i have 1000 columns and so on.

some people, imediatelly responded: change your schema.

this is what forced to me ask:

i have a situation where i ahve to store a number of "objects" in database.

all objects have 3 specific attributes (which go into objects table), and may have a lot of "custom fields".

basically - lsit of accessible custom fields for object depends on which object-category this object belongs to.

now.

i know, i could have written it in this way:

create table object_custom_fields (id serial primary key, object_id int8, field_id int8, field_value text);

but:

this approach has two very big drawbacks (for me):

1. the table cannot differentiate between custom fields of type "date", "number" and so on. - everything is stored as text.

2. it is rather slow. i have to do a non-unique index scan over
object_custom_fields, get all records, and pivot it (on the client side
of curse) to make it usable.

i did it differently, definitelly not nicely, but i dont see any other
way to get this performance with unknown list of custom fields:

1. create table cf_types (id serial, codename text, representation text);

2. create table cf_definitions (id serial, category_id int8, type_id int8, field-number int4);

3. create table cf_values (id serial, object_id int8 (unique), ...................................................);

where

cf_types store information like this:

 id |  codename  | representation 

----+------------+----------------

  1 | bool       | boolean

  2 | integer    | integer

  3 | number     | number

  4 | text       | text

  5 | note       | text

  6 | date       | date

...

basically - there might be many "types" with the same representation.

then

cf_values have a lot of (128 at the moment) fields for all possible representations.

basically it looks like:

id, object_id, boolean_1 ... boolean_128, integer_1..integer_128, ...

the datatypes of this fields relate to their content (integer_* fields have datatype int8, and so on).

now.

in cf_definitions i specify, category, field_type_id, and a field-number - which relates to _NUMBER in fields in cf_values.

what i did achive is *very* fast retrieval of data for any given object.

the schema of cf_values table is absolutelly awful, and i will never say differently.

my point is - if somebody (tom lane for example) says - redesign your
schema - whenever he reads about table with 700 column (i have more :)
- then i must have missed something absolutelyl simple, fast and
elegant. what is this?

depesz