Re: a newbie question on table design

"Leif B. Kristensen" <leif@xxxxxxxxxxxxxx> · Fri, 15 Feb 2008 15:43:57 +0100

On Friday 15. February 2008, fdu.xiaojf@xxxxxxxxx wrote:
>Hi all,
>
>I have a large sdf file with many records of molecules and associated
>data items and I want to save them in a PostgreSQL database. There are
>about less than 40 data items for every molecule(the names of the data
>items fore each molecule are the same, but values differ).  The number
>of molecules may exceed 20 million.
>
>Now I have come up with two ways to construct the table:
>
>1) a table with about 40 columns, every data item has its
> corresponding column, and one molecule corresponds to one row in the
> table.
>
>This is direct and simple. The drawbacks is if I want to add more data
>types to the database, I have to modify the structure of the table.
>
>2) a table with just 3 columns:
>
>    CREATE TABLE mytable(
>    id              serial,
>    data_name       text,
>    data_value      text
>    );
>
>Then a single molecule will corresonds to about 40 rows in the
> database.

This is a sound concept, but I'd rather store the data_name in a 
separate table with an integer key, and replace data_name in mytable 
with a data_name_fk REFERENCES data_names (data_name_id). That's just 
Occam's Razor applied to database design, aka first normal form.

You'd probably store the name of the molecule in a third table. Then you 
have a model very similar to the classic 'book database' where a book 
can have multiple authors, and an author can have multiple books. There 
are examples for this design all over the place.

>If I need to add more data types to the table, I just need to add new
>rows with new "data_name" column values. The drawback of this table is
>it has too many rows(40 times of the former one) and waste a lot
> space.
>
>Which one is better, or there are some other smart ways ?
>
>I have another question. Since the data I need to save is huge, is it
>appropriate that I save the data value in compressed format ?

That sounds a lot like premature optimization. Postgres is actually 
quite good at compacting data natively.
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings