Search Postgresql Archives

Re: a newbie question on table design

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



If you can select stable structure (common columns) and additional columns
than you can :

1. Use base class (table) with common columns + inherited tables with their
   own additional columns
2. We use contrib/hstore as a storage for semistructured data - we store
   additional columns as a key->value pairs in hstore data type. This is
   very flexible design.

On Fri, 15 Feb 2008, Leif B. Kristensen wrote:

On Friday 15. February 2008, fdu.xiaojf@xxxxxxxxx wrote:
Hi all,

I have a large sdf file with many records of molecules and associated
data items and I want to save them in a PostgreSQL database. There are
about less than 40 data items for every molecule(the names of the data
items fore each molecule are the same, but values differ).  The number
of molecules may exceed 20 million.

Now I have come up with two ways to construct the table:

1) a table with about 40 columns, every data item has its
corresponding column, and one molecule corresponds to one row in the
table.

This is direct and simple. The drawbacks is if I want to add more data
types to the database, I have to modify the structure of the table.

2) a table with just 3 columns:

   CREATE TABLE mytable(
   id              serial,
   data_name       text,
   data_value      text
   );

Then a single molecule will corresonds to about 40 rows in the
database.

This is a sound concept, but I'd rather store the data_name in a
separate table with an integer key, and replace data_name in mytable
with a data_name_fk REFERENCES data_names (data_name_id). That's just
Occam's Razor applied to database design, aka first normal form.

You'd probably store the name of the molecule in a third table. Then you
have a model very similar to the classic 'book database' where a book
can have multiple authors, and an author can have multiple books. There
are examples for this design all over the place.

If I need to add more data types to the table, I just need to add new
rows with new "data_name" column values. The drawback of this table is
it has too many rows(40 times of the former one) and waste a lot
space.

Which one is better, or there are some other smart ways ?

I have another question. Since the data I need to save is huge, is it
appropriate that I save the data value in compressed format ?

That sounds a lot like premature optimization. Postgres is actually
quite good at compacting data natively.


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux