Hi. We have a table with 30 M records that is growing by about 100 K records per day. The experimentalists, whose data are in the table, have decided to focus on the records for which the value of one field, rbscore, is greater than a cutoff. However, they want to continue to store all of the data in the database, even the records for which rbscore is below the cutoff. In the current table, about 400 K of the 30 M records meet the 'good' criterion.

Each record has about 40 fields, and the experimentalists do in fact write queries that use many of those fields, some more than others. (They are building a model and have not yet pinned down which fields matter most, which is why they store and query by so many fields.)

Given that they will spend 95% of their time querying the records that meet the 'good' criterion, what are good strategies for ensuring good performance for those queries?

(1) Should I partition the table into two partitions based on the value of rbscore?

(2) Should I create two separate tables?

Are (1) and (2) more or less equivalent in terms of performance? I think that partitioning the table is the more flexible option (e.g., if the cutoff value changes, there is no need to change the name of the table being queried), but would (2) give better performance, given that 95% of their queries are for rbscore values greater than the threshold?

Can you suggest other strategies?

Thank you,
Janet
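
P.S. To make (1) concrete, here is a minimal sketch of the layout I have in mind, assuming declarative range partitioning is available (PostgreSQL 11 or later); the table name, the columns other than rbscore, and the 0.9 cutoff are placeholders for illustration only:

    -- Parent table, partitioned on the score the experimentalists filter by.
    -- "measurements" and the columns other than rbscore are hypothetical names.
    CREATE TABLE measurements (
        id       bigint           NOT NULL,
        rbscore  double precision NOT NULL
        -- ... roughly 40 other fields ...
    ) PARTITION BY RANGE (rbscore);

    -- 'Good' records: rbscore at or above the (illustrative) cutoff of 0.9.
    CREATE TABLE measurements_good PARTITION OF measurements
        FOR VALUES FROM (0.9) TO (MAXVALUE);

    -- Everything below the cutoff stays stored and queryable.
    CREATE TABLE measurements_rest PARTITION OF measurements
        FOR VALUES FROM (MINVALUE) TO (0.9);

My understanding is that a query with WHERE rbscore >= 0.9 would then be pruned down to the measurements_good partition, which is what makes me think (1) and (2) might end up roughly equivalent; please correct me if that is wrong.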