On 02/22/2010 07:01 PM, Dave Crooke wrote:
> The original data is keyed by a 3-ary tuple of strings .... to keep the
> row size down, in the new data model I'm actually storing 32-bit int's
> in Postgres. The new schema for each table looks like this:
>
> (a integer,
>  b integer,
>  c integer,
>  ts timestamp without time zone,
>  value double precision)
>
> with two indexes: (a, b, ts) and (b, ts)

[...snip...]

> There are about 60 different values of b, and for each such value there
> is exactly one type of rollup. The old code is doing the rollups in
> Postgres with 60 bulk "insert into .... select" statements, hence the
> need for the second index.

[...snip...]

> For better scalability, I am partitioning these tables by time .... I am
> not using PG's table inheritance and triggers to do the partitioning,
> but instead dynamically generating the SQL and table names in the
> application code (Java). In most cases, the rollups will still happen
> from a single source "data_table" and I plan to continue using the
> existing SQL, but I have a few cases where the source "data_table" rows
> may actually come from two adjacent tables.

Without going through your very long set of questions in detail, it
strikes me that you might be better off if you:

1) use PostgreSQL partitioning (constraint exclusion)
2) partition by ts range
3) consider also including b in your partitioning scheme
4) create one index as (ts, a)
5) use dynamically generated SQL and table names in the application
   code to create (conditionally) and load the tables

(There is a sketch of what 1), 2), and 5) might look like in the P.S.
below.)

But of course test both this and your proposed method and compare ;-)

Also you might consider PL/R for some of your analysis (e.g. mode would
be simple, but perhaps not as fast):
http://www.joeconway.com/web/guest/pl/r

HTH,

Joe
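
P.S. A minimal sketch of the inheritance/constraint-exclusion approach,
assuming monthly ranges on ts and PostgreSQL 8.4; the table and index
names here are just examples, not your actual schema:

    -- parent table; queries go against this, it holds no rows itself
    CREATE TABLE data_table (
        a     integer,
        b     integer,
        c     integer,
        ts    timestamp without time zone,
        value double precision
    );

    -- one child per month, created dynamically by the application;
    -- the CHECK constraint is what constraint exclusion prunes on
    CREATE TABLE data_table_2010_02 (
        CHECK (ts >= DATE '2010-02-01' AND ts < DATE '2010-03-01')
    ) INHERITS (data_table);

    CREATE INDEX data_table_2010_02_ts_a
        ON data_table_2010_02 (ts, a);

    -- the application loads directly into the child, so no
    -- redirect triggers are needed on the parent
    INSERT INTO data_table_2010_02 (a, b, c, ts, value)
    VALUES (1, 2, 3, TIMESTAMP '2010-02-10 12:00:00', 42.0);

    -- default in 8.4; must be enabled on 8.2/8.3
    SET constraint_exclusion = partition;

    -- this scans only the children whose CHECK constraints can
    -- overlap the WHERE clause (verify with EXPLAIN)
    SELECT a, b, value
      FROM data_table
     WHERE ts >= DATE '2010-02-10' AND ts < DATE '2010-02-11';

The point of 5) is that your Java code keeps generating table names and
DDL as it does now, but because the children inherit from one parent,
your existing rollup SQL can keep querying "data_table" even when the
source rows span two adjacent partitions.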