Orion Henry <lab@xxxxxxxxxxxxxx> writes:

> What I would LIKE to do but am afraid I will hit a serious performance wall
> (or am missing an obvious / better way to do it)
>
> 1) Merge all 133 client tables into a single new table, add a client_id
>    column, do the data partitioning on the indexes not the tables as seen
>    here:
>
>      CREATE INDEX actioninfo_order_number_XXX_idx
>          ON actioninfo ( order_number ) WHERE client_id = XXX;
>      CREATE INDEX actioninfo_trans_date_XXX_idx
>          ON actioninfo ( transaction_date ) WHERE client_id = XXX;

The advantages of the partitioned scheme are: a) you can drop a client
quickly in a single operation, b) the indexes are only half as wide since
they don't include client_id, and c) you can do a sequential scan of an
entire client without using the index at all.

Unless any of these is overwhelming I would say go ahead and merge them.
If you frequently scan all the records of a single client, or frequently
drop entire clients, then the current scheme may be worth keeping.

> (Aside question: if I were to find a way to use COPY and I were loading
> data on a single client_id, would dropping just the indexes for that
> client_id accelerate the load?)

Dropping indexes would accelerate the load, but unless you're loading a
large number of records relative to the current size I'm not sure it would
be a win, since you would then have to rebuild the index for the entire
segment.

> 2) Find some way to make the bulk loads faster or more efficient (help!)

If your existing data isn't changing while you're doing the load (and if it
is, then your existing load process has a race condition, btw) then you
could do it in a couple of big queries:

  COPY ${table}_new FROM '...';

  CREATE TABLE ${table}_exists AS
    SELECT * FROM ${table}_new
     WHERE EXISTS (SELECT 1 FROM ${table}
                    WHERE ${table}_new.key = ${table}.key);

  CREATE TABLE ${table}_insert AS
    SELECT * FROM ${table}_new
     WHERE NOT EXISTS (SELECT 1 FROM ${table}
                        WHERE ${table}_new.key = ${table}.key);

  UPDATE ${table} SET ...
    FROM ${table}_exists
   WHERE ${table}_exists.key = ${table}.key;

  INSERT INTO ${table} SELECT * FROM ${table}_insert;

Actually you could skip the whole ${table}_insert step there and just do the
insert directly from ${table}_new with the same NOT EXISTS condition, I
guess. There are also other approaches you could use, like adding a new
column to ${table}_new instead of creating new tables, etc.

> 3) Wrap each load into a transaction ( tens of thousands of records per load )

Yes, Postgres is faster if you do more operations in a single transaction;
every COMMIT means waiting for an fsync. The only disadvantage of batching
them into one large transaction is that if it lasts a *long* time it can
interfere with your vacuum strategy: any vacuum that runs while the
transaction is still open won't be able to reclaim anything.

You might consider running VACUUM FULL or CLUSTER on the table when you're
done with the loading process. It will lock the table while it runs, though.

-- greg
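
For concreteness, a minimal sketch of the merged-table scheme from (1): one
table for all clients plus per-client partial indexes. Everything other than
order_number, transaction_date and the index names quoted above is a made-up
placeholder, and 42 stands in for the XXX client id:

  -- Single merged table; all columns besides client_id, order_number and
  -- transaction_date are placeholders for the real schema.
  CREATE TABLE actioninfo (
      client_id        integer NOT NULL,
      order_number     integer,
      transaction_date date
      -- ... the rest of the columns ...
  );

  -- One pair of partial indexes per client, as in the quoted post
  -- (42 standing in for XXX).
  CREATE INDEX actioninfo_order_number_42_idx
      ON actioninfo ( order_number ) WHERE client_id = 42;
  CREATE INDEX actioninfo_trans_date_42_idx
      ON actioninfo ( transaction_date ) WHERE client_id = 42;

  -- The flip side of advantage (a): dropping a client is no longer a single
  -- DROP TABLE but a DELETE plus dropping that client's partial indexes.
  DELETE FROM actioninfo WHERE client_id = 42;
  DROP INDEX actioninfo_order_number_42_idx;
  DROP INDEX actioninfo_trans_date_42_idx;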
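
And a minimal sketch of (2) and (3) put together: the staging-table load
above wrapped in a single transaction, with a VACUUM FULL once the load is
done. The table name actioninfo, the key columns (client_id, order_number),
the columns in the SET list and the file path are stand-ins for whatever the
real schema and load file are, not the poster's actual setup:

  BEGIN;

  -- Load the incoming batch into a staging table with one COPY.
  CREATE TEMP TABLE actioninfo_new (LIKE actioninfo);
  COPY actioninfo_new FROM '/path/to/batch.dat';

  -- Update the rows that already exist...
  UPDATE actioninfo
     SET transaction_date = actioninfo_new.transaction_date
    FROM actioninfo_new
   WHERE actioninfo.client_id    = actioninfo_new.client_id
     AND actioninfo.order_number = actioninfo_new.order_number;

  -- ...and insert the ones that don't.
  INSERT INTO actioninfo
  SELECT n.*
    FROM actioninfo_new n
   WHERE NOT EXISTS (SELECT 1 FROM actioninfo a
                      WHERE a.client_id    = n.client_id
                        AND a.order_number = n.order_number);

  COMMIT;

  -- Reclaim space after the load; VACUUM FULL can't run inside the
  -- transaction and will lock the table while it works.
  VACUUM FULL actioninfo;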