On Thursday 07 January 2010 09:57:59 Kevin Grittner wrote:
> Ludwik Dylag <ldylag@xxxxxxxxx> wrote:
> > I would suggest:
> > 1. turn off autovacuum
> > 1a. optionally tune the db for better performance for this kind of
> > operation (can't help here)
> > 2. restart database
> > 3. drop all indexes
> > 4. update
> > 5. vacuum full table
> > 6. create indexes
> > 7. turn on autovacuum
>
> I've only ever attempted something like that with a few tens of
> millions of rows. I gave up on waiting for the VACUUM FULL step
> after a few days.
>
> If some scheduled down time is acceptable (with "some" being kind of
> hard to estimate accurately) the best bet would be to add the column
> with the USING clause to fill in the value. (I think that would
> cause a table rewrite; if not, then add something to the ALTER TABLE
> which would.) My impression is that the OP would rather stretch out
> the implementation than suffer down time, which can certainly be a
> valid call.
>
> If that is the goal, then the real question is whether there's a way
> to tune the incremental updates to speed up that phase. Carlo, what
> version of PostgreSQL is this? Can you show us the results of an
> EXPLAIN ANALYZE for the run of one iteration of the UPDATE?
> Information on the OS, hardware, PostgreSQL build configuration, and
> the contents of postgresql.conf (excluding all comments) could help
> us spot possible techniques to speed this up.
>
> -Kevin

If you can come up with an effective method of tracking
inserts/updates/deletes, such as a trigger that writes the PK to a
separate table upon any insert, update or delete to the table, you
could do something like this:

1) create a new table (no indexes) with the structure you want the
table to have at the end of the process (i.e. the post-altered state)
[new_tab]

2) create the insert, update and delete triggers mentioned above on
the existing table [curr_tab], writing all the PK ids that change into
a 3rd table [changed_keys] (a sketch of such a trigger is appended at
the end of this message)

3) kick off a process that simply selects from curr_tab into new_tab
and populates/back-fills the new column as part of the query (see the
back-fill sketch appended below)

4) let it run as long as it takes

5) once it's complete, do this:

Create all the indexes on new_tab, then:

BEGIN;
LOCK TABLE curr_tab;
DELETE FROM new_tab
 WHERE pk_id IN (SELECT DISTINCT pk_id FROM changed_keys);
INSERT INTO new_tab
SELECT * FROM curr_tab
 WHERE curr_tab.pk_id IN (SELECT DISTINCT pk_id FROM changed_keys);
ALTER TABLE curr_tab RENAME TO old_tab;
ALTER TABLE new_tab RENAME TO curr_tab;
COMMIT;

Also, you might want to consider partitioning this table in the
process...

Once you're confident you no longer need the old table [old_tab], you
can drop it.
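
To make step 2 concrete, here is a minimal sketch of the
change-tracking trigger. The names (curr_tab, pk_id, changed_keys) are
hypothetical stand-ins for your actual schema, and it assumes the PK
column itself is never updated:

CREATE TABLE changed_keys (pk_id integer);

CREATE OR REPLACE FUNCTION log_changed_key() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- NEW is not available on DELETE; log the deleted row's key
        INSERT INTO changed_keys (pk_id) VALUES (OLD.pk_id);
    ELSE
        -- covers INSERT and UPDATE
        INSERT INTO changed_keys (pk_id) VALUES (NEW.pk_id);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER ROW triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER curr_tab_track_changes
    AFTER INSERT OR UPDATE OR DELETE ON curr_tab
    FOR EACH ROW EXECUTE PROCEDURE log_changed_key();

If the PK can ever be updated you would also want to log OLD.pk_id on
UPDATE, otherwise the cutover DELETE would miss the old row in
new_tab.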
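
And a sketch of the back-fill from step 3, again with made-up column
names (col_a, col_b) and a made-up expression for the new column's
value; substitute your real column list and derivation:

INSERT INTO new_tab (pk_id, col_a, col_b, new_col)
SELECT pk_id, col_a, col_b,
       col_a * 100  -- stand-in for whatever computes the new value
  FROM curr_tab;

You could also run this in PK ranges (WHERE pk_id BETWEEN ... AND ...)
if you'd rather not hold one enormous transaction open for the whole
copy.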
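
For contrast, the down-time route Kevin describes in the quoted text
would look something like the following (hypothetical names again). If
I read the docs right, on the 8.x releases an ALTER COLUMN ... TYPE
with a USING clause rewrites the whole table under an exclusive lock,
which is the single-pass rewrite he is referring to:

ALTER TABLE curr_tab ADD COLUMN new_col integer;
ALTER TABLE curr_tab ALTER COLUMN new_col TYPE integer
    USING (col_a * 100);  -- stand-in expression; rewrites every row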