Tuning massive UPDATEs and GROUP BYs?

Given that a massive UPDATE ... SET foo = bar || ' ' || baz; on a 12-million-row
table (with about 100 columns -- the US Census PUMS for the 2005-2009 ACS)
is never going to be that fast, what should one do to make it faster?
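
For concreteness, the statement is shaped like this (the table and column
names here are stand-ins, not the real PUMS schema):

    -- "pums", "foo", "bar", "baz" are placeholder names
    UPDATE pums SET foo = bar || ' ' || baz;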

I set work_mem to 2048MB, but the UPDATE is currently only using a little
memory and CPU (3% and 15% according to top; on a SELECT DISTINCT ... LIMIT
earlier, it was using 70% of the memory).
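
For reference, I set it per session rather than in postgresql.conf:

    -- applies to the current session only
    SET work_mem = '2048MB';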

The data is not particularly sensitive; if something happened and it rolled
back, that wouldn't be the end of the world, so perhaps I can use "dangerous"
settings for WAL, checkpoints, etc.  There also aren't many concurrent hits
on the DB, though there are a few.
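
The sort of thing I have in mind is roughly this (a sketch only; I realize
fsync = off risks actual corruption on a crash, not just a lost transaction):

    # postgresql.conf -- trading durability for bulk-update speed
    synchronous_commit = off   # don't wait for WAL flush at commit
    fsync = off                # skip WAL fsyncs entirely; unsafe on crash
    full_page_writes = off     # skip full-page images after checkpoints
    checkpoint_segments = 64   # checkpoint less often during the UPDATE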

I am loath to create a new table from a SELECT, since the indexes themselves
take a really long time to build.
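
What I'm avoiding would look roughly like this (column and index names are
illustrative):

    -- the rebuild-from-SELECT approach I'd rather not do
    CREATE TABLE pums_new AS
      SELECT t.*, bar || ' ' || baz AS new_foo
      FROM pums t;
    -- ...followed by re-creating every index from scratch, e.g.:
    CREATE INDEX pums_new_puma_idx ON pums_new (puma);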

As the title alludes, I will also be doing GROUP BYs on the data, and would
love to speed those up as well, mostly just for my own impatience...
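
The queries are along these lines (columns illustrative again); I've been
looking at EXPLAIN output to see whether they get an in-memory HashAggregate
or fall back to a sort:

    EXPLAIN ANALYZE
    SELECT st, puma, count(*)
    FROM pums
    GROUP BY st, puma;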



