On 11/06/12 14:52, Shaun Thomas wrote:
> On 06/11/2012 08:46 AM, Mark Thornton wrote:
>> 500m --- though it isn't clear whether CLUSTER uses maintenance memory
>> or the regular work memory. I could readily use a higher value for
>> maintenance_work_mem.
>
> For an operation like that, having a full GB wouldn't hurt. Though if
> you haven't already, you might think about pointing your pgsql_tmp to
> /dev/shm for a while, even for just this operation.

I didn't think the process was using even the 500m it ought to have had
available, whereas creating an index did appear to use that much. Note,
though, that I didn't stay up all night watching it!

> Then again, is your CPU at 100% during the entire operation?

No, the CPU utilization is quite low; most of the time is spent waiting
on IO.
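For what it's worth, the memory and /dev/shm suggestions above can both be applied for just the one session. A sketch; the table, index, and tablespace names are placeholders, and the /dev/shm directory must already exist and be owned by the postgres user:

```sql
-- One-off settings for this session only (names are placeholders).
SET maintenance_work_mem = '1GB';

-- Optional: spill temp files to RAM-backed storage, per the /dev/shm
-- suggestion. Requires an existing directory owned by postgres.
CREATE TABLESPACE ramtemp LOCATION '/dev/shm/pgsql_tmp';
SET temp_tablespaces = 'ramtemp';

CLUSTER mytable USING mytable_geom_gist;
```

Both SETs revert when the session ends, so the server-wide configuration is untouched.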
> If it's not fetching anything from disk or writing out much, reducing
> IO won't help. :) One deficiency we've had with CLUSTER is that it's a
> synchronous operation. It does each thing one after the other. So
> it'll organize the table contents, then it'll reindex each index
> (including the primary key) one after the other. If you have a lot of
> those, that can take a while, especially if any composite or complex
> indexes exist.
In this case there are only two indexes: the GiST one and a primary key
(on a bigint value).
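The indexes on a table and their on-disk sizes can be checked from the catalog, which makes it easy to see which rebuild dominates the runtime; `mytable` below is a placeholder:

```sql
-- List every index on the table with its on-disk size
-- (replace 'mytable' with the real table name):
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE indrelid = 'mytable'::regclass;
```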
> You might actually be better off running parallel REINDEX commands on
> the table (I wrote a script for this because we have a 250M row table
> where each index takes 1-2.5 hours to build). You might also consider
> pg_reorg, which seems to handle some parts of a table rebuild a little
> better.
>
> That should give you an escalation pattern, though. :)
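With only two indexes, the parallel approach Shaun describes amounts to two concurrent sessions; a sketch, with placeholder index names:

```sql
-- Session 1 (e.g. one psql window):
REINDEX INDEX mytable_geom_gist;

-- Session 2, started at the same time:
REINDEX INDEX mytable_pkey;
```

REINDEX blocks writes to the table while it runs, but as far as I know two sessions can rebuild different indexes of the same table side by side, so the wall-clock time is roughly that of the slower build.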
Thanks for your help,
Mark Thornton
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance