Re: Disadvantage to CLUSTER?

Steve Crawford <scrawford@xxxxxxxxxxxxxxxxxxxx> · Tue, 15 May 2012 14:27:46 -0700

On 05/15/2012 02:02 PM, Robert James wrote:
Besides the one time spent CLUSTERing, do I loose anything by doing it
for every table?  Does a CLUSTER slow anything down?

It would seem to me that a) a CLUSTER should never have worse
performance than a random order b) may have better performance and c)
has the benefits of a VACUUM and REINDEX.  So, I may as well cluster
every table, if only by the primary key.

Cluster requires an exclusive lock - you need a sufficiently long 
maintenance window. (But so does vacuum full which is typically slower 
and vacuum-full doesn't make shiny new indexes.)

Cluster requires sufficient disk space to create the new copies of the 
table.

Cluster only reorganizes data at the time you do the cluster - 
additional use after the cluster will cause data to lose its order.

Since you are clustering based on an index you can only choose one 
cluster order.

Cluster should have better performance but it depends on the index you 
choose relative to the queries you typically run. Let's say that you 
have an accounting system where you most often grab the most recent 
month worth of data. Clustering that keeps that data together will be 
beneficial but you could easily have a different index, item-number for 
instance, that would, if used for clustering, leave the commonly used 
data scattered throughout the table. If that table was an append-only 
detail table the most commonly used data would naturally clump together 
which clustering would then destroy.

With the exception of disk-space constraints there are few reasons to 
use vacuum-full/reindex instead of cluster.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general