Rainer Bauer wrote: > Alvaro Herrera wrote: > >It has been theorized that cluster would be faster in general if instead > >of doing an indexscan we would instead use a seqscan + sort step. It > >would be good to measure it. > > Could a reindex on the clustered index speed up the clustering (when executed > immediatelly before the cluster command)? As I understand it, this index is > used to fetch the table data in the correct order. Or is most of the time > spend fetching the table data? I haven't measured it, but my guess is that most of the time is in fetching heap pages in random order. > Also, would it make sense to increase <shared_buffers> for the cluster > operation. This is set to 32MB here on my Windows box as was recommended. Not sure. In general yes, but on Windows things are different. > >> >For btree indexes, there is a temporary copy of the index data, which > >> >will go wherever you have arranged for temp files to go. (I think that > >> >easy user control of this may be new for 8.3, though.) > >> > >> Could you give me a hint where that would be on Windows? I guess this might be > >> worth a try since there are a couple of btree indexes in the database. > > > >I think Tom is referring to the new temp_tablespaces config variable. > > I moved the pgsql_tmp directory to another disk, but that didn't speed up the > cluster command. Probably most of the time is going into creating the new table then. If you are looking for a short-term solution to your problem, maybe the best is to follow the recommendation on CLUSTER ref page: There is another way to cluster data. The CLUSTER command reorders the original table by scanning it using the index you specify. This can be slow on large tables because the rows are fetched from the table in index order, and if the table is disordered, the entries are on random pages, so there is one disk page retrieved for every row moved. (PostgreSQL has a cache, but the majority of a big table will not fit in the cache.) The other way to cluster a table is to use CREATE TABLE newtable AS SELECT * FROM table ORDER BY columnlist; which uses the PostgreSQL sorting code to produce the desired order; this is usually much faster than an index scan for disordered data. Then you drop the old table, use ALTER TABLE ... RENAME to rename newtable to the old name, and recreate the table's indexes. The big disadvantage of this approach is that it does not preserve OIDs, constraints, foreign key relationships, granted privileges, and other ancillary properties of the table — all such items must be manually recreated. Another disadvantage is that this way requires a sort temporary file about the same size as the table itself, so peak disk usage is about three times the table size instead of twice the table size. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "La experiencia nos dice que el hombre peló millones de veces las patatas, pero era forzoso admitir la posibilidad de que en un caso entre millones, las patatas pelarían al hombre" (Ijon Tichy) ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend