On May 28, 2008, at 11:21 AM, Scott Ribe wrote:
If I'm not totally off-base, here's one way to enable clustering on systemsthat run 24/7: 1 cluster current rows 1.1 note current last committed transaction 1.2 copy all visible rows to new table in cluster order 1.3 build indexes on new table 2 add changes 2.1 note current last committed transaction 2.2 apply to new table (& indexes) all changes committed since 1.1 3 put new table into service 3.1 take exclusive lock on table
BOOM! Deadlock.
3.2 apply to new table (& indexes) all changes committed since 2.1 3.3 switch in new table 3.4 release lock 3.5 clean up old table storageI don't know enough about pg internals to know how big a project this would be, but it seems to me that the WAL provides many of the pieces needed to support steps 1.1 and 2.2, for instance. (Even so, I know it's still nottrivial, just perhaps not huge.)- I guess there's still the possibility that 3.1 could stall in the presence of long-lived transactions--but this is certainly no worse than the currentsituation where it would stall before starting the cluster operation.- By "apply changes" I mean insert, update, delete rows--of course schema changes would be locked out during the cluster, even if it takes days ;-)
What you're describing is possible; it's done for CREATE INDEX CONCURRENT. But it's not very easy to add. I think what makes a lot more sense is to have a form of clustering that puts effort into placing tuples in the correct location. If you had that, you could effectively migrate stuff into proper cluster order in userland; or just let it take care of itself. Presumable the table would eventually end up clustered if rows are updated often enough.
-- Decibel!, aka Jim C. Nasby, Database Architect decibel@xxxxxxxxxxx Give your computer some brain candy! www.distributed.net Team #1828
<<attachment: smime.p7s>>