
Re: clustering without locking

Scott Ribe wrote:
>> Wouldn't new / updated tuples just get put in the hole, fairly rapidly
>> un-clustering the table again?
>
> How is that different from putting them in newly-allocated space at the end
> of the table?

It isn't; I just wasn't thinking straight.

This is probably a stupid idea, as I'm fairly clueless about Pg's innards, but I find myself wondering about doing a non-exclusive cluster in small progressive chunks, in a series of short transactions.

In each transaction, tuples from a smallish contiguous chunk of pages (beginning at the start of the table and moving onward as the cluster progresses) are copied to free or newly allocated space later in the table and given the xid of the transaction. The old versions are marked as having been obsoleted by this transaction, and the transaction commits. This step is much like a standalone UPDATE that doesn't actually change any field values and that uses some unusual tuple-selection criteria.

Progressive clustering then waits until the last transaction that could still see the old copies of the relocated tuples has finished. Once the space those tuples occupied becomes reclaimable (i.e. vacuum could mark it as free if it were to run), progressive clustering resumes. A new transaction is begun. It looks up tuples, in order, via the index being clustered on, and reads enough of them into RAM to fill the space just freed. It then writes those tuples into the freed space (without ever actually marking it as free, so no other transaction could write to it), giving them the xid of the writing transaction. The old copies are marked as obsoleted by this transaction, and the transaction commits. Again, it's like an UPDATE, but combined with a small, selective vacuum that never bothers to update the free space map because it instantly reuses the space.
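The "wait until no transaction can still see the old copies" condition can be sketched like this. A toy model only, not Pg's actual snapshot machinery: it assumes simple monotonically increasing xids with no wraparound, and that the hypothetical `old_versions_reclaimable` helper is handed the set of currently active xids:

```python
def old_versions_reclaimable(relocating_xid, active_xids):
    """True once no active transaction could still see the tuple
    versions obsoleted by relocating_xid, i.e. every transaction
    that started before it committed has finished.

    Toy model: plain integer xids, no wraparound handling."""
    return all(xid > relocating_xid for xid in active_xids)
```

For example, if the relocating transaction had xid 100, the freed space is usable once only younger transactions (101, 105, ...) remain active, but not while xid 99 is still running.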

Now a small chunk of the table, at the start, is in order. The process can begin again with the next chunk of unordered pages. It never needs to create a huge hole in the table or expand the table massively. It can even space the data out, if a fillfactor has been set, by leaving gaps as it writes the replacement data into the freed chunks and updating the free space map accordingly.

The progressive cluster would also need to be able to run a periodic vacuum (or do the equivalent internally), with a restriction preventing it from examining pages in the range the progressive cluster was currently trying to free. Otherwise the space occupied by old versions of rows that are now neatly ordered in the table would never be freed and couldn't be used as scratch space.
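That restricted vacuum might look something like this sketch. Everything here is hypothetical (pages modelled as dicts with a `dead` flag; Pg's real vacuum is nothing this simple); it only illustrates the idea of skipping the protected page range:

```python
def restricted_vacuum(pages, protected):
    """Reclaim dead space on every page except those in `protected`
    (the range the progressive cluster is currently trying to free).

    Hypothetical model: each page is a dict with a boolean 'dead'
    flag standing in for reclaimable old tuple versions."""
    freed = []
    for pageno, page in enumerate(pages):
        if pageno in protected:
            continue  # leave the cluster's working range alone
        if page.get("dead"):
            page["dead"] = False
            freed.append(pageno)
    return freed
```

Here `freed` plays the role of the free space map entries the vacuum would record.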

In my ignorance of Pg's innards it seems like all this should work, though it'd basically never finish if there were long-running transactions touching the table being clustered. The other potential problem I wonder about is index bloat, since every record in the table is moved twice over the course of the progressive cluster operation.

Oh: There's actually only a need for one transaction per step:

begin transaction
move ordered tuples into just-freed hole
move tuples from next block of pages to free space later in table
commit
wait for all transactions that can see the old versions to complete
repeat
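The loop above can be sketched as a toy simulation. This is plain Python over a list of keys (None = free slot), with no MVCC at all, so the "wait for old versions" step is a no-op; it only demonstrates that moving each record twice, chunk by chunk, does end with the table in order without ever opening a huge hole:

```python
def progressive_cluster(table, chunk=2):
    """Toy model of the proposed progressive cluster: `table` is a
    list of sortable keys, None meaning a free slot. Each step
    evacuates the next chunk to free space later in the table, then
    backfills the freed hole with the lowest remaining keys in
    index order. Returns the final key order."""
    done = 0  # slots [0, done) are already in final clustered order
    while done < len(table):
        hole = range(done, min(done + chunk, len(table)))
        # Step 1: move tuples from the next block of pages to free
        # space later in the table (or newly allocated space at the end).
        for i in hole:
            if table[i] is None:
                continue
            free = next((j for j in range(hole.stop, len(table))
                         if table[j] is None), None)
            if free is None:
                table.append(table[i])  # extend the table
            else:
                table[free] = table[i]
            table[i] = None
        # (A real implementation would wait here until no transaction
        # could still see the old tuple versions.)
        # Step 2: pull the lowest remaining keys, in index order,
        # into the just-freed hole.
        remaining = sorted((k, j) for j, k in enumerate(table)
                           if j >= hole.stop and k is not None)
        for i, (k, j) in zip(hole, remaining):
            table[i] = k
            table[j] = None
        done = hole.stop
    return [k for k in table if k is not None]
```

Running it on a shuffled list, e.g. `progressive_cluster([5, 3, 1, 4, 2])`, yields the keys in sorted order, and along the way the free region never exceeds roughly one chunk plus whatever the table grew by.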

So ... is this crazy? Concurrently clustering the table by moving each record *twice*, in batches, with pauses to allow old versions to cease being visible to any live transaction? Or could it actually work?

--
Craig Ringer

