Re: clustering without locking

Tom Lane wrote:
> Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> writes:
>> Later on, though, less new space would have to be allocated because
>> more and more of the space allocated earlier to hold moved tuples
>> would be freed up in useful chunks that could be reused.
>
> I don't see how that works.  If the minimum size of the table is X
> pages, ISTM that the first pass has to push everything up to pages above
> X.

What I was suggesting was essentially to cluster the table in small parts. Rather than two huge passes (move all tuples to free or newly allocated space at the end of the table; then move them back into the old locations in order), it'd be done as a series of smaller moves.

After moving a chunk out of the way into free/new space at the end of the table, data from later in the table would be copied into the freed space. The space that copy vacated could then be re-used to hold the next chunk that needs to be moved out of the way. (A toy simulation of this loop follows the walkthrough below.)

I'm just trying to understand if it can actually work. Sorry if my attempted explanations are unclear; I'm probably doing a terrible job, and it's probably actually a stupid idea anyway (if nothing else it might just be too slow). Nonetheless I'm curious. Maybe I can explain another way.

Visually:

`0' to `9' : tuples; the desired final cluster order is by face value.
`.' : Dead/obsoleted tuple not yet marked reusable by VACUUM
` ' : free space

Initial state:

---------
584736120
---------

Begin a transaction and move the first chunk out of the way to free its space (2 tuples in this case, but obviously many more in a real case):

-----------
..473612058
-----------

Use that freed space to store the first ordered tuples:

-----------
014736.2.58
-----------

Commit, and once the last transaction to which the "." tuples above are still visible has completed, mark them as free with VACUUM or similar:

-----------
014736 2 58
-----------

Repeat, but now, rather than extending the table, use the freed space to store the next set of tuples that must be moved out of the way:

-----------
01..3642758
-----------
-----------
0123.64.758
-----------
-----------
0123 64 758
-----------

During the next pass, someone inserts `9' after tuples have been moved out to make a hole and others have been moved into the hole, but before the old locations of the moved tuples have been marked as free:

-----------
0123 .46758
-----------
-----------
012345.67.8
-----------
------------
012345.67.89          <- INSERT 9
------------
------------
012345 67 89
------------

You'd never land up with this sort of convenient ordering halfway through in a real case with realistic numbers of tuples, so it'd keep going, doing small chunks each time, until the whole table had been processed.

So, the table does grow, and its final state does contain dead space at the end, but not *too* much of it:

------------
0123456789
------------
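
To make the shuffling concrete, here's a toy, single-threaded Python model of the passes above. Everything in it (the slot list, the free_slot() helper, the CHUNK size) is invented for illustration; it isn't how PostgreSQL manages pages, and the wait-for-visibility step is compressed into an immediate free:

CHUNK = 2  # tuples moved per pass; a real pass would cover many pages

def free_slot(table, start):
    # Find a reusable slot at or after `start`, extending the table
    # only if no freed space is available yet.
    for i in range(start, len(table)):
        if table[i] is None:
            return i
    table.append(None)
    return len(table) - 1

def cluster_in_chunks(table):
    order = sorted(t for t in table if t is not None)
    for pos in range(0, len(order), CHUNK):
        want = order[pos:pos + CHUNK]
        # Pass 1: evict whatever occupies the target slots into free
        # space past the chunk, extending only when none is free.
        for off, key in enumerate(want):
            occupant = table[pos + off]
            if occupant is not None and occupant != key:
                table[pos + off] = None
                table[free_slot(table, pos + len(want))] = occupant
        # Pass 2: move the wanted tuples into the freed slots. Their
        # old slots become scratch space for the next pass (after the
        # VACUUM-and-visibility wait, in the real proposal).
        for off, key in enumerate(want):
            if table[pos + off] != key:
                src = table.index(key)
                table[src] = None
                table[pos + off] = key
        print(''.join('.' if t is None else str(t) for t in table))

table = [5, 8, 4, 7, 3, 6, 1, 2, 0]  # the walkthrough's initial state
cluster_in_chunks(table)

Run against the walkthrough's starting order, it prints 014736.2.58, 0123.64.758 and 012345.67.8 as intermediate states, and ends with the nine tuples in order plus roughly one chunk of free space at the end of the table.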



If "low" values are inserted late in the progressive cluster they can just stay at the end of the table. They'll get moved if the user runs a progressive cluster operation again later. However, since you're ideally doing this with a non-100% fillfactor, they should land up in roughly the right place on initial insert rather than at the end of the table, avoiding the issue (and helping avoid having newly inserted tuples prevent table truncation by vacuum when the progressive clustering finishes).

Say a table containing the range 1-9 was being clustered with a 50% fillfactor and was about halfway through the process:

-------------
 1 2 3 47986
-------------

and someone inserts `0', it should ideally land up in the right place anyway (a bit of reading suggests that insert tries to respect cluster order):

-------------
01 2 3 47986
-------------
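
As a toy Python model of that placement behaviour (insert_clustered and the outward probe from the ideal slot are my inventions for illustration, not PostgreSQL's actual free space map logic):

def insert_clustered(table, key):
    # Ideal slot: just before the first larger live key.
    ideal = next((i for i, t in enumerate(table)
                  if t is not None and t > key), len(table))
    # Probe outward from there for free space left by the fillfactor,
    # extending the table only if every slot is full.
    for dist in range(len(table)):
        for slot in (ideal - dist, ideal + dist):
            if 0 <= slot < len(table) and table[slot] is None:
                table[slot] = key
                return slot
    table.append(key)
    return len(table) - 1

# " 1 2 3 47986" from the 50% fillfactor example above:
table = [None, 1, None, 2, None, 3, None, 4, 7, 9, 8, 6]
insert_clustered(table, 0)  # `0' lands in slot 0
print(''.join(' ' if t is None else str(t) for t in table))  # 01 2 3 47986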


If you're re-clustering a table with a fillfactor set, you may not need to extend it at all, because you can use already-allocated free space to store tuples temporarily while moving them around.

> You can't put any temporary copies in pages <= X because you might
> need that space when it comes time to make the clustering happen.  So
> the table is going to bloat to (at least) 2X pages.

Yes, if you perform the whole process in a single huge chunk. What I was hoping was that it was possible to do it in a series of smaller operations instead, avoiding the need for such a huge amount of bloat at the end of the table.

Say you're clustering in 10 steps. You need X*0.1 worth of scratch space, created by extending the table if there isn't already appropriate free space late in the table. Assuming you can reclaim all the space you've used for temporary copies of tuples after each pass, your ideal final table size is X*1.1 + (space used by new inserts), of which X*0.1 is free space at the end of the table. That free space probably has scattered rows from concurrent inserts, though if you're using a non-100% fillfactor it might not.
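
Worked through in Python with a made-up page count (the 100,000 figure is purely illustrative):

X = 100_000            # pages in the table being clustered
steps = 10
scratch = X // steps   # extra pages needed as scratch space: X*0.1
peak = X + scratch     # table size during and after the process: X*1.1
print(scratch, peak)   # 10000 110000
print(2 * X)           # 200000: the bloat from a single huge pass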

It seems like it should work, *if* the space used for temporary storage can be efficiently reclaimed for reuse (or for new inserts) after each pass. Even if it can work, though, I don't know whether it'd actually be useful given the effect on indexes and how slow it might land up being.

--
Craig Ringer

