On Thu, Jul 18, 2024 at 07:03:40PM +0000, Wengang Wang wrote:
>
>
> > On Jul 16, 2024, at 9:11 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jul 16, 2024 at 08:23:35PM +0000, Wengang Wang wrote:
> >>> Ok, so this is a linear iteration of all extents in the file that
> >>> filters extents for the specific "segment" that is going to be
> >>> processed. I still have no idea why fixed length segments are
> >>> important, but "linear extent scan for filtering" seems somewhat
> >>> expensive.
> >>
> >> Hm, not actually fixed length segments, but the segment size can't
> >> exceed a limit. So segment.ds_length <= LIMIT.
> >
> > Which is effectively fixed length segments....
> >
> >> Larger segments take longer (with the file locked) to defrag. The
> >> segment size limit is a way to balance the defrag against the
> >> parallel IO latency.
> >
> > Yes, I know why you've done it. These were the same arguments made a
> > while back for a new way of cloning files on XFS. We solved those
> > problems with just a small change to the locking, and didn't need
> > new ioctls or lots of new code to solve the "clone blocks
> > concurrent IO" problem.
>
> I didn't check the code history, but I am thinking you solved the
> problem by allowing reads to proceed while cloning is in progress?
> Correct me if I'm wrong. The problem we hit is a (heartbeat) write
> timeout.

The reason this worked (allowing shared reads through and not writes)
was that the VM infrastructure this was being done for uses a sidecar
write channel to redirect writes while a clone is being done. i.e.
writes are not blocked by the clone in progress because they are being
done to a different file. When the clone completes, those writes are
folded back into the original image file. e.g. see

  qemu-img commit -b <backing file> <file with delta writes>

which will fold the writes in a sidecar write file back into the
original backing file that was just cloned....
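As an aside for anyone unfamiliar with the sidecar pattern, a minimal
sketch of the workflow looks like the script below. The file names
(base.img, delta.qcow2) and the 64M size are made up for illustration,
and the `-F` backing-format flag assumes a reasonably recent qemu-img;
the script skips cleanly if qemu-img is not installed.

```shell
#!/bin/sh
# Sketch of the sidecar-write workflow: writes land in an overlay
# while the base image is being cloned/defragged, then get folded
# back with "qemu-img commit". Names and sizes are illustrative.
set -e
command -v qemu-img >/dev/null 2>&1 || { echo "qemu-img not installed, skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 1. The original image that is about to be cloned/defragmented.
qemu-img create -f qcow2 base.img 64M

# 2. Sidecar write file: new writes land here while the clone (or,
#    per the suggestion above, the defrag) runs against base.img.
qemu-img create -f qcow2 -b base.img -F qcow2 delta.qcow2 64M

# 3. ... clone/defrag of base.img happens here ...

# 4. Fold the delta writes back into the backing file.
qemu-img commit -b base.img delta.qcow2
echo "committed"
```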
What I'm suggesting is that when you run a backing file defragmentation,
you use the same sidecar write setup as cloning whilst the defrag is
done. Reads go straight through to the backing file, and writes get
written to a delta write file. When the defrag is done, the delta write
file gets folded back into the backing file.

But for this to work, UNSHARE needs to use shared read locking so that
read IO can be directed through the file at the same time as the
UNSHARE is running. If this works for CLONE to avoid read and write
blocking whilst the operation is in progress, the same mechanism should
be usable for UNSHARE, too.

At this point, defrag using CLONE+UNSHARE shouldn't ever block read IO,
and shouldn't block write IO for any significant period of time,
either...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx