> On Jul 16, 2024, at 9:11 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Tue, Jul 16, 2024 at 08:23:35PM +0000, Wengang Wang wrote:
>>> Ok, so this is a linear iteration of all extents in the file that
>>> filters extents for the specific "segment" that is going to be
>>> processed. I still have no idea why fixed length segments are
>>> important, but "linear extent scan for filtering" seems somewhat
>>> expensive.
>>
>> Hm… not actually fixed length segments, but the segment size can't
>> exceed the limit. So segment.ds_length <= LIMIT.
> 
> Which is effectively fixed length segments....
> 
>> A larger segment takes a longer time (with the file locked) to
>> defrag. The segment size limit is a way to balance the defrag
>> against the parallel IO latency.
> 
> Yes, I know why you've done it. These were the same arguments made a
> while back for a new way of cloning files on XFS. We solved those
> problems just with a small change to the locking, and didn't need
> new ioctls or lots of new code just to solve the "clone blocks
> concurrent IO" problem.
> 
> I'm looking at this from exactly the same POV. The code presented is
> doing lots of complex, unusable stuff to work around the fact that
> UNSHARE blocks concurrent IO. I don't see any difference between
> CLONE and UNSHARE from the IO perspective - if anything, UNSHARE can
> have looser rules than CLONE, because a concurrent write will either
> do the COW of a shared block itself, or hit the exclusive block that
> has already been unshared.
> 
> So if we fix these locking issues in the kernel, then the whole need
> for working around the IO concurrency problems with UNSHARE goes
> away and the userspace code becomes much, much simpler.
> 
>>> Indeed, if you used FIEMAP, you can pass a minimum
>>> segment length to filter out all the small extents. Iterating that
>>> extent list means all the ranges you need to defrag are in the holes
>>> of the returned mapping information. This would be much faster
>>> than an entire linear mapping to find all the regions with small
>>> extents that need defrag. The second step could then be doing a
>>> fine grained mapping of each region that we now know either contains
>>> fragmented data or holes....

Where can we pass a minimum segment length to filter things out? I
don't see that in the fiemap structure. A fiemap request is encoded
within struct fiemap:

struct fiemap {
	__u64 fm_start;		/* logical offset (inclusive) at
				 * which to start mapping (in) */
	__u64 fm_length;	/* logical length of mapping which
				 * userspace cares about (in) */
	__u32 fm_flags;		/* FIEMAP_FLAG_* flags for request (in/out) */
	__u32 fm_mapped_extents; /* number of extents that were
				  * mapped (out) */
	__u32 fm_extent_count;	/* size of fm_extents array (in) */
	__u32 fm_reserved;
	struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
};

Thanks,
Wengang

>>
>> Hm… just a question here:
>> In your scheme, say you set the filter length to 2048; all extents
>> of 2048 blocks or fewer are to be defragmented.
>> What if the extent layout (in blocks) is like this:
>>
>> 1. 1
>> 2. 2049
>> 3. 2
>> 4. 2050
>> 5. 1
>> 6. 2051
>>
>> In the above case, do you defrag or not?
> 
> The filtering presented in the patch above will not defrag any of
> this with a 2048 block segment size, because the second extent in
> each segment extends beyond the configured max segment length. IOWs,
> it ends up with a single extent per "2048 block segment", and that
> won't get defragged with the current algorithm.
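[To make that decision concrete: a rough sketch of a "segment length
<= limit" filter as described above. This is illustrative only, not
code from the patch set; segment_needs_defrag() and the greedy
segment builder are assumed names and structure.]

#include <stdbool.h>

/*
 * Sketch: build a segment greedily from consecutive extents, stopping
 * before the extent that would push the segment past the limit. A
 * segment holding a single extent is already contiguous, so only a
 * multi-extent segment is a defrag candidate.
 */
static bool segment_needs_defrag(const unsigned long long *ext_blocks,
				 int nr_exts, unsigned long long limit,
				 int *next)
{
	unsigned long long used = 0;
	int i, count = 0;

	for (i = *next; i < nr_exts; i++) {
		if (used + ext_blocks[i] > limit)
			break;		/* next extent won't fit */
		used += ext_blocks[i];
		count++;
	}
	if (count == 0)
		i++;		/* lone oversized extent: skip it */
	*next = i;
	return count > 1;	/* single extent: nothing to merge */
}

With the layout above (1, 2049, 2, 2050, 1, 2051) and limit = 2048,
each call takes in just one small extent before the following large
extent overflows the limit, so count never exceeds 1 and no segment
gets defragged, matching the conclusion above.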
> 
> As it is, this really isn't a common fragmentation pattern for a
> file that does not contain shared extents, so I wouldn't expect to
> ever need to decide if this needs to be defragged or not.
> 
> However, it is exactly the layout I would expect to see for cloned
> and modified filesystem image files. That is, the common layout for
> such a "cloned from golden image" VM image is this:
> 
> 1. 1    written
> 2. 2049 shared
> 3. 2    written
> 4. 2050 shared
> 5. 1    written
> 6. 2051 shared
> 
> i.e. there are large chunks of contiguous shared extents between the
> small individual COW block modifications that have been made to
> customise the image for the deployed VM.
> 
> Either way, if the segment/filter length is 2048 blocks, then this
> isn't a pattern that should be defragmented. If the segment/filter
> length is 4096 or larger, then yes, this pattern should definitely
> be defragmented.
> 
>> As I understand the situation, the performance of defrag itself is
>> not a critical concern here.
> 
> Sure, but implementing a low-performing, high-CPU-consumption,
> entirely single-threaded defragmentation model that requires
> specific tuning for every different environment it is run in doesn't
> seem like the best idea to me.
> 
> I'm trying to work out if there is a faster, simpler way of
> achieving the same goal....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
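[A rough illustration of the FIEMAP-based scan discussed in the
thread. struct fiemap itself carries no minimum-length field, so this
sketch applies the length filter in userspace over the returned
extents; scan_small_extents() is an invented name, not part of any
patch here.]

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

/* Report extents shorter than min_bytes as defrag candidates. */
static int scan_small_extents(int fd, __u64 min_bytes)
{
	struct {
		struct fiemap fm;
		struct fiemap_extent ext[128];
	} req;
	struct fiemap *fm = &req.fm;
	__u64 start = 0;
	int last = 0;

	while (!last) {
		unsigned int i;

		memset(&req, 0, sizeof(req));
		fm->fm_start = start;
		fm->fm_length = ~0ULL;		/* map through EOF */
		fm->fm_flags = FIEMAP_FLAG_SYNC;
		fm->fm_extent_count = 128;

		if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
			return -1;
		if (fm->fm_mapped_extents == 0)
			break;

		for (i = 0; i < fm->fm_mapped_extents; i++) {
			struct fiemap_extent *fe = &fm->fm_extents[i];

			if (fe->fe_length < min_bytes)
				printf("candidate: %llu+%llu\n",
				       (unsigned long long)fe->fe_logical,
				       (unsigned long long)fe->fe_length);
			if (fe->fe_flags & FIEMAP_EXTENT_LAST)
				last = 1;
			start = fe->fe_logical + fe->fe_length;
		}
	}
	return 0;
}

A second, fine-grained mapping pass over each candidate range (and
the holes between returned mappings) would then drive the actual
defrag, per the two-step approach suggested earlier in the thread.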