Re: [PATCH 9/9] spaceman/defrag: warn on extsize

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 31 Jul 2024 08:43:55 +1000

On Mon, Jul 22, 2024 at 06:01:08PM +0000, Wengang Wang wrote:
> 
> 
> > On Jul 15, 2024, at 5:29 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > 
> > On Thu, Jul 11, 2024 at 11:36:28PM +0000, Wengang Wang wrote:
> >> 
> >> 
> >>> On Jul 9, 2024, at 1:21 PM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >>> 
> >>> On Tue, Jul 09, 2024 at 12:10:28PM -0700, Wengang Wang wrote:
> >>>> According to current kernel implementation, non-zero extsize might affect
> >>>> the result of defragmentation.
> >>>> Just print a warning on that if non-zero extsize is set on file.
> >>> 
> >>> I'm not sure what's the point of warning vaguely about extent size
> >>> hints?  I'd have thought that would help reduce the number of extents;
> >>> is that not the case?
> >> 
> >> Not exactly.
> >> 
> >> Same 1G file with about 54K extents,
> >> 
> >> The one with 16K extsize, after defrag, its extents dropped to 13K.
> >> And the one with 0 extsize, after defrag, its extents dropped to 22.
> > 
> > extsize should not affect file contiguity like this at all. Are you
> > measuring fragmentation correctly? i.e. a contiguous region from a
> > larger extsize allocation that results in a bmap/fiemap output of
> > three extents in an unwritten/written/unwritten pattern is not fragmentation.
> 
> I was using FS_IOC_FSGETXATTR to get the number of extents (fsx.fsx_nextents).
> So if the kernel doesn’t lie, I got it correctly. There were no unwritten extents in the files to defrag.

The kernel is not lying, and you've misunderstood what the kernel is
reporting as an extent. The kernel reports the count of -individual
extent records- it maintains, not the count of contiguous regions it
is mapping. Have a look at the implementation of fsx.fsx_nextents in
xfs_fill_fsxattr():

	if (ifp && !xfs_need_iread_extents(ifp))
		fa->fsx_nextents = xfs_iext_count(ifp);
	else
		fa->fsx_nextents = xfs_ifork_nextents(ifp);

We have:

inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
{
        return ifp->if_bytes / sizeof(struct xfs_iext_rec);
}

Which is the number of in-memory extents for the inode fork. Not
only does that include unwritten extent records, it includes delayed
allocation extents that don't even exist on disk.

And if we haven't read the extent list in from disk, we use:

static inline xfs_extnum_t xfs_ifork_nextents(struct xfs_ifork *ifp)
{
        if (!ifp)
                return 0;
        return ifp->if_nextents;
}

Which is a count of the on-disk extents for the inode fork which
counts both written and unwritten extent records.

IOWs, both of these functions count unwritten extents separately
from written extents, even if they are contiguous.  That means
a single contiguous extent with an unwritten region in the middle of
it:

	0	1	2	3
	+WWWWWWW+UUUUUUU+WWWWWWW+

Is reported as three extent records - {0,1,W}, {1,1,U}, {2,1,W} -
and so fsx.fsx_nextents will report 3 extents despite the fact that
the file is *not* fragmented at all.
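
To make the difference between "records" and "contiguous regions"
concrete, here is a minimal userspace sketch. struct ext_rec and
count_regions() are made up for illustration; they are not kernel
structures or functions, and the disk block numbers in the usage note
below are assumed. A record extends the previous region when it follows
it directly both in file offset and on disk, regardless of its
written/unwritten state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified extent record (NOT a kernel structure). */
struct ext_rec {
	unsigned long long	off;		/* file offset, in blocks */
	unsigned long long	blk;		/* start block on disk */
	unsigned long long	len;		/* length, in blocks */
	bool			unwritten;	/* record state, ignored below */
};

/*
 * Count regions that are contiguous in both file offset and disk
 * location. The number of records is simply n; the number of regions
 * is what matters for fragmentation.
 */
size_t count_regions(const struct ext_rec *recs, size_t n)
{
	size_t regions = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Start a new region unless this record abuts the previous one. */
		if (i == 0 ||
		    recs[i - 1].off + recs[i - 1].len != recs[i].off ||
		    recs[i - 1].blk + recs[i - 1].len != recs[i].blk)
			regions++;
	}
	return regions;
}
```

Feeding it the three records from the diagram, placed at (assumed)
consecutive disk blocks 100, 101 and 102, gives one region even though
there are three records.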

Hence interpreting fsx.fsx_nextents as a number that accurately
reflects actual extent fragmentation levels is incorrect. If you
have a sparse file or mixed written/unwritten regions, the extent
count will be much higher than expected but it does not indicate
that the file is fragmented at all.

Applications need to look at the actual extent map that is returned
from FIEMAP to determine if there is significant fragmentation that
can be addressed, not just the raw extent count.

> (As I mentioned somewhere else), though extsize is mainly used to
> align the number of blocks, it breaks delayed-allocations.
> In the unshare path, there are N allocations performed for the N
> extents respectively in the segment to be defragmented. 

That's largely irrelevant to the issue at hand.  If there is
sufficient free space in the filesystem, the allocator will first
attempt and succeed at contiguous allocation. Hence the size of each
allocation is irrelevant as they will be laid out contiguously given
sufficiently large contiguous free space.

Indeed, this is how allocation for direct IO works, and it doesn't
have problems with fragmentation of files for single threaded
sequential IO for the same reasons....

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx