> On May 31, 2024, at 9:00 AM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote: > > On Mon, May 20, 2024 at 10:42:02PM +0000, Wengang Wang wrote: >> Thanks Darrick for review, pls see inlines: >> >>> On May 20, 2024, at 11:08 AM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote: >>> >>> On Fri, May 17, 2024 at 02:26:21PM -0700, Wengang Wang wrote: >>>> Unsharing blocks is implemented by doing CoW to those blocks. That has a side >>>> effect that some new allocatd blocks remain in inode Cow fork. As unsharing blocks >>> >>> allocated >>> >>>> has no hint that future writes would like come to the blocks that follow the >>>> unshared ones, the extra blocks in Cow fork is meaningless. >>>> >>>> This patch makes that no new blocks caused by unshare remain in Cow fork. >>>> The change in xfs_get_extsz_hint() makes the new blocks have more change to be >>>> contigurous in unshare path when there are multiple extents to unshare. >>> >>> contiguous >>> >> Sorry for typos. >> >>> Aha, so you're trying to combat fragmentation by making unshare use >>> delayed allocation so that we try to allocate one big extent all at once >>> instead of doing this piece by piece. Or maybe you also don't want >>> unshare to preallocate cow extents beyond the range requested? >>> >> >> Yes, The main purpose is for the later (avoid preallocating beyond). > > But the user set an extent size hint, so presumably they want us to (try > to) obey that even for unshare operations, right? Yeah, user might set extsize for better IO performance. But they don’t really know much details. Consider this case: writing to those over/beyond preallocated blocks would cause Cow. Cow includes extra meta changes: releasing old blocks, inserting new extents to data fork and removing staging extents from refcount tree. That’s a lot, as I think, a Cow is slower than block over-write. In above case, the Cow is caused by unshare, rather than by shared blocks. That might be not what user expected by setting extsize. > >> The patch also makes unshare use delayed allocation for bigger extent. > > If there's a good reason for not trying, can we avoid the iflag by > detecting IOMAP_UNSHARE in the @flags parameter to > xfs_buffered_write_iomap_begin Yes, that would be better. > and thereby use delalloc if there isn't > an extent size hint set? Hm… currently it’s using dealloc if extsize is 0. So you meant skip extsize (act as extsize is 0) when IOMAP_UNSHARE is there? > > (IOWs I don't really like that an upper layer of the fs sets a flag for > a lower layer to catch based on the context of whatever operation it's > doing, and in the meantime another thread could observe that state and > make different decisions.) Then what I can think the way to go might be to add new parameters to functions. That might involves a lot of functions in the call path. Is that really better? Or you have other better ways? > >>>> Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx> >>>> --- >>>> fs/xfs/xfs_inode.c | 17 ++++++++++++++++ >>>> fs/xfs/xfs_inode.h | 48 +++++++++++++++++++++++--------------------- >>>> fs/xfs/xfs_reflink.c | 7 +++++-- >>>> 3 files changed, 47 insertions(+), 25 deletions(-) >>>> >>>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c >>>> index d55b42b2480d..ade945c8d783 100644 >>>> --- a/fs/xfs/xfs_inode.c >>>> +++ b/fs/xfs/xfs_inode.c >>>> @@ -58,6 +58,15 @@ xfs_get_extsz_hint( >>>> */ >>>> if (xfs_is_always_cow_inode(ip)) >>>> return 0; >>>> + >>>> + /* >>>> + * let xfs_buffered_write_iomap_begin() do delayed allocation >>>> + * in unshare path so that the new blocks have more chance to >>>> + * be contigurous > > "contiguous" > >>>> + */ >>>> + if (xfs_iflags_test(ip, XFS_IUNSHARE)) >>>> + return 0; >>> >>> What if the inode is a realtime file? Will this work with the rt >>> delalloc support coming online in 6.10? >> >> This XFS_IUNSHARE is not set in xfs_reflink_unshare() for rt inodes. >> So rt inodes will keep current behavior. > > <nod> Please rebase this patch against 6.10-rc1 now that it's out. > OK, will do. Thanks, Wengang > --D > >>> >>>> + >>>> if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize) >>>> return ip->i_extsize; >>>> if (XFS_IS_REALTIME_INODE(ip)) >>>> @@ -77,6 +86,14 @@ xfs_get_cowextsz_hint( >>>> { >>>> xfs_extlen_t a, b; >>>> >>>> + /* >>>> + * in unshare path, allocate exactly the number of the blocks to be >>>> + * unshared so that no new blocks caused the unshare operation remain >>>> + * in Cow fork after the unshare is done >>>> + */ >>>> + if (xfs_iflags_test(ip, XFS_IUNSHARE)) >>>> + return 1; >>> >>> Aha, so this is also about turning off speculative preallocations >>> outside the range that's being unshared? >> >> Yes. >> >>> >>>> + >>>> a = 0; >>>> if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) >>>> a = ip->i_cowextsize; >>>> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h >>>> index ab46ffb3ac19..6a8ad68dac1e 100644 >>>> --- a/fs/xfs/xfs_inode.h >>>> +++ b/fs/xfs/xfs_inode.h >>>> @@ -207,13 +207,13 @@ xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size) >>>> * i_flags helper functions >>>> */ >>>> static inline void >>>> -__xfs_iflags_set(xfs_inode_t *ip, unsigned short flags) >>>> +__xfs_iflags_set(xfs_inode_t *ip, unsigned long flags) >>> >>> I think this is already queued for 6.10. >> >> Good to know. >> >> Thanks, >> Wengang