Re: [PATCH] xfs: dont remain new blocks in cowfork for unshare

Wengang Wang <wen.gang.wang@xxxxxxxxxx> · Fri, 31 May 2024 17:53:23 +0000

> On May 31, 2024, at 9:00 AM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> 
> On Mon, May 20, 2024 at 10:42:02PM +0000, Wengang Wang wrote:
>> Thanks Darrick for review, pls see inlines:
>> 
>>> On May 20, 2024, at 11:08 AM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>>> 
>>> On Fri, May 17, 2024 at 02:26:21PM -0700, Wengang Wang wrote:
>>>> Unsharing blocks is implemented by doing CoW to those blocks. That has a side
>>>> effect that some new allocatd blocks remain in inode Cow fork. As unsharing blocks
>>> 
>>>                      allocated
>>> 
>>>> has no hint that future writes would like come to the blocks that follow the
>>>> unshared ones, the extra blocks in Cow fork is meaningless.
>>>> 
>>>> This patch makes that no new blocks caused by unshare remain in Cow fork.
>>>> The change in xfs_get_extsz_hint() makes the new blocks have more change to be
>>>> contigurous in unshare path when there are multiple extents to unshare.
>>> 
>>> contiguous
>>> 
>> Sorry for typos.
>> 
>>> Aha, so you're trying to combat fragmentation by making unshare use
>>> delayed allocation so that we try to allocate one big extent all at once
>>> instead of doing this piece by piece.  Or maybe you also don't want
>>> unshare to preallocate cow extents beyond the range requested?
>>> 
>> 
>> Yes, The main purpose is for the later (avoid preallocating beyond).
> 
> But the user set an extent size hint, so presumably they want us to (try
> to) obey that even for unshare operations, right?

Yeah, user might set extsize for better IO performance. But they don’t really know
much details. Consider this case: 
writing to those over/beyond preallocated blocks would cause Cow. Cow includes
extra meta changes: releasing old blocks, inserting new extents to data fork and removing
staging extents from refcount tree.  That’s a lot, as I think, a Cow is slower than block over-write.
In above case, the Cow is caused by unshare, rather than by shared blocks. That might be
not what user expected by setting extsize.

> 
>> The patch also makes unshare use delayed allocation for bigger extent.
> 
> If there's a good reason for not trying, can we avoid the iflag by
> detecting IOMAP_UNSHARE in the @flags parameter to
> xfs_buffered_write_iomap_begin

Yes, that would be better.

> and thereby use delalloc if there isn't
> an extent size hint set?

Hm… currently it’s using dealloc if extsize is 0. So you meant skip extsize (act as extsize is 0)
when IOMAP_UNSHARE is there?

> 
> (IOWs I don't really like that an upper layer of the fs sets a flag for
> a lower layer to catch based on the context of whatever operation it's
> doing, and in the meantime another thread could observe that state and
> make different decisions.)

Then what I can think the way to go might be to add new parameters to functions.
That might involves a lot of functions in the call path. Is that really better?
Or you have other better ways?

> 
>>>> Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
>>>> ---
>>>> fs/xfs/xfs_inode.c   | 17 ++++++++++++++++
>>>> fs/xfs/xfs_inode.h   | 48 +++++++++++++++++++++++---------------------
>>>> fs/xfs/xfs_reflink.c |  7 +++++--
>>>> 3 files changed, 47 insertions(+), 25 deletions(-)
>>>> 
>>>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>>>> index d55b42b2480d..ade945c8d783 100644
>>>> --- a/fs/xfs/xfs_inode.c
>>>> +++ b/fs/xfs/xfs_inode.c
>>>> @@ -58,6 +58,15 @@ xfs_get_extsz_hint(
>>>> */
>>>> if (xfs_is_always_cow_inode(ip))
>>>> return 0;
>>>> +
>>>> + /*
>>>> +  * let xfs_buffered_write_iomap_begin() do delayed allocation
>>>> +  * in unshare path so that the new blocks have more chance to
>>>> +  * be contigurous
> 
> "contiguous"
> 
>>>> +  */
>>>> + if (xfs_iflags_test(ip, XFS_IUNSHARE))
>>>> + return 0;
>>> 
>>> What if the inode is a realtime file?  Will this work with the rt
>>> delalloc support coming online in 6.10?
>> 
>> This XFS_IUNSHARE is not set in xfs_reflink_unshare() for rt inodes. 
>> So rt inodes will keep current behavior.
> 
> <nod> Please rebase this patch against 6.10-rc1 now that it's out.
> 

OK, will do.

Thanks,
Wengang

> --D
> 
>>> 
>>>> +
>>>> if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
>>>> return ip->i_extsize;
>>>> if (XFS_IS_REALTIME_INODE(ip))
>>>> @@ -77,6 +86,14 @@ xfs_get_cowextsz_hint(
>>>> {
>>>> xfs_extlen_t a, b;
>>>> 
>>>> + /*
>>>> +  * in unshare path, allocate exactly the number of the blocks to be
>>>> +  * unshared so that no new blocks caused the unshare operation remain
>>>> +  * in Cow fork after the unshare is done
>>>> +  */
>>>> + if (xfs_iflags_test(ip, XFS_IUNSHARE))
>>>> + return 1;
>>> 
>>> Aha, so this is also about turning off speculative preallocations
>>> outside the range that's being unshared?
>> 
>> Yes.
>> 
>>> 
>>>> +
>>>> a = 0;
>>>> if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
>>>> a = ip->i_cowextsize;
>>>> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
>>>> index ab46ffb3ac19..6a8ad68dac1e 100644
>>>> --- a/fs/xfs/xfs_inode.h
>>>> +++ b/fs/xfs/xfs_inode.h
>>>> @@ -207,13 +207,13 @@ xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
>>>> * i_flags helper functions
>>>> */
>>>> static inline void
>>>> -__xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
>>>> +__xfs_iflags_set(xfs_inode_t *ip, unsigned long flags)
>>> 
>>> I think this is already queued for 6.10.
>> 
>> Good to know.
>> 
>> Thanks,
>> Wengang