Re: [PATCH 6/9] spaceman/defrag: workaround kernel xfs_reflink_try_clear_inode_flag()

Wengang Wang <wen.gang.wang@xxxxxxxxxx> · Thu, 18 Jul 2024 18:24:39 +0000

> On Jul 15, 2024, at 5:25 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Tue, Jul 09, 2024 at 12:10:25PM -0700, Wengang Wang wrote:
>> xfs_reflink_try_clear_inode_flag() takes very long in case file has huge number
>> of extents and none of the extents are shared.
> 
> Got a kernel profile showing how bad it is?

It took more than 1.5 seconds (based on 6.4 million extents) when I added debug code to measure it.

> 
>> 
>> workaround:
>> share the first real extent so that xfs_reflink_try_clear_inode_flag() returns
>> quickly to save cpu times and speed up defrag significantly.
> 
> That's nasty.
> 
> Let's fix the kernel code, not work around it in userspace.
> 
> I mean, it would be really easy to store if an extent is shared in
> the iext btree record for the extent. If we do an unshare operation,
> just do a single "find shared extents" pass on the extent tree and
> mark all the extents that are shared as shared.  Then set a flag on
> the data fork saying it is tracking shared extents, and so when we
> share/unshare extents in that inode from then on, we set/clear that
> flag in the iext record. (i.e. it's an in-memory equivalent of the
> UNWRITTEN state flag).
> 
> Then after the first unshare, checking for nothing being shared is a
> walk of the iext btree over the given range, not a refcountbt
> walk. That should be much faster.
> 
> And we could make it even faster by adding a "shared extents"
> counter to the inode fork. i.e. the first scan that sets the flags
> also counts the shared extents, and we maintain that as we maintain
> the in-memory extent flags....
> 
> That makes the cost of xfs_reflink_try_clear_inode_flag() basically
> go to zero in these sorts of workloads. IMO, this is a much better
> solution to the problem than hacking around it in userspace...
> 
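
To check that I understand the proposal, here is a rough userspace sketch of
the bookkeeping. It is not kernel code: struct ext_rec, struct ext_fork, the
refcount field, toy_lookup() and the fork_*() helpers are all hypothetical
names made up for illustration, and the lookup callback only stands in for
the refcountbt query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory extent record; NOT the real iext record layout. */
struct ext_rec {
	unsigned long long startoff;
	unsigned long long blockcount;
	unsigned int refcount;	/* stand-in for what a refcountbt query would report */
	bool shared;		/* in-memory flag, similar in spirit to UNWRITTEN */
};

/* Hypothetical per-fork state: a "tracking" flag plus a shared-extent counter. */
struct ext_fork {
	struct ext_rec *recs;
	size_t nrecs;
	bool tracking_shared;	/* set once the first full scan has run */
	size_t shared_count;	/* number of records currently marked shared */
};

typedef bool (*is_shared_fn)(const struct ext_rec *rec);

/* Toy replacement for the expensive lookup; counts calls to make the cost visible. */
static size_t slow_lookups;

static bool toy_lookup(const struct ext_rec *rec)
{
	slow_lookups++;
	return rec->refcount > 1;
}

/* One-time "find shared extents" pass: mark each record and count the shared ones. */
static void fork_scan_shared(struct ext_fork *f, is_shared_fn lookup)
{
	f->shared_count = 0;
	for (size_t i = 0; i < f->nrecs; i++) {
		f->recs[i].shared = lookup(&f->recs[i]);
		if (f->recs[i].shared)
			f->shared_count++;
	}
	f->tracking_shared = true;
}

/* Called on share/unshare of one extent; keeps flag and counter in sync. */
static void fork_set_shared(struct ext_fork *f, size_t idx, bool shared)
{
	assert(idx < f->nrecs);
	if (!f->tracking_shared || f->recs[idx].shared == shared)
		return;		/* not tracking yet, or nothing changes */
	f->recs[idx].shared = shared;
	if (shared) {
		f->shared_count++;
	} else {
		assert(f->shared_count > 0);
		f->shared_count--;
	}
}

/* After the first scan this is a counter check, with no further lookups. */
static bool fork_has_shared(struct ext_fork *f, is_shared_fn lookup)
{
	if (!f->tracking_shared)
		fork_scan_shared(f, lookup);
	return f->shared_count != 0;
}
```

In this sketch the first fork_has_shared() call pays for one lookup per
extent, and every later call only reads shared_count, as long as all
share/unshare paths go through fork_set_shared().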

Yes, fixing it in the kernel is the best way to go.
One consideration, though, is that customers don’t run upstream kernels.
They might run a much older version, and some customers don’t want kernel
upgrades unless there are security issues.
So can we have both?
1. Try to fix the kernel, and
2. Keep the workaround in defrag userspace?

Thanks,
Wengang