Re: [regression] 5.15 kernel triggering 100x more inode evictions

On Fri, Apr 08, 2022 at 04:52:22PM +0200, David Sterba wrote:
> On Fri, Apr 08, 2022 at 12:32:20PM +0200, Thorsten Leemhuis wrote:
> > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > to make this easily accessible to everyone.
> > 
> > Btrfs maintainers, what's up here? Yes, this regression report was a bit
> > confusing in the beginning, but Bruno worked on it. And apparently it's
> > already fixed in 5.16, but still in 5.15. Is this caused by a change
> > that is too big to backport or something?
> 
> I haven't identified possible fixes in 5.16 so I can't tell how much
> backport efforts it could be. As the report is related to performance on
> package updates, my best guess is that the patches fixing it are those
> from Filipe related to fsync/logging, and there are several of such
> improvements in 5.16. Or something else that fixes it indirectly.

So there's a lot of confusion in the thread, and the original openSUSE
bugzilla [1] is also a bit confusing and too long to follow easily.

Let me try to make it clear:

1) For some reason, outside btrfs' control, inode eviction is triggered
   a lot on 5.15 kernels on Bruno's test machine when doing package
   installations/updates with zypper. It is triggered about 100 times
   more often than on 5.13, 5.14, 5.16 kernels, etc. This was measured
   with the bpftrace script I provided him at [1], and he's including
   part of it in his test script from this thread too;

2) If an inode is evicted, reloaded and then we attempt to do a rename on
   it, it can trigger unnecessary log updates, for the inode and/or the
   parent directory. This is just btrfs not knowing whether the inode was
   logged in the current transaction before it was evicted. Since it
   doesn't know for sure, it assumes the worst case, that it was logged,
   and then updates the log (partially relogging the inode and its parent
   directory). Otherwise we could get into an inconsistency in case it
   was logged before and we don't update the log;

3) About the excessive inode eviction, there's nothing we can do in btrfs,
   it's outside btrfs' control;

4) What can be done, and was done in a recent patchset [2] (5.18-rc1), was
   to make the behaviour on rename less pessimistic: accurately determine
   whether an inode was logged before, even if it was recently evicted,
   and skip the log updates when it was not.

   The test scripts in the change logs of the patches in that patchset
   essentially mimic what was happening with the zypper package
   installations/updates. Bruno's test script basically copies/integrates
   those test scripts;

5) We cannot just backport that patchset [2] to 5.15, because it depends
   on several other patchsets that landed in 5.16, 5.17 and 5.18-rc1, which
   mostly do a heavy rework of directory logging:

   https://lore.kernel.org/linux-btrfs/cover.1630419897.git.fdmanana@xxxxxxxx/ (5.16)
   https://lore.kernel.org/linux-btrfs/cover.1631787796.git.fdmanana@xxxxxxxx/ (5.16)
   https://lore.kernel.org/linux-btrfs/cover.1632482680.git.fdmanana@xxxxxxxx/ (5.16)
   https://lore.kernel.org/linux-btrfs/cover.1635178668.git.fdmanana@xxxxxxxx/ (5.17)
   https://lore.kernel.org/linux-btrfs/cover.1639568905.git.fdmanana@xxxxxxxx/ (5.18-rc1)

   And possibly other smaller dependencies in between those patchsets;

6) In short, it is not known what causes the excessive evictions on 5.15
   on Bruno's machine for that specific workload - we don't have a commit
   to point at and say it caused a regression. The previously mentioned
   patchset ([2]) will however make things much better, performance-wise,
   in case excessive inode eviction happens (regarding renames on btrfs).
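
To illustrate points 2) and 4), here is a toy Python model of the decision
made at rename time. It is only a sketch and not the kernel code: the
class, method and state names (ToyFs, Logged, simulate, etc.) are all made
up for illustration, and the "log tree" is just a set of inode numbers.
The only thing it tries to show is why losing the in-memory state on
eviction forces a pessimistic answer, and how looking the inode up in the
log avoids that.

```python
from enum import Enum


class Logged(Enum):
    NO = "no"
    YES = "yes"
    UNKNOWN = "unknown"  # in-memory state was lost when the inode was evicted


class ToyFs:
    """Hypothetical model, not btrfs: tracks whether inodes were logged."""

    def __init__(self, accurate):
        self.accurate = accurate  # True models the behaviour from point 4)
        self.in_memory = {}       # inode number -> Logged state
        self.log_tree = set()     # inode numbers logged in this transaction
        self.log_updates = 0      # how many renames had to update the log

    def create(self, ino):
        self.in_memory[ino] = Logged.NO

    def fsync(self, ino):
        self.log_tree.add(ino)
        self.in_memory[ino] = Logged.YES

    def evict(self, ino):
        # Eviction drops the in-memory inode, and with it the knowledge of
        # whether it was logged. The log itself is unaffected.
        del self.in_memory[ino]

    def rename(self, ino):
        # A reloaded inode starts in the UNKNOWN state.
        state = self.in_memory.setdefault(ino, Logged.UNKNOWN)
        if state is Logged.UNKNOWN:
            if self.accurate:
                # Point 4): check the log to find out for sure.
                state = Logged.YES if ino in self.log_tree else Logged.NO
                self.in_memory[ino] = state
            else:
                # Point 2): assume the worst case, that it was logged.
                state = Logged.YES
        if state is Logged.YES:
            self.log_updates += 1
            return True
        return False


def simulate(accurate, n_unlogged=100, n_logged=1):
    """Create, optionally fsync, evict and rename each inode once."""
    fs = ToyFs(accurate)
    for ino in range(n_unlogged + n_logged):
        fs.create(ino)
        if ino < n_logged:
            fs.fsync(ino)
        fs.evict(ino)
        fs.rename(ino)
    return fs.log_updates


if __name__ == "__main__":
    print("pessimistic:", simulate(accurate=False))
    print("accurate:   ", simulate(accurate=True))
```

With the default arguments the pessimistic variant does 101 log updates
and the accurate one does 1 (only for the inode that really was logged),
which is the kind of difference excessive eviction makes in this model.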

This thread is also basically a revamp of an older thread [3].

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549
[2] https://lore.kernel.org/linux-btrfs/cover.1642676248.git.fdmanana@xxxxxxxx/
[3] https://lore.kernel.org/linux-fsdevel/MN2PR20MB251235DDB741CD46A9DD5FAAD24E9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/


