Re: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

Kirill Tkhai <tkhai@xxxxx> · Wed, 7 Jun 2023 00:06:03 +0300

On 06.06.2023 01:32, Dave Chinner wrote:
> On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
>> This patch set introduces a new scheme of shrinker unregistration. It allows to split
>> the unregistration in two parts: fast and slow. This allows to hide slow part from
>> a user, so user-visible unregistration becomes fast.
>>
>> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
>> by kernel test robot:
>>
>> https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@xxxxxxxxx/
>>
>> ---
>>
>> Kirill Tkhai (2):
>>       mm: Split unregister_shrinker() in fast and slow part
>>       fs: Use delayed shrinker unregistration
> 
> Did you test any filesystem other than ramfs?
> 
> Filesystems more complex than ramfs have internal shrinkers, and so
> they will still be running the slow synchronize_srcu() - potentially
> multiple times! - in every unmount. Both XFS and ext4 have 3
> internal shrinker instances per mount, so they will still call
> synchronize_srcu() at least 3 times per unmount after this change.
> 
> What about any other subsystem that runs a shrinker - do they have
> context depedent shrinker instances that get frequently created and
> destroyed? They'll need the same treatment.

Of course, all of shrinkers should be fixed. This patch set just aims to describe
the idea more wider, because I'm not sure most people read replys to kernel robot reports.

This is my suggestion of way to go. Probably, Qi is right person to ask whether
we're going to extend this and to maintain f95bdb700bc6 in tree.

There is not much time. Unfortunately, kernel test robot reported this significantly late.

> Seriously, part of changing shrinker infrastructure is doing an
> audit of all the shrinker instances to determine how the change will
> impact those shrinkers, and if the same structural changes are
> needed to those implementations.
> 
> I don't see any of this being done - this looks like a "slap a bandaid
> over the visible symptom" patch set without any deeper investigation
> of the scope of the issue having been gained.
> 
> Along with all shrinkers now running under a SRCU critical region
> and requiring a machine wide synchronisation point for every
> unregister_shrinker() call made, the ability to repeated abort
> global shrinker passes via external SRCU expediting, and now an
> intricate locking and state dance in do_shrink_slab() vs
> unregister_shrinker, I can't say I'm particularly liking any of
> this, regardles of the benefits it supposedly provides.
> 
> -Dave.