On Tue, Jun 06, 2023 at 11:24:32AM +1000, Dave Chinner wrote: > On Mon, Jun 05, 2023 at 05:38:27PM -0700, Roman Gushchin wrote: > > On Mon, Jun 05, 2023 at 10:03:25PM +0300, Kirill Tkhai wrote: > > > Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec > > > test case caused by commit: f95bdb700bc6 ("mm: vmscan: make global slab > > > shrink lockless"). Qi Zheng investigated that the reason is in long SRCU's > > > synchronize_srcu() occuring in unregister_shrinker(). > > > > > > This patch fixes the problem by using new unregistration interfaces, > > > which split unregister_shrinker() in two parts. First part actually only > > > notifies shrinker subsystem about the fact of unregistration and it prevents > > > future shrinker methods calls. The second part completes the unregistration > > > and it insures, that struct shrinker is not used during shrinker chain > > > iteration anymore, so shrinker memory may be freed. Since the long second > > > part is called from delayed work asynchronously, it hides synchronize_srcu() > > > delay from a user. > > > > > > Signed-off-by: Kirill Tkhai <tkhai@xxxxx> > > > --- > > > fs/super.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/super.c b/fs/super.c > > > index 8d8d68799b34..f3e4f205ec79 100644 > > > --- a/fs/super.c > > > +++ b/fs/super.c > > > @@ -159,6 +159,7 @@ static void destroy_super_work(struct work_struct *work) > > > destroy_work); > > > int i; > > > > > > + unregister_shrinker_delayed_finalize(&s->s_shrink); > > > for (i = 0; i < SB_FREEZE_LEVELS; i++) > > > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > > > kfree(s); > > > @@ -327,7 +328,7 @@ void deactivate_locked_super(struct super_block *s) > > > { > > > struct file_system_type *fs = s->s_type; > > > if (atomic_dec_and_test(&s->s_active)) { > > > - unregister_shrinker(&s->s_shrink); > > > + unregister_shrinker_delayed_initiate(&s->s_shrink); > > > > Hm, it makes the API more complex and easier to mess with. Like what will happen > > if the second part is never called? Or it's called without the first part being > > called first? > > Bad things. Agree. > Also, it doesn't fix the three other unregister_shrinker() calls in > the XFS unmount path, nor the three in the ext4/mbcache/jbd2 unmount > path. > > Those are just some of the unregister_shrinker() calls that have > dynamic contexts that would also need this same fix; I haven't > audited the 3 dozen other unregister_shrinker() calls around the > kernel to determine if any of them need similar treatment, too. > > IOWs, this patchset is purely a band-aid to fix the reported > regression, not an actual fix for the underlying problems caused by > moving the shrinker infrastructure to SRCU protection. This is why > I really want the SRCU changeover reverted. > > Not only are the significant changes the API being necessary, it's > put the entire shrinker paths under a SRCU critical section. AIUI, > this means while the shrinkers are running the RCU grace period > cannot expire and no RCU freed memory will actually get freed until > the srcu read lock is dropped by the shrinker. > > Given the superblock shrinkers are freeing dentry and inode objects > by RCU freeing, this is also a fairly significant change of > behaviour. i.e. cond_resched() in the shrinker processing loops no > longer allows RCU grace periods to expire and have memory freed with > the shrinkers are running. > > Are there problems this will cause? I don't know, but I'm pretty > sure they haven't even been considered until now.... > > > Isn't it possible to hide it from a user and call the second part from a work > > context automatically? > > Nope, because it has to be done before the struct shrinker is freed. > Those are embedded into other structures rather than being > dynamically allocated objects. This part we might consider to revisit, if it helps to solve other problems. Having an extra memory allocation (or two) per mount-point doesn't look that expensive. Again, iff it helps with more important problems. Thanks!