Re: [PATCH v2 3/3] fs: Use delayed shrinker unregistration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 06, 2023 at 11:24:32AM +1000, Dave Chinner wrote:
> On Mon, Jun 05, 2023 at 05:38:27PM -0700, Roman Gushchin wrote:
> > On Mon, Jun 05, 2023 at 10:03:25PM +0300, Kirill Tkhai wrote:
> > > Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
> > > test case caused by commit: f95bdb700bc6 ("mm: vmscan: make global slab
> > > shrink lockless"). Qi Zheng investigated that the reason is in long SRCU's
> > > synchronize_srcu() occuring in unregister_shrinker().
> > > 
> > > This patch fixes the problem by using new unregistration interfaces,
> > > which split unregister_shrinker() in two parts. First part actually only
> > > notifies shrinker subsystem about the fact of unregistration and it prevents
> > > future shrinker methods calls. The second part completes the unregistration
> > > and it insures, that struct shrinker is not used during shrinker chain
> > > iteration anymore, so shrinker memory may be freed. Since the long second
> > > part is called from delayed work asynchronously, it hides synchronize_srcu()
> > > delay from a user.
> > > 
> > > Signed-off-by: Kirill Tkhai <tkhai@xxxxx>
> > > ---
> > >  fs/super.c |    3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/super.c b/fs/super.c
> > > index 8d8d68799b34..f3e4f205ec79 100644
> > > --- a/fs/super.c
> > > +++ b/fs/super.c
> > > @@ -159,6 +159,7 @@ static void destroy_super_work(struct work_struct *work)
> > >  							destroy_work);
> > >  	int i;
> > >  
> > > +	unregister_shrinker_delayed_finalize(&s->s_shrink);
> > >  	for (i = 0; i < SB_FREEZE_LEVELS; i++)
> > >  		percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> > >  	kfree(s);
> > > @@ -327,7 +328,7 @@ void deactivate_locked_super(struct super_block *s)
> > >  {
> > >  	struct file_system_type *fs = s->s_type;
> > >  	if (atomic_dec_and_test(&s->s_active)) {
> > > -		unregister_shrinker(&s->s_shrink);
> > > +		unregister_shrinker_delayed_initiate(&s->s_shrink);
> > 
> > Hm, it makes the API more complex and easier to mess with. Like what will happen
> > if the second part is never called? Or it's called without the first part being
> > called first?
> 
> Bad things.

Agree.

> Also, it doesn't fix the three other unregister_shrinker() calls in
> the XFS unmount path, nor the three in the ext4/mbcache/jbd2 unmount
> path.
> 
> Those are just some of the unregister_shrinker() calls that have
> dynamic contexts that would also need this same fix; I haven't
> audited the 3 dozen other unregister_shrinker() calls around the
> kernel to determine if any of them need similar treatment, too.
> 
> IOWs, this patchset is purely a band-aid to fix the reported
> regression, not an actual fix for the underlying problems caused by
> moving the shrinker infrastructure to SRCU protection.  This is why
> I really want the SRCU changeover reverted.
> 
> Not only are the significant changes the API being necessary, it's
> put the entire shrinker paths under a SRCU critical section. AIUI,
> this means while the shrinkers are running the RCU grace period
> cannot expire and no RCU freed memory will actually get freed until
> the srcu read lock is dropped by the shrinker.
> 
> Given the superblock shrinkers are freeing dentry and inode objects
> by RCU freeing, this is also a fairly significant change of
> behaviour. i.e.  cond_resched() in the shrinker processing loops no
> longer allows RCU grace periods to expire and have memory freed with
> the shrinkers are running.
> 
> Are there problems this will cause? I don't know, but I'm pretty
> sure they haven't even been considered until now....
> 
> > Isn't it possible to hide it from a user and call the second part from a work
> > context automatically?
> 
> Nope, because it has to be done before the struct shrinker is freed.
> Those are embedded into other structures rather than being
> dynamically allocated objects.

This part we might consider to revisit, if it helps to solve other problems.
Having an extra memory allocation (or two) per mount-point doesn't look
that expensive. Again, iff it helps with more important problems.

Thanks!



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux