Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

Khazhismel Kumykov <khazhy@xxxxxxxxxx> · Mon, 16 Apr 2018 11:28:42 -0700

On Sun, Apr 15, 2018 at 3:34 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Apr 15, 2018 at 10:54:55PM +0100, Al Viro wrote:
>> On Sun, Apr 15, 2018 at 09:40:54PM +0100, Al Viro wrote:
>>
>> > BTW, the current placement of cond_resched() looks bogus; suppose we
>> > have collected a lot of victims and ran into need_resched().  We leave
>> > d_walk() and call shrink_dentry_list().  At that point there's a lot
>> > of stuff on our shrink list and anybody else running into them will
>> > have to keep scanning.  Giving up the timeslice before we take care
>> > of any of those looks like a bad idea, to put it mildly, and that's
>> > precisely what will happen.
>> >
>> > What about doing that in the end of __dentry_kill() instead?  And to
>> > hell with both existing call sites - dput() one (before going to
>> > the parent) is obviously covered by that (dentry_kill() only returns
>> > non-NULL after having called __dentry_kill()) and in shrink_dentry_list()
>> > we'll get to it as soon as we go through all dentries that can be
>> > immediately kicked off the shrink list.  Which, AFAICS, improves the
>> > situation, now that shrink_lock_dentry() contains no trylock loops...
>> >
>> > Comments?
>>
>> What I mean is something like this (cumulative diff, it'll obviously need
>> to be carved up into 3--4 commits):
>
> ... and carved-up version is in vfs.git#work.dcache.  Could syzbot folks
> hit it with their reproducers?

To me it seems like shrink_dcache_parent could still spin without
scheduling similar to before - found > 0 since *someone* is shrinking,
but we have 0 entries to shrink - we never call __dentry_kill or
schedule, and just spin.

And indeed, the syzbot reproducer @vfs#work.dcache hits a softlockup
in shrink_dcache_parent.

I'd think re-adding cond_resched to shrink_dcache_parent does make the
softlockup go way. It seems to fix the reproducer.
Although did we want to terminate the loop in shrink_dcache_parent earlier?
Attachment:
smime.p7s

Description: S/MIME Cryptographic Signature