Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

Nikolay Borisov <nborisov@xxxxxxxx> · Sat, 14 Apr 2018 10:00:29 +0300

On 14.04.2018 00:14, Andrew Morton wrote:
> On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
> 
>> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
>> In this case we may have 0 dentries to dispose, so we will never
>> schedule out while waiting for the parallel shrink_dentry_list to
>> complete.
>>
>> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> 
> Well I guess the patch is OK as a stopgap, but things seem fairly
> messed up in there.  shrink_dcache_parent() shouldn't be doing a
> busywait, waiting for the concurrent shrink_dentry_list().
> 
> Either we should be waiting (sleeping) for the concurrent operation to
> complete or we should just bail out of shrink_dcache_parent(), perhaps
> with 
> 
> 	if (list_empty(&data.dispose))
> 		break;
> 
> or similar.  Dunno.

I agree. However, not being a dcache expert, I'd refrain from touching
it, since the code seems rather fragile. Perhaps Al could take a look
in there?
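
To make the three options discussed above concrete (spin as today,
yield via cond_resched() as in the patch, or bail out when the dispose
list is empty), here is a rough userspace model. It is only a sketch
and not the kernel code: struct model, walk(), yield_cpu() and
shrink_model() are hypothetical names, and the assumption that the
concurrent shrinker only makes progress when we yield is a
simplification of a non-preemptible single-CPU case.

```c
#include <assert.h>

/* Hypothetical model state, not a kernel structure. */
struct model {
	int ours;	/* dentries this walker can move to its own dispose list */
	int theirs;	/* dentries already on a concurrent shrinker's list */
};

enum policy { SPIN, YIELD, BAIL_OUT };

/*
 * One pass over the tree. Returns "found" in the sense quoted above:
 * dentries on any shrink list plus dentries moved to our dispose list.
 * *disposed receives only the dentries we collected ourselves.
 */
static int walk(struct model *m, int *disposed)
{
	*disposed = m->ours;
	m->ours = 0;
	return *disposed + m->theirs;
}

/*
 * Stand-in for cond_resched(). Assumption: the concurrent shrinker
 * finishes one dentry each time we give up the CPU.
 */
static void yield_cpu(struct model *m)
{
	if (m->theirs > 0)
		m->theirs--;
}

/*
 * Returns the number of passes taken, or -1 if the pass budget ran out,
 * which is how this model represents a stall.
 */
static int shrink_model(struct model *m, enum policy p, int max_passes)
{
	assert(max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		int disposed;
		int found = walk(m, &disposed);

		if (!found)
			return pass;		/* nothing left anywhere */
		if (p == BAIL_OUT && disposed == 0)
			return pass;		/* nothing for us to do */
		if (p == YIELD)
			yield_cpu(m);		/* let the other shrinker run */
	}
	return -1;
}
```

In this model SPIN never terminates once only the other shrinker's
dentries remain, YIELD terminates after the other shrinker drains, and
BAIL_OUT returns early while those dentries are still pending. Whether
returning early is safe for the real callers is exactly the part I
can't judge.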

> 
> 
> That block comment over `struct select_data' is not a good one.  "It
> returns zero iff...".  *What* returns zero?  select_collect()?  No it
> doesn't, it returns an `enum d_walk_ret'.  Perhaps the comment is
> trying to refer to select_data.found.  And the real interpretation of
> select_data.found is, umm, hard to describe.  "Counts the number of
> dentries which are on a shrink list or which were moved to the dispose
> list".  Why?  What's that all about?
> 
> This code needs a bit of thought, documentation and perhaps a redo,
> I suspect.
>