On 14.04.2018 00:14, Andrew Morton wrote:
> On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
>
>> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
>> In this case we may have 0 dentries to dispose, so we will never
>> schedule out while waiting for the parallel shrink_dentry_list to
>> complete.
>>
>> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
>
> Well I guess the patch is OK as a stopgap, but things seem fairly
> messed up in there. shrink_dcache_parent() shouldn't be doing a
> busywait, waiting for the concurrent shrink_dentry_list().
>
> Either we should be waiting (sleeping) for the concurrent operation to
> complete or we should just bail out of shrink_dcache_parent(), perhaps
> with
>
>	if (list_empty(&data.dispose))
>		break;
>
> or similar. Dunno.

I agree, however, not being a dcache expert I'd refrain from touching
it, since it seems to be rather fragile. Perhaps Al could take a look in
there?

>
> That block comment over `struct select_data' is not a good one. "It
> returns zero iff...". *What* returns zero? select_collect()? No it
> doesn't, it returns an `enum d_walk_ret'. Perhaps the comment is
> trying to refer to select_data.found. And the real interpretation of
> select_data.found is, umm, hard to describe. "Counts the number of
> dentries which are on a shrink list or which were moved to the dispose
> list". Why? What's that all about?
>
> This code needs a bit of thought, documentation and perhaps a redo,
> I suspect.