On Fri, Oct 28, 2022 at 3:10 AM Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> wrote: > > With very large d_subdirs lists, iteration can take a long time. Since > iteration needs to hold parent->d_lock, this can trigger soft lockups. > It would be best to make this iteration sleepable. Since we have the > inode locked exclusive, we can drop the parent->d_lock and sleep, > holding a reference to a child dentry, and continue iteration once we > wake. > > Signed-off-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> > --- > Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx> some comment nits and one fortify suggestion > Notes: > v3: > - removed if statements around dput() > v2: > - added a check for child->d_parent != alias and retry logic > > fs/notify/fsnotify.c | 36 ++++++++++++++++++++++++++++++++---- > 1 file changed, 32 insertions(+), 4 deletions(-) > > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c > index ccb8a3a6c522..34e5d18235a7 100644 > --- a/fs/notify/fsnotify.c > +++ b/fs/notify/fsnotify.c > @@ -102,10 +102,12 @@ void fsnotify_sb_delete(struct super_block *sb) > * on a child we run all of our children and set a dentry flag saying that the > * parent cares. Thus when an event happens on a child it can quickly tell > * if there is a need to find a parent and send the event to the parent. > + * > + * Context: inode locked exclusive > */ > static bool __fsnotify_update_children_dentry_flags(struct inode *inode) > { > - struct dentry *alias, *child; > + struct dentry *child, *alias, *last_ref = NULL; > int watched; > > if (!S_ISDIR(inode->i_mode)) > @@ -120,12 +122,37 @@ static bool __fsnotify_update_children_dentry_flags(struct inode *inode) > alias = d_find_any_alias(inode); > > /* > - * run all of the children of the original inode and fix their > - * d_flags to indicate parental interest (their parent is the > - * original inode) > + * These lists can get very long, so we may need to sleep during > + * iteration. Normally this would be impossible without a cursor, > + * but since we have the inode locked exclusive, we're guaranteed > + * that the directory won't be modified, so whichever dentry we > + * pick to sleep on won't get moved. So, start a manual iteration > + * over d_subdirs which will allow us to sleep. > */ > spin_lock(&alias->d_lock); > +retry: > list_for_each_entry(child, &alias->d_subdirs, d_child) { > + if (need_resched()) { > + /* > + * We need to hold a reference while we sleep. But when > + * we wake, dput() could free the dentry, invalidating > + * the list pointers. We can't look at the list pointers > + * until we re-lock the parent, and we can't dput() once > + * we have the parent locked. So the solution is to hold > + * onto our reference and free it the *next* time we drop > + * alias->d_lock: either at the end of the function, or > + * at the time of the next sleep. > + */ My personal preference would be to move this above if (needed_reschd()) it is not any less clear when this comment is above the condition and less indented will read nicer. > + dget(child); > + spin_unlock(&alias->d_lock); > + dput(last_ref); > + last_ref = child; > + cond_resched(); > + spin_lock(&alias->d_lock); > + if (child->d_parent != alias) > + goto retry; Is this expected? If not, then we need a WARN_ON_ONCE(). Also, I wonder if it would be better to break out and leave dentry flags as they are instead of risking some endless or very long retry loop? And how about asserting on unexpected !list_empty(&child->d_child) to avoid an endless loop in list_for_each_entry()? Thanks, Amir.