With very large d_subdirs lists, iteration can take a long time. Since iteration needs to hold alias->d_lock, this can trigger soft lockups. It would be best to make this iteration sleepable. We can drop alias->d_lock and sleep, by taking a reference to the current child. However, we need to be careful, since it's possible for the child's list pointers to be modified once we drop alias->d_lock. The following are the only cases where the list pointers are modified: 1. dentry_unlist() in fs/dcache.c This is a helper of dentry_kill(). This function is quite careful to check the reference count of the dentry once it has taken the requisite locks, and bail out if a new reference was taken. So we can be safe that, assuming we found the dentry and took a reference before dropping alias->d_lock, any concurrent dentry_kill() should bail out and leave our list pointers untouched. 2. __d_move() in fs/dcache.c If the child was moved to a new parent, then we can detect this by testing d_parent and retrying the iteration. 3. Initialization code in d_alloc() family We are safe from this code, since we cannot encounter a dentry until it has been initialized. 4. Cursor code in fs/libfs.c for dcache_readdir() Dentries with DCACHE_DENTRY_CURSOR should be skipped before sleeping, since we could awaken to find they have skipped over a portion of the child list. Given these considerations, we can carefully write a loop that iterates over d_subdirs and is capable of going to sleep periodically. Signed-off-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> --- Notes: v4: - I've updated this patch so it should be safe even without the inode locked, by handling cursors and d_move() races. - Moved comments to lower indentation - I didn't keep Amir's R-b since this was a fair bit of change. v3: - removed if statements around dput() v2: - added a check for child->d_parent != alias and retry logic fs/notify/fsnotify.c | 46 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 409d479cbbc6..0ba61211456c 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -105,7 +105,7 @@ void fsnotify_sb_delete(struct super_block *sb) */ void fsnotify_update_children_dentry_flags(struct inode *inode, bool watched) { - struct dentry *alias, *child; + struct dentry *alias, *child, *last_ref = NULL; if (!S_ISDIR(inode->i_mode)) return; @@ -116,12 +116,49 @@ void fsnotify_update_children_dentry_flags(struct inode *inode, bool watched) return; /* - * run all of the children of the original inode and fix their - * d_flags to indicate parental interest (their parent is the - * original inode) + * These lists can get very long, so we may need to sleep during + * iteration. Normally this would be impossible without a cursor, + * but since we have the inode locked exclusive, we're guaranteed + * that the directory won't be modified, so whichever dentry we + * pick to sleep on won't get moved. So, start a manual iteration + * over d_subdirs which will allow us to sleep. */ spin_lock(&alias->d_lock); +retry: list_for_each_entry(child, &alias->d_subdirs, d_child) { + /* + * We need to hold a reference while we sleep. But we cannot + * sleep holding a reference to a cursor, or we risk skipping + * through the list. + * + * When we wake, dput() could free the dentry, invalidating the + * list pointers. We can't look at the list pointers until we + * re-lock the parent, and we can't dput() once we have the + * parent locked. So the solution is to hold onto our reference + * and free it the *next* time we drop alias->d_lock: either at + * the end of the function, or at the time of the next sleep. + */ + if (child->d_flags & DCACHE_DENTRY_CURSOR) + continue; + if (need_resched()) { + dget(child); + spin_unlock(&alias->d_lock); + dput(last_ref); + last_ref = child; + cond_resched(); + spin_lock(&alias->d_lock); + /* Check for races with __d_move() */ + if (child->d_parent != alias) + goto retry; + } + + /* + * This check is after the cond_resched() for a reason: it is + * safe to sleep holding a negative dentry reference. If the + * directory contained many negative dentries, and we skipped + * them checking need_resched(), we may never end up sleeping + * where necessary, and could trigger a soft lockup. + */ if (!child->d_inode) continue; @@ -133,6 +170,7 @@ void fsnotify_update_children_dentry_flags(struct inode *inode, bool watched) spin_unlock(&child->d_lock); } spin_unlock(&alias->d_lock); + dput(last_ref); dput(alias); } -- 2.34.1