Hi, > -----Original Message----- > From: Al Viro <viro@xxxxxxxxxxxxxxxx> On Behalf Of Al Viro > Sent: 2019年9月6日 1:48 > To: zhengbin (A) <zhengbin13@xxxxxxxxxx> > Cc: jack@xxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; zhangyi > (F) <yi.zhang@xxxxxxxxxx>; renxudong1@xxxxxxxxxx; Jun Li <jun.li@xxxxxxx> > Subject: Re: Possible FS race condition between iterate_dir and d_alloc_parallel > > On Wed, Sep 04, 2019 at 02:15:58PM +0800, zhengbin (A) wrote: > > >> > > >> Confused... OTOH, I might be misreading that table of yours - it's > > >> about 30% wider than the widest xterm I can get while still being > > >> able to read the font... > > > > The table is my guess. This oops happens sometimes > > > > (We have one vmcore, others just have log, and the backtrace is same with vmcore, so > the reason should be same). > > > > Unfortunately, we do not know how to reproduce it. The vmcore has such a law: > > > > 1、dirA has 177 files, and it is OK > > > > 2、dirB has 25 files, and it is OK > > > > 3、When we ls dirA, it begins with ".", "..", dirB's first file, second > > file... last file, last file->next = &(dirB->d_subdirs) > > Hmm... Now, that is interesting. I'm not sure it has anything to do with that bug, but > lockless loops over d_subdirs can run into trouble. > > Look: dentry_unlist() leaves the ->d_child.next pointing to the next non-cursor list element > (or parent's ->d_subdir, if there's nothing else left). It works in pair with d_walk(): there we > have > struct dentry *child = this_parent; > this_parent = child->d_parent; > > spin_unlock(&child->d_lock); > spin_lock(&this_parent->d_lock); > > /* might go back up the wrong parent if we have had a rename. */ > if (need_seqretry(&rename_lock, seq)) > goto rename_retry; > /* go into the first sibling still alive */ > do { > next = child->d_child.next; > if (next == &this_parent->d_subdirs) > goto ascend; > child = list_entry(next, struct dentry, d_child); > } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED)); > rcu_read_unlock(); > > Note the recheck of rename_lock there - it does guarantee that even if child has been > killed off between unlocking it and locking this_parent, whatever it has ended up with in its > ->d_child->next has *not* been moved elsewhere. It might, in turn, have been killed off. > In that case its ->d_child.next points to the next surviving non-cursor, also guaranteed to > remain in the same directory, etc. > > However, lose that rename_lock recheck and we'd get screwed, unless there's some > other d_move() prevention in effect. > > Note that all libfs.c users (next_positive(), move_cursor(), dcache_dir_lseek(), > dcache_readdir(), simple_empty()) should be safe - dcache_readdir() is called with > directory locked at least shared, uses in dcache_dir_lseek() are surrounded by the same, > move_cursor() and simple_empty() hold ->d_lock on parent, > next_positive() is called only under the lock on directory's inode (at least shared). Any of > those should prevent any kind of cross-directory moves - both into and out of. > > <greps for d_subdirs/d_child users> > > Huh? > In drivers/usb/typec/tcpm/tcpm.c: > static void tcpm_debugfs_exit(struct tcpm_port *port) { > int i; > > mutex_lock(&port->logbuffer_lock); > for (i = 0; i < LOG_BUFFER_ENTRIES; i++) { > kfree(port->logbuffer[i]); > port->logbuffer[i] = NULL; > } > mutex_unlock(&port->logbuffer_lock); > > debugfs_remove(port->dentry); > if (list_empty(&rootdir->d_subdirs)) { > debugfs_remove(rootdir); > rootdir = NULL; > } > } > > Unrelated, but obviously broken. Not only the locking is deeply suspect, but it's trivially > confused by open() on the damn directory. It will definitely have ->d_subdirs non-empty. > > Came in "usb: typec: tcpm: remove tcpm dir if no children", author Cc'd... Why not > remove the directory on rmmod? That's because tcpm is a utility driver and there may be multiple instances created under the directory, each instance/user removal will call to tcpm_debugfs_exit() but only the last one should remove the directory. Below patch changed this by using dedicated dir for each instance: https://www.spinics.net/lists/linux-usb/msg183965.html Li Jun > And create on insmod, initially empty... > > fs/nfsd/nfsctl.c: > static void nfsdfs_remove_files(struct dentry *root) { > struct dentry *dentry, *tmp; > > list_for_each_entry_safe(dentry, tmp, &root->d_subdirs, d_child) { > if (!simple_positive(dentry)) { > WARN_ON_ONCE(1); /* I think this can't happen? */ > continue; > It can happen - again, just have it opened and it bloody well will. > Locking is OK, though - parent's inode is locked, so we are safe from d_move() playing > silly buggers there. > > fs/autofs/root.c: > static void autofs_clear_leaf_automount_flags(struct dentry *dentry) { ... > /* Set parent managed if it's becoming empty */ > if (d_child->next == &parent->d_subdirs && > d_child->prev == &parent->d_subdirs) > managed_dentry_set_managed(parent); > > Same bogosity regarding the check for emptiness (that one might've been my fault). > Locking is safe... Not sure if all places in autofs/expire.c are careful enough... > > So it doesn't look like this theory holds. Which filesystem had that been on and what > about ->d_parent of dentries in dirA and dirB > ->d_subdirs?