On Fri, Aug 26, 2022 at 12:10:43PM +1000, NeilBrown wrote: > +/** > + * d_lock_update_nested - lock a dentry before updating > + * @dentry: the dentry to be locked > + * @base: the parent, or %NULL > + * @name: the name in that parent, or %NULL > + * @subclass: lockdep locking class. > + * > + * Lock a dentry in a directory on which a shared-lock may be held, and > + * on which parallel updates are permitted. > + * If the base and name are given, then on success the dentry will still > + * have that base and name - it will not have raced with rename. > + * On success, a positive dentry will still be hashed, ensuring there > + * was no race with unlink. > + * If they are not given, then on success the dentry will be negative, > + * which again ensures no race with rename, or unlink. I'm not sure it's a good idea to have that in one function, TBH. Looking at the callers, there are * lookup_hash_update() lookup_hash_update_len() nfsd shite filename_create_one() filename_create_one_len() nfsd shite filename_create() kern_path_create() user_path_create() do_mknodat() do_mkdirat() do_symlinkat() do_linkat() do_rmdir() do_unlinkat() * fuckloads of callers in lock_rename_lookup() * d_lock_update() atomic_open() lookup_open() Only the last two can get NULL base or name. Already interesting, isn't it? What's more, splitup between O_CREATE open() on one side and everything else that might create, remove or rename on the other looks really weird. > + rcu_read_lock(); /* for d_same_name() */ > + if (d_unhashed(dentry) && d_is_positive(dentry)) { > + /* name was unlinked while we waited */ > + ret = false; BTW, what happens if somebody has ->lookup() returning a positive unhashed? Livelock on attempt to hit it with any directory-modifying syscall? Right now such behaviour is permitted; I don't know if anything in the tree actually does it, but at the very least it would need to be documented. Note that *negative* unhashed is not just permitted, it's actively used e.g. by autofs. That really needs to be well commented... > + } else if (base && > + (dentry->d_parent != base || > + dentry->d_name.hash != name->hash || > + !d_same_name(dentry, base, name))) { > + /* dentry was renamed - possibly silly-rename */ > + ret = false; > + } else if (!base && d_is_positive(dentry)) { > + ret = false; > + } else { > + dentry->d_flags |= DCACHE_PAR_UPDATE; > + } > + * Parent directory has inode locked exclusive, or possibly shared if wq > + * is given. In the later case the fs has explicitly allowed concurrent > + * updates in this directory. This is the one and only case > + * when ->lookup() may be called on a non in-lookup dentry. What Linus already said about wq... To add a reason he hadn't mentioned, the length of call chain one needs to track to figure out whether it's NULL or not is... excessive. And I don't mean just "greater than 0". We have places like that, and sometimes we have to, but it's never a good thing... > static struct dentry *__lookup_hash(const struct qstr *name, > - struct dentry *base, unsigned int flags) > + struct dentry *base, unsigned int flags, > + wait_queue_head_t *wq) > - dentry = d_alloc(base, name); > + if (wq) > + dentry = d_alloc_parallel(base, name, wq); > + else > + dentry = d_alloc(base, name); > if (unlikely(!dentry)) > return ERR_PTR(-ENOMEM); > + if (IS_ERR(dentry)) > + return dentry; BTW, considering the fact that we have 12 callers of d_alloc() (including this one) and 28 callers of its wrapper (d_alloc_name()), I would seriously consider converting d_alloc() from "NULL or new dentry" to "ERR_PTR(-ENOMEM) or new dentry". Especially since quite a few of those callers will be happier that way. Grep and you'll see... As a side benefit, if (unlikely(!dentry)) turns into if (IS_ERR(dentry)). > +static struct dentry *lookup_hash_update( > + const struct qstr *name, > + struct dentry *base, unsigned int flags, > + wait_queue_head_t *wq) > +{ > + struct dentry *dentry; > + struct inode *dir = base->d_inode; > + int err; > + > + if (wq && IS_PAR_UPDATE(dir)) > + inode_lock_shared_nested(dir, I_MUTEX_PARENT); > + else > + inode_lock_nested(dir, I_MUTEX_PARENT); > + > +retry: > + dentry = __lookup_hash(name, base, flags, wq); > + if (IS_ERR(dentry)) { > + err = PTR_ERR(dentry); > + goto out_err; > + } > + if (!d_lock_update_nested(dentry, base, name, I_MUTEX_PARENT)) { > + /* > + * Failed to get lock due to race with unlink or rename > + * - try again > + */ > + d_lookup_done(dentry); When would we get out of __lookup_hash() with in-lookup dentry? Confused... > +struct dentry *lookup_hash_update_len(const char *name, int nlen, > + struct path *path, unsigned int flags, const struct path *, damnit... > + wait_queue_head_t *wq) > +{ > + struct qstr this; > + int err = lookup_one_common(mnt_user_ns(path->mnt), name, > + path->dentry, nlen, &this); > + if (err) > + return ERR_PTR(err); > + return lookup_hash_update(&this, path->dentry, flags, wq); > +} > +EXPORT_SYMBOL(lookup_hash_update_len); Frankly, the calling conventions of the "..._one_len" family is something I've kept regretting for a long time. Oh, well... > +static void done_path_update(struct path *path, struct dentry *dentry, > + bool with_wq) > +{ > + struct inode *dir = path->dentry->d_inode; > + > + d_lookup_done(dentry); > + d_unlock_update(dentry); > + if (IS_PAR_UPDATE(dir) && with_wq) > + inode_unlock_shared(dir); > + else > + inode_unlock(dir); > +} const struct path *, again... > @@ -3400,6 +3499,12 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file, > dentry = res; > } > } > + /* > + * If dentry is negative and this is a create we need to get > + * DCACHE_PAR_UPDATE. > + */ > + if ((open_flag & O_CREAT) && !dentry->d_inode) > + have_par_update = d_lock_update(dentry, NULL, NULL); > > /* Negative dentry, just create the file */ > if (!dentry->d_inode && (open_flag & O_CREAT)) { Fold the above here, please. What's more, losing the flag would've made it much easier to follow... > @@ -3419,9 +3524,13 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file, > error = create_error; > goto out_dput; > } > + if (have_par_update) > + d_unlock_update(dentry); > return dentry; > > out_dput: > + if (have_par_update) > + d_unlock_update(dentry); > @@ -3821,27 +3962,28 @@ struct dentry *kern_path_create(int dfd, const char *pathname, > struct path *path, unsigned int lookup_flags) BTW, there's 9 callers of that sucker in the entire tree, along with 3 callers of user_path_create() and 14 callers of done_path_create(). Not a big deal to add the wq in those, especially since it can be easily split into preparatory patch (with wq passed, but being unused). > -void done_path_create(struct path *path, struct dentry *dentry) > +void done_path_create_wq(struct path *path, struct dentry *dentry, bool with_wq) Why "with_wq" and not the wq itself? > - * The caller must hold dir->i_mutex. > + * The caller must either hold a write-lock on dir->i_rwsem, or > + * a have atomically set DCACHE_PAR_UPDATE, or both. ??? > + * If the filesystem permits (IS_PAR_UPDATE()), we take a shared lock on the > + * directory and set DCACHE_PAR_UPDATE to get exclusive access to the dentry. The latter happens unconditionally here, unless I'm misreading the code (as well as your cover letter). It does *NOT* happen on rename(), though, contrary to the same. And while your later patch adds it in lock_rename_lookup(), existing lock_rename() callers do not get that at all. Likely to be a problem... > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -13,7 +13,9 @@ > #include <linux/rcupdate.h> > #include <linux/lockref.h> > #include <linux/stringhash.h> > +#include <linux/sched.h> Bloody hell, man... > +static inline void d_unlock_update(struct dentry *dentry) > +{ > + if (IS_ERR_OR_NULL(dentry)) > + return; Do explain... When could we ever get NULL or ERR_PTR() passed to that? BTW, I would seriously look into splitting the "let's add a helper that combines locking parent with __lookup_hash()" into a preliminary patch. Would be easier to follow.