On Wed 22-11-23 19:36:52, Al Viro wrote: > ... and fix the directory locking documentation and proof of correctness. > Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the > case where we really don't want it is splicing the root of disconnected > tree to somewhere. > > In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an > ancestor of Y" only if X and Y are already in the same tree. Otherwise > it can go from false to true, and one can construct a deadlock on that. > > Make lock_two_directories() report an error in such case and update the > callers of lock_rename()/lock_rename_child() to handle such errors. > > And yes, such conditions are not impossible to create ;-/ > > Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Looks good to me. Just one nit below but whether you decide to address it or not, feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> > +// p1 != p2, both are on the same filesystem, ->s_vfs_rename_mutex is held > static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2) > { > - struct dentry *p; > + struct dentry *p = p1, *q = p2, *r; > > - p = d_ancestor(p2, p1); > - if (p) { > + while ((r = p->d_parent) != p2 && r != p) > + p = r; > + if (r == p2) { > + // p is a child of p2 and an ancestor of p1 or p1 itself > inode_lock_nested(p2->d_inode, I_MUTEX_PARENT); > inode_lock_nested(p1->d_inode, I_MUTEX_PARENT2); > return p; > } > - > - p = d_ancestor(p1, p2); > - inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); > - inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2); > - return p; > + // p is the root of connected component that contains p1 > + // p2 does not occur on the path from p to p1 > + while ((r = q->d_parent) != p1 && r != p && r != q) > + q = r; > + if (r == p1) { > + // q is a child of p1 and an ancestor of p2 or p2 itself > + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); > + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2); > + return q; > + } else if (likely(r == p)) { > + // both p2 and p1 are descendents of p > + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); > + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2); > + return NULL; > + } else { // no common ancestor at the time we'd been called > + mutex_unlock(&p1->d_sb->s_vfs_rename_mutex); It would look more natural to me if s_vfs_rename_mutex got dropped in the callers (lock_rename(), lock_rename_child()) which have acquired the lock instead of here. I agree it results in a bit more boiler plate code though. > + return ERR_PTR(-EXDEV); > + } > } Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR