On Mon 17-02-25 16:30:03, NeilBrown wrote: > Some filesystems, such as NFS, cifs, ceph, and fuse, do not have > complete control of sequencing on the actual filesystem (e.g. on a > different server) and may find that the inode created for a mkdir > request already exists in the icache and dcache by the time the mkdir > request returns. For example, if the filesystem is mounted twice the > file could be visible on the other mount before it is on the original > mount, and a pair of name_to_handle_at(), open_by_handle_at() could > instantiate the directory inode with an IS_ROOT() dentry before the > first mkdir returns. > > This means that the dentry passed to ->mkdir() may not be the one that > is associated with the inode after the ->mkdir() completes. Some > callers need to interact with the inode after the ->mkdir completes and > they currently need to perform a lookup in the (rare) case that the > dentry is no longer hashed. > > This lookup-after-mkdir requires that the directory remains locked to > avoid races. Planned future patches to lock the dentry rather than the > directory will mean that this lookup cannot be performed atomically with > the mkdir. > > To remove this barrier, this patch changes ->mkdir to return the > resulting dentry if it is different from the one passed in. > Possible returns are: > NULL - the directory was created and no other dentry was used > ERR_PTR() - an error occurred > non-NULL - this other dentry was spliced in > > Not all filesystems reliable result in a positive hashed dentry: > > - NFS does produce the proper dentry, but does not yet return it. The > code change is larger than I wanted to include in this patch > - cifs will, when posix extensions are enabled, unhash the > dentry on success so a subsequent lookup will create it if needed. > - cifs without posix extensions will unhash the dentry if an > internal lookup finds a non-directory where it expected the dir. > - kernfs and tracefs leave the dentry negative and the ->revalidate > operation ensures that lookup will be called to correctly populate > the dentry > - hostfs leaves the dentry negative and uses > .d_delete = always_delete_dentry > so the negative dentry is quickly discarded and a lookup will add a > new entry. > > Signed-off-by: NeilBrown <neilb@xxxxxxx> Looks good to me. I've spotted just a few style nits: > diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst > index 31eea688609a..bd3751dfddcf 100644 > --- a/Documentation/filesystems/vfs.rst > +++ b/Documentation/filesystems/vfs.rst > @@ -562,7 +562,10 @@ otherwise noted. > ``mkdir`` > called by the mkdir(2) system call. Only required if you want > to support creating subdirectories. You will probably need to > - call d_instantiate() just as you would in the create() method > + call d_instantiate() just as you would in the create() method. > + If some dentry other than the one given is spliced in, then it > + much be returned, otherwise NULL is returned is the original dentry ^^^^ must > + is successfully used, else an ERR_PTR() is returned. > > ``rmdir`` > called by the rmdir(2) system call. Only required if you want ... > @@ -918,6 +922,7 @@ static int fuse_mkdir(struct mnt_idmap *idmap, struct inode *dir, > args.in_args[1].size = entry->d_name.len + 1; > args.in_args[1].value = entry->d_name.name; > return create_new_entry(idmap, fm, &args, dir, entry, S_IFDIR); > + Bogus empty line here. > } > > static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir, ... > @@ -4313,10 +4314,17 @@ int vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir, > if (max_links && dir->i_nlink >= max_links) > return -EMLINK; > > - error = dir->i_op->mkdir(idmap, dir, dentry, mode); > - if (!error) > + de = dir->i_op->mkdir(idmap, dir, dentry, mode); > + if (IS_ERR(de)) > + return PTR_ERR(de); > + if (de) { > + fsnotify_mkdir(dir, de); > + /* Cannot return de yet */ > + dput(de); > + } else > fsnotify_mkdir(dir, dentry); The prefered style is: } else { fsnotify_mkdir(dir, dentry); } Otherwise the patch looks good to me at least for the VFS bits and the filesystems I understand like ext2, ext4, ocfs2, udf. So feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR