On Thu, Mar 10, 2016 at 09:22:21PM +0000, Drokin, Oleg wrote: > In fact I was surprised this was not the case already and so people > needed to call it by hand in every possible case where a new dentry is possibly > originated leading to a bit of a mess (and some bugs). Most of the filesystems do not need that at all - just look at the d_splice_alias() callers. In mainline we have * 22 callers of form return d_splice_alias(inode, dentry); in some ->lookup() instance. * 1 caller in d_add_ci(), all callers of which are in ->lookup() instances, with return value directly returned to caller of ->lookup(). * 1 caller in kernfs ->lookup(), with return value directly returned after releasing a global mutex. * ceph - returned in a very convoluted way, but return values is not dereferenced until it gets passed to caller of ->lookup() * gfs2 - passed to caller (or fed to finish_open() when it's in atomic open) with some unlocks along the way. * ocfs2 - broken, and in a way that wouldn't be fixable at dentry allocation time. * cifs - seeding dcache on readdir. No postprocessing of dentry. fs/9p/vfs_inode.c:834: res = d_splice_alias(inode, dentry); * fuse - essentially touching a timestamp of dentry, be it new or old one. Both on ->lookup() and when seeding dcache on readdir. Same as done on successful revalidate, so the worst we are risking is extra work in ->d_revalidate() for somebody finding it in dcache before we'd touched the timestamp. * nfs - ditto. * 9p - the fid we'd obtained is hung on whatever dentry we get. If revalidate doesn't find a fid, it'll try to get one from server and hang it on the dentry. d_obtain_alias() callers make it even more clear - 27 call sites where it's directly returned, 5 more where it's returned without being dereferenced and 5 callers in 3 filesystems (ceph, fuse, lustre) where something is (or ought to be) done. ceph might benefit from that thing as well - looks like its users of d_obtain_alias() might be racy wrt d_splice_alias() from ->lookup() picking the alias before ceph_init_dentry() finishes. Or not - the call chain in there is convoluted as hell and I might miss the call of ceph_init_denty() along the way. ocfs2 might or might not be correct - it doesn't do any postprocessing after d_obtain_alias(); no idea if it really needs one there. So we have two filesystems that could benefit from having ->d_init() - ceph and lustre. Note that e.g. ocfs2 wouldn't benefit from that - it needs a lot more than just allocation *and* it uses ->d_fsdata differently for positives and negatives. FWIW, there's a troubling thing in ceph_init_dentry() - if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_NOSNAP) d_set_d_op(dentry, &ceph_dentry_ops); else if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_SNAPDIR) d_set_d_op(dentry, &ceph_snapdir_dentry_ops); else d_set_d_op(dentry, &ceph_snap_dentry_ops); _That_ can't be done in __d_alloc() and, what's worse, it doesn't look good in their __fh_to_dentry() call of ceph_init_dentry(). ->d_parent has a good chance to be pointing to the dentry itself. Not sure how much trouble does it lead to - I'm not familiar with ceph snapshots or ceph reexports over NFS, let alone the combination... > The downside is that we do the allocation at all times even if the dentry is > going to be promptly destroyed because we found a better alias. The cost of kmalloc() (or kmem_cache_alloc() - it might make sense to introduce a specialized slab for those beasts) is trivial compared to the cost of talking to server to find which inode you are going to deal with. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html