One of the biggest remaining unnecessary costs in path walking is the pointer chasing in inode operations. We already avoided the dentry->d_op derferences with the DCACHE_OP_xyz flags, this just starts doing the same thing for the i_op->xyz cases. Maybe-signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- This is a cut-down version of the earlier PATCH 1/2, forward-ported to the current vfs state. It's tested, although I don't claim any serious amount of stressing of it. And I changed it so that those DCACHE_OP_{LOOKUP|FOLLOW_LINK} bits are only really valid when there is an inode at all. So the bits don't get invalidated when turning the dentry negative, they only get re-validated when the entry is instantiated again. If something sets dentry->d_inode without going through __d_instantiate, that misses it. I'm looking at d_obtain_alias(), and wondering, for example. This is a very minimal version of the previous patch: those *two* places where we now test the new DCACHE_OP_xyz bits are where we really see the dereferencing of the inode->i_op in the normal path walking. HOWEVER, the "inode_permission()" function also shows up pretty clearly on profiles, and the hottest part of that is the testing of "inode->i_op->permission". However, with Al's other changes, the old special-case "exec_permission()" no longer exists. The whole reason it existed before was that the execute permission checking on directory entries is/was special from a performance standpoint. So it might make sense to resurrect it, and then add the DCACHE_OP_PERMISSION case like in my older patch. Anyway, it's not pretty, but that double indirect into a cold-cache "inode->i_op->xyz" really does show up. So consider this an RFC. The "move ACL to generic" code was a serious patch, this is more a fodder for discussion. Linus --- fs/dcache.c | 10 +++++++++- fs/namei.c | 4 ++-- include/linux/dcache.h | 6 ++++++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index be18598c7fd7..3f9d1480f349 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1303,12 +1303,20 @@ EXPORT_SYMBOL(d_set_d_op); static void __d_instantiate(struct dentry *dentry, struct inode *inode) { + unsigned int flags; + spin_lock(&dentry->d_lock); + flags = dentry->d_flags & ~DCACHE_OP_INODE; if (inode) { if (unlikely(IS_AUTOMOUNT(inode))) - dentry->d_flags |= DCACHE_NEED_AUTOMOUNT; + flags |= DCACHE_NEED_AUTOMOUNT; list_add(&dentry->d_alias, &inode->i_dentry); + if (inode->i_op->lookup) + flags |= DCACHE_OP_LOOKUP; + if (inode->i_op->follow_link) + flags |= DCACHE_OP_FOLLOW_LINK; } + dentry->d_flags = flags; dentry->d_inode = inode; dentry_rcuwalk_barrier(dentry); spin_unlock(&dentry->d_lock); diff --git a/fs/namei.c b/fs/namei.c index 120efc76d3d0..8c0955c3a4b2 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1262,7 +1262,7 @@ static inline int walk_component(struct nameidata *nd, struct path *path, terminate_walk(nd); return -ENOENT; } - if (unlikely(inode->i_op->follow_link) && follow) { + if (unlikely(path->dentry->d_flags & DCACHE_OP_FOLLOW_LINK) && follow) { if (nd->flags & LOOKUP_RCU) { if (unlikely(unlazy_walk(nd, path->dentry))) { terminate_walk(nd); @@ -1394,7 +1394,7 @@ static int link_path_walk(const char *name, struct nameidata *nd) return err; } err = -ENOTDIR; - if (!nd->inode->i_op->lookup) + if (!(nd->path.dentry->d_flags & DCACHE_OP_LOOKUP)) break; continue; /* here ends the main loop */ diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 3f22d8d6d8a3..c719566c4088 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -208,6 +208,12 @@ struct dentry_operations { #define DCACHE_CANT_MOUNT 0x0100 #define DCACHE_GENOCIDE 0x0200 +/* Avoid dereferencing inode->i_op */ +#define DCACHE_OP_LOOKUP 0x0400 +#define DCACHE_OP_FOLLOW_LINK 0x0800 +#define DCACHE_OP_INODE (DCACHE_OP_LOOKUP | DCACHE_OP_FOLLOW_LINK) + +/* Avoid dereferencing dentry->d_op */ #define DCACHE_OP_HASH 0x1000 #define DCACHE_OP_COMPARE 0x2000 #define DCACHE_OP_REVALIDATE 0x4000 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html