Today, when opening a file we'll typically do a fast lookup, but if O_CREAT is set, the kernel always takes the exclusive inode lock. I assume this was done with the expectation that O_CREAT means that we always expect to do the create, but that's often not the case. Many programs set O_CREAT even in scenarios where the file already exists. This patch rearranges the pathwalk-for-open code to also attempt a fast_lookup in certain O_CREAT cases. If a positive dentry is found, the inode_lock can be avoided altogether, and if auditing isn't enabled, it can stay in rcuwalk mode for the last step_into. One notable exception that is hopefully temporary: if we're doing an rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the dentry in that case is more expensive than taking the i_rwsem for now. Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> --- Here's a revised patch that does a fast_lookup in the O_CREAT codepath too. The main difference here is that if a positive dentry is found and audit_dummy_context is true, then we keep the walk lazy for the last component, which avoids having to take any locks on the parent (just like with non-O_CREAT opens). Mateusz wrote a will-it-scale test that does an O_CREAT open and close in the same directory repeatedly. Running that in 70 different processes: v6.10: 754565 v6.10+patch: 25747851 ...which is roughly a 34x speedup. I also ran the unlink1 test in single process mode to try and gauge how bad the performance impact would be in the case where we always have to search, not find anything and do the create: v6.10: 200106 v6.10+patch: 199188 ~0.4% performance hit in that test. I'm not sure that's statistically significant, but we should keep an eye out for slowdowns in these sorts of workloads if we decide to take this. --- Changes in v3: - Check for IS_ERR in lookup_fast result - Future-proof open_last_lookups to handle case where lookup_fast_for_open returns a positive dentry while auditing is enabled - Link to v2: https://lore.kernel.org/r/20240806-openfast-v2-1-42da45981811@xxxxxxxxxx Changes in v2: - drop the lockref patch since Mateusz is working on a better approach - add trailing_slashes helper function - add a lookup_fast_for_open helper function - make lookup_fast_for_open skip the lookup if auditing is enabled - if we find a positive dentry and auditing is disabled, don't unlazy - Link to v1: https://lore.kernel.org/r/20240802-openfast-v1-0-a1cff2a33063@xxxxxxxxxx --- fs/namei.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 64 insertions(+), 10 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 1e05a0f3f04d..7894fafa8e71 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3518,6 +3518,49 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file, return ERR_PTR(error); } +static inline bool trailing_slashes(struct nameidata *nd) +{ + return (bool)nd->last.name[nd->last.len]; +} + +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag) +{ + struct dentry *dentry; + + if (open_flag & O_CREAT) { + /* Don't bother on an O_EXCL create */ + if (open_flag & O_EXCL) + return NULL; + + /* + * FIXME: If auditing is enabled, then we'll have to unlazy to + * use the dentry. For now, don't do this, since it shifts + * contention from parent's i_rwsem to its d_lockref spinlock. + * Reconsider this once dentry refcounting handles heavy + * contention better. + */ + if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context()) + return NULL; + } + + if (trailing_slashes(nd)) + nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY; + + dentry = lookup_fast(nd); + if (IS_ERR_OR_NULL(dentry)) + return dentry; + + if (open_flag & O_CREAT) { + /* Discard negative dentries. Need inode_lock to do the create */ + if (!dentry->d_inode) { + if (!(nd->flags & LOOKUP_RCU)) + dput(dentry); + dentry = NULL; + } + } + return dentry; +} + static const char *open_last_lookups(struct nameidata *nd, struct file *file, const struct open_flags *op) { @@ -3535,28 +3578,39 @@ static const char *open_last_lookups(struct nameidata *nd, return handle_dots(nd, nd->last_type); } + /* We _can_ be in RCU mode here */ + dentry = lookup_fast_for_open(nd, open_flag); + if (IS_ERR(dentry)) + return ERR_CAST(dentry); + if (!(open_flag & O_CREAT)) { - if (nd->last.name[nd->last.len]) - nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY; - /* we _can_ be in RCU mode here */ - dentry = lookup_fast(nd); - if (IS_ERR(dentry)) - return ERR_CAST(dentry); if (likely(dentry)) goto finish_lookup; if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU)) return ERR_PTR(-ECHILD); } else { - /* create side of things */ if (nd->flags & LOOKUP_RCU) { - if (!try_to_unlazy(nd)) + bool unlazied; + + /* can stay in rcuwalk if not auditing */ + if (dentry && audit_dummy_context()) { + if (trailing_slashes(nd)) + return ERR_PTR(-EISDIR); + goto finish_lookup; + } + unlazied = dentry ? try_to_unlazy_next(nd, dentry) : + try_to_unlazy(nd); + if (!unlazied) return ERR_PTR(-ECHILD); } audit_inode(nd->name, dir, AUDIT_INODE_PARENT); - /* trailing slashes? */ - if (unlikely(nd->last.name[nd->last.len])) + if (trailing_slashes(nd)) { + dput(dentry); return ERR_PTR(-EISDIR); + } + if (dentry) + goto finish_lookup; } if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) { --- base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd change-id: 20240723-openfast-ac49a7b6ade2 Best regards, -- Jeff Layton <jlayton@xxxxxxxxxx>