The pid_revalidate() function requires dropping from RCU into REF lookup mode. When many threads are resolving paths within /proc in parallel, this can result in heavy spinlock contention as each thread tries to grab a reference to the /proc dentry lock (and drop it shortly thereafter). Allow the pid_revalidate() function to execute under LOOKUP_RCU. When updates must be made to the inode due to the owning task performing setuid(), drop out of RCU and into REF mode. Remove the call to security_task_to_inode(), since we can rely on the call from proc_pid_make_inode(). Signed-off-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> --- I'd like to use this patch as an RFC on this approach for reducing spinlock contention during many parallel path lookups in the /proc filesystem. The contention can be triggered by, for example, running ~100 parallel instances of "TZ=/etc/localtime ps -fe >/dev/null" on a 100CPU machine. The %sys utilization in such a case reaches around 90%, and profiles show two code paths with high utilization: walk_component lookup_fast unlazy_child legitimize_root __legitimize_path lockref_get_not_dead terminate_walk dput dput By applying this patch, %sys utilization falls to around 60% under the same workload. Although this particular workload is a bit contrived, we have seen some monitoring scripts which produced high %sys time due to this contention. Changes from v2: - Remove get_pid_task_rcu_user() and get_proc_task_rcu(), since they were unnecessary. - Remove the call to security_task_to_inode(). fs/proc/base.c | 47 +++++++++++++++++++++++++++++++++-------------- 1 file changed, 33 insertions(+), 14 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index ebea9501afb8..833d55a59e20 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1813,12 +1813,28 @@ int pid_getattr(const struct path *path, struct kstat *stat, /* * Set <pid>/... inode ownership (can change due to setuid(), etc.) */ -void pid_update_inode(struct task_struct *task, struct inode *inode) +static int do_pid_update_inode(struct task_struct *task, struct inode *inode, + unsigned int flags) { - task_dump_owner(task, inode->i_mode, &inode->i_uid, &inode->i_gid); + kuid_t uid; + kgid_t gid; + + task_dump_owner(task, inode->i_mode, &uid, &gid); + if (uid_eq(uid, inode->i_uid) && gid_eq(gid, inode->i_gid) && + !(inode->i_mode & (S_ISUID | S_ISGID))) + return 1; + if (flags & LOOKUP_RCU) + return -ECHILD; + inode->i_uid = uid; + inode->i_gid = gid; inode->i_mode &= ~(S_ISUID | S_ISGID); - security_task_to_inode(task, inode); + return 1; +} + +void pid_update_inode(struct task_struct *task, struct inode *inode) +{ + do_pid_update_inode(task, inode, 0); } /* @@ -1830,19 +1846,22 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags) { struct inode *inode; struct task_struct *task; + int rv = 0; - if (flags & LOOKUP_RCU) - return -ECHILD; - - inode = d_inode(dentry); - task = get_proc_task(inode); - - if (task) { - pid_update_inode(task, inode); - put_task_struct(task); - return 1; + if (flags & LOOKUP_RCU) { + inode = d_inode_rcu(dentry); + task = pid_task(proc_pid(inode), PIDTYPE_PID); + if (task) + rv = do_pid_update_inode(task, inode, flags); + } else { + inode = d_inode(dentry); + task = get_proc_task(inode); + if (task) { + rv = do_pid_update_inode(task, inode, flags); + put_task_struct(task); + } } - return 0; + return rv; } static inline bool proc_inode_is_dead(struct inode *inode) -- 2.25.1