On 6/18/20 5:02 PM, ebiederm@xxxxxxxxxxxx wrote:
Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
On Thu, Jun 18, 2020 at 03:17:33PM -0700, Junxiao Bi wrote:
When debugging some performance issue, i found that thousands of threads
exit around same time could cause a severe spin lock contention on proc
dentry "/proc/$parent_process_pid/task/", that's because threads needs to
clean up their pid file from that dir when exit. Check the following
standalone test case that simulated the case and perf top result on v5.7
kernel. Any idea on how to fix this?
Thanks, Junxiao.
We've looked at a few different ways of fixing this problem.
Even though the contention is within the dcache, it seems like a usecase
that the dcache shouldn't be optimised for -- generally we do not have
hundreds of CPUs removing dentries from a single directory in parallel.
We could fix this within procfs. We don't have a great patch yet, but
the current approach we're looking at allows only one thread at a time
to call dput() on any /proc/*/task directory.
We could also look at fixing this within the scheduler. Only allowing
one CPU to run the threads of an exiting process would fix this particular
problem, but might have other consequences.
I was hoping that 7bc3e6e55acf would fix this, but that patch is in 5.7,
so that hope is ruled out.
Does anyone know if problem new in v5.7? I am wondering if I introduced
this problem when I refactored the code or if I simply churned the code
but the issue remains effectively the same.
It's not new issue, we see it in old kernel like v4.14
Can you try only flushing entries when the last thread of the process is
reaped? I think in practice we would want to be a little more
sophisticated but it is a good test case to see if it solves the issue.
Thank you. i will try and let you know.
Thanks,
Junxiao.
diff --git a/kernel/exit.c b/kernel/exit.c
index cebae77a9664..d56e4eb60bdd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -152,7 +152,7 @@ void put_task_struct_rcu_user(struct task_struct *task)
void release_task(struct task_struct *p)
{
struct task_struct *leader;
- struct pid *thread_pid;
+ struct pid *thread_pid = NULL;
int zap_leader;
repeat:
/* don't need to get the RCU readlock here - the process is dead and
@@ -165,7 +165,8 @@ void release_task(struct task_struct *p)
write_lock_irq(&tasklist_lock);
ptrace_release_task(p);
- thread_pid = get_pid(p->thread_pid);
+ if (p == p->group_leader)
+ thread_pid = get_pid(p->thread_pid);
__exit_signal(p);
/*
@@ -188,8 +189,10 @@ void release_task(struct task_struct *p)
}
write_unlock_irq(&tasklist_lock);
- proc_flush_pid(thread_pid);
- put_pid(thread_pid);
+ if (thread_pid) {
+ proc_flush_pid(thread_pid);
+ put_pid(thread_pid);
+ }
release_thread(p);
put_task_struct_rcu_user(p);