We found an AA deadlock problem, as shown below:

cgroup_destroy_wq         TaskB                      WatchDog                        system_wq

...
css_killed_work_fn:
P(cgroup_mutex)
...
                                                     ...
                                                     __lockup_detector_reconfigure:
                                                     P(cpu_hotplug_lock.read)
                                                     ...
                          ...
                          percpu_down_write:
                          P(cpu_hotplug_lock.write)
                                                                                     ...
                                                                                     cgroup_bpf_release:
                                                                                     P(cgroup_mutex)
                                                     smp_call_on_cpu:
                                                     Wait system_wq

cpuset_css_offline:
P(cpu_hotplug_lock.read)

WatchDog is waiting for system_wq to finish its jobs, but system_wq is
waiting for cgroup_mutex, and the owner of cgroup_mutex is waiting for
cpu_hotplug_lock.

This problem is caused by commit 4bfc0bb2c60e ("bpf: decouple the
lifetime of cgroup_bpf from cgroup itself"), which puts the cgroup_bpf
release work into system_wq.

As cgroup_bpf is a member of cgroup, it is reasonable to put the
cgroup_bpf release work into cgroup_destroy_wq, which is only used for
cgroup's release work. With that change, the problem is solved.

Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Signed-off-by: Chen Ridong <chenridong@xxxxxxxxxx>
---
 kernel/bpf/cgroup.c             | 2 +-
 kernel/cgroup/cgroup-internal.h | 1 +
 kernel/cgroup/cgroup.c          | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 8ba73042a239..a611a1274788 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -334,7 +334,7 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref)
 	struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);
 
 	INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
-	queue_work(system_wq, &cgrp->bpf.release_work);
+	queue_work(cgroup_destroy_wq, &cgrp->bpf.release_work);
 }
 
 /* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 520b90dd97ec..9e57f3e9316e 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -13,6 +13,7 @@
 extern spinlock_t trace_cgroup_path_lock;
 extern char trace_cgroup_path[TRACE_CGROUP_PATH_LEN];
 extern void __init enable_debug_cgroup(void);
+extern struct workqueue_struct *cgroup_destroy_wq;
 
 /*
  * cgroup_path() takes a spin lock. It is good practice not to take
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e32b6972c478..3317e03fe2fb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -124,7 +124,7 @@ DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem);
  * destruction work items don't end up filling up max_active of system_wq
  * which may lead to deadlock.
  */
-static struct workqueue_struct *cgroup_destroy_wq;
+struct workqueue_struct *cgroup_destroy_wq;
 
 /* generate an array of cgroup subsystem pointers */
 #define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys,
--
2.34.1