On 2024/09/11 20:15, Hillf Danton wrote:
> On Mon, 9 Sep 2024 16:19:38 +0200 Michal Koutny <mkoutny@xxxxxxxx>
>> On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@xxxxxxxxxx> wrote:
>>> The reason for this issue is cgroup_mutex and cpu_hotplug_lock are
>>> acquired in different tasks, which may lead to deadlock.
>>> It can lead to a deadlock through the following steps:
>>> 1. A large number of cpusets are deleted asynchronously, which puts a
>>> large number of cgroup_bpf_release works into system_wq. The max_active
>>> of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are
>>> cgroup_bpf_release works, and many cgroup_bpf_release works will be put
>>> into inactive queue. As illustrated in the diagram, there are 256 (in
>>> the active queue) + n (in the inactive queue) works.
> Given no workqueue work executed without being dequeued, any queued work,
> regardless if they are more than 2048, that acquires cgroup_mutex could not
> prevent the work queued by thread-T from being executed, so thread-T can
> make safe forward progress, therefore with no chance left for the ABBA
> deadlock you spotted where lockdep fails to work.

I made a simple test which queues many work items into system_wq and
measures the time needed to flush the last work item.

As the number of work items increased, the time needed also increased.
Although nobody uses flush_workqueue() on system_wq, several users use
flush_work() on a work item in system_wq. Therefore, I think that queuing
thousands of work items in system_wq should be avoided, regardless of
whether there is a possibility of deadlock.
----------------------------------------
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/sched.h>
#include <linux/jiffies.h>

/* Each work item simply sleeps for one second (HZ jiffies). */
static void worker_func(struct work_struct *work)
{
	schedule_timeout_uninterruptible(HZ);
}

#define MAX_WORKS 8192
static struct work_struct works[MAX_WORKS];

static int __init test_init(void)
{
	int i;
	unsigned long start, end;

	/* Queue all work items into system_wq. */
	for (i = 0; i < MAX_WORKS; i++) {
		INIT_WORK(&works[i], worker_func);
		schedule_work(&works[i]);
	}
	/* Measure how long flushing the last queued work item takes. */
	start = jiffies;
	flush_work(&works[MAX_WORKS - 1]);
	end = jiffies;
	printk("%u: Took %lu jiffies. (HZ=%u)\n", MAX_WORKS, end - start, HZ);
	/* Wait for every work item before works[] can go away. */
	for (i = 0; i < MAX_WORKS; i++)
		flush_work(&works[i]);
	/* Returning an error makes the module load fail, so no rmmod is needed. */
	return -EINVAL;
}

module_init(test_init);
MODULE_LICENSE("GPL");
----------------------------------------

12 CPUs
 256: Took 1025 jiffies. (HZ=1000)
 512: Took 2091 jiffies. (HZ=1000)
1024: Took 4105 jiffies. (HZ=1000)
2048: Took 8321 jiffies. (HZ=1000)
4096: Took 16382 jiffies. (HZ=1000)
8192: Took 32770 jiffies. (HZ=1000)

1 CPU
 256: Took 1133 jiffies. (HZ=1000)
 512: Took 2047 jiffies. (HZ=1000)
1024: Took 4117 jiffies. (HZ=1000)
2048: Took 8210 jiffies. (HZ=1000)
4096: Took 16424 jiffies. (HZ=1000)
8192: Took 32774 jiffies. (HZ=1000)