On 2024/9/28 16:11, Tetsuo Handa wrote:
On 2024/09/11 20:15, Hillf Danton wrote:
On Mon, 9 Sep 2024 16:19:38 +0200 Michal Koutny <mkoutny@xxxxxxxx>
On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@xxxxxxxxxx> wrote:
The reason for this issue is that cgroup_mutex and cpu_hotplug_lock are
acquired by different tasks, which may lead to a deadlock.
The deadlock can occur through the following steps:
1. A large number of cpusets are deleted asynchronously, which puts a
large number of cgroup_bpf_release works into system_wq. The max_active
of system_wq is WQ_DFL_ACTIVE (256). Consequently, all active works are
cgroup_bpf_release works, and many more cgroup_bpf_release works are put
onto the inactive queue. As illustrated in the diagram, there are 256
works (in the active queue) + n works (in the inactive queue).
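For illustration only, the queuing pattern described above boils down to
something like the following sketch (struct release_work, fake_release and
queue_many_releases are placeholder names, not the actual cgroup code):

#include <linux/workqueue.h>
#include <linux/slab.h>

struct release_work {
	struct work_struct work;
};

/* Placeholder for the per-cgroup release callback; the real
 * cgroup_bpf_release() takes cgroup_mutex inside the work function. */
static void fake_release(struct work_struct *work)
{
	struct release_work *rw = container_of(work, struct release_work, work);

	kfree(rw);
}

/* Deleting many cpusets asynchronously effectively does this: one work
 * item per released cgroup is queued on system_wq.  Only WQ_DFL_ACTIVE
 * (256) of them can be active at a time; the rest wait behind them on
 * the inactive list. */
static void queue_many_releases(unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		struct release_work *rw = kzalloc(sizeof(*rw), GFP_KERNEL);

		if (!rw)
			break;
		INIT_WORK(&rw->work, fake_release);
		queue_work(system_wq, &rw->work); /* same as schedule_work() */
	}
}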
Given that no workqueue work executes without first being dequeued, the
queued works that acquire cgroup_mutex, even if there are more than 2048
of them, cannot prevent the work queued by thread-T from being executed.
Thread-T can therefore make safe forward progress, so there is no chance
left for the ABBA deadlock you spotted, where lockdep fails to work.
I made a simple test which queues many work items into system_wq and
measures the time needed to flush the last work item.
As the number of work items increased, the time needed also increased.
Although nobody uses flush_workqueue() on system_wq, several users
use flush_work() on work items in system_wq. Therefore, I think that
queuing thousands of work items in system_wq should be avoided,
regardless of whether there is a possibility of deadlock.
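For reference, a minimal sketch of such a timing test, assuming a
throwaway test module (names like flush_test_init and NR_WORKS are
illustrative, not the actual test code):

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/ktime.h>
#include <linux/delay.h>

#define NR_WORKS 4096

static struct work_struct *works;

static void busy_fn(struct work_struct *work)
{
	msleep(1); /* stand-in for a small amount of real work */
}

static int __init flush_test_init(void)
{
	ktime_t start;
	int i;

	works = kvcalloc(NR_WORKS, sizeof(*works), GFP_KERNEL);
	if (!works)
		return -ENOMEM;

	for (i = 0; i < NR_WORKS; i++) {
		INIT_WORK(&works[i], busy_fn);
		schedule_work(&works[i]); /* queues on system_wq */
	}

	/*
	 * system_wq runs at most WQ_DFL_ACTIVE works at a time and takes
	 * them in queueing order, so the last item cannot even start
	 * until most of the items queued before it have finished.
	 * Flushing it therefore roughly measures draining the backlog.
	 */
	start = ktime_get();
	flush_work(&works[NR_WORKS - 1]);
	pr_info("flushing the last of %d works took %lld ms\n",
		NR_WORKS, ktime_ms_delta(ktime_get(), start));

	/* Make sure every item has finished before freeing the array. */
	for (i = 0; i < NR_WORKS; i++)
		flush_work(&works[i]);
	kvfree(works);
	return 0;
}

static void __exit flush_test_exit(void)
{
}

module_init(flush_test_init);
module_exit(flush_test_exit);
MODULE_LICENSE("GPL");

The measured time will of course vary with how much each work item does
and with how busy the system is; the point is only that it grows with the
number of items queued ahead of the one being flushed.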
I have sent a patch to document this.
Link:
https://lore.kernel.org/linux-kernel/20240923114352.4001560-3-chenridong@xxxxxxxxxxxxxxx/
Michal and I are discussing how to make this constraint clear. If you
can express this constraint more clearly, please reply.
Best regards,
Ridong