Hello.

On Thu, Jul 25, 2024 at 09:48:36AM GMT, chenridong <chenridong@xxxxxxxxxx> wrote:
> > > This issue can be reproduced by the following methods:
> > > 1. A large number of cpuset cgroups are deleted.
> > > 2. Set cpu on and off repeatedly.
> > > 3. Set watchdog_thresh repeatedly.

BTW I assume this is some stress testing, not a regular use scenario of
yours, right?

> > >
> > > The reason for this issue is cgroup_mutex and cpu_hotplug_lock are
> > > acquired in different tasks, which may lead to deadlock.
> > > It can lead to a deadlock through the following steps:
> > > 1. A large number of cgroups are deleted, which will put a large
> > >    number of cgroup_bpf_release works into system_wq. The max_active
> > >    of system_wq is WQ_DFL_ACTIVE(256). When cgroup_bpf_release can not
> > >    get cgroup_mutex, it may cram system_wq, and it will block work
> > >    enqueued later.

Who'd be the holder of cgroup_mutex preventing cgroup_bpf_release from
progress? (That's not clear to me from your diagram.)

...

> > Given idle worker created independent of WQ_DFL_ACTIVE before handling
> > work item, no deadlock could rise in your scenario above.
>
> Hello Hillf, did you mean to say this issue couldn't happen?

Ridong, can you reproduce this with CONFIG_PROVE_LOCKING (or do you have
lockdep message from it already)? It'd be helpful to get insight into
the suspected dependencies.

Thanks,
Michal