On 2024/7/12 9:15, chenridong wrote:
On 2024/7/12 1:36, Tejun Heo wrote:
Hello,
On Thu, Jul 11, 2024 at 03:52:34AM +0000, Roman Gushchin wrote:
The max_active of system_wq is WQ_DFL_ACTIVE(256). If all active
works are
cgroup bpf release works, it will block smp_call_on_cpu work which
enque
after cgroup bpf releases. So smp_call_on_cpu holding
cpu_hotplug_lock will
wait for completion, but it can never get a completion because
cgroup bpf
release works can not get cgroup_mutex and will never finish.
However, Placing the cgroup bpf release works on cgroup destroy will
never
block smp_call_on_cpu work, which means loop is broken. Thus, it can
solve
the problem.
Tejun,
do you have an opinion on this?
If there are certain limitations from the cgroup side on what can be
done
in a generic work context, it would be nice to document (e.g. don't grab
cgroup mutex), but I still struggle to understand what exactly is wrong
with the blamed commit.
I think the general rule here is more "don't saturate system wqs" rather
than "don't grab cgroup_mutex from system_wq". system wqs are for misc
things which shouldn't create a large number of concurrent work items. If
something is going to generate 256+ concurrent work items, it should
use its
own workqueue. We don't know what's in system wqs and can't expect its
users
to police specific lock usages.
Thank you, Tj. That's exactly what I'm trying to convey. Just like
cgroup, which has its own workqueue and may create a large number of
release works, it is better to place all its related works on its
workqueue rather than on system wqs.
Regards,
Ridong
Another aspect is that the current WQ_DFL_ACTIVE is an arbitrary number I
came up with close to 15 years ago. Machine size has increased by
multiple
times, if not an order of magnitude since then. So, "there can't be a
reasonable situation where 256 concurrency limit isn't enough" is most
likely not true anymore and the limits need to be pushed upward.
Thanks.
Hello, Tejun, and Roman, is the patch acceptable? Do I need to take any
further actions?