On Tue, Jul 16, 2024 at 08:14:31PM +0800, chenridong wrote:
>
>
> On 2024/7/12 9:15, chenridong wrote:
> >
> >
> > On 2024/7/12 1:36, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Thu, Jul 11, 2024 at 03:52:34AM +0000, Roman Gushchin wrote:
> > > > > The max_active of system_wq is WQ_DFL_ACTIVE (256). If all active
> > > > > works are cgroup bpf release works, they block the smp_call_on_cpu
> > > > > work that is enqueued after them. smp_call_on_cpu, holding
> > > > > cpu_hotplug_lock, then waits for a completion that can never
> > > > > arrive, because the cgroup bpf release works cannot take
> > > > > cgroup_mutex and so never finish. Placing the cgroup bpf release
> > > > > works on the cgroup destroy workqueue instead means they can never
> > > > > block the smp_call_on_cpu work, so the loop is broken. Thus, it
> > > > > solves the problem.
> > > >
> > > > Tejun,
> > > >
> > > > do you have an opinion on this?
> > > >
> > > > If there are certain limitations from the cgroup side on what can be
> > > > done in a generic work context, it would be nice to document them
> > > > (e.g. don't grab cgroup_mutex), but I still struggle to understand
> > > > what exactly is wrong with the blamed commit.
> > >
> > > I think the general rule here is more "don't saturate system wqs"
> > > rather than "don't grab cgroup_mutex from system_wq". system wqs are
> > > for misc things which shouldn't create a large number of concurrent
> > > work items. If something is going to generate 256+ concurrent work
> > > items, it should use its own workqueue. We don't know what's in system
> > > wqs and can't expect their users to police specific lock usages.
> > >
> > Thank you, Tj. That's exactly what I'm trying to convey. Since cgroup
> > already has its own workqueue and may create a large number of release
> > works, it is better to place all of its related works on its own
> > workqueue rather than on the system wqs.
> >
> > Regards,
> > Ridong
> >
> > > Another aspect is that the current WQ_DFL_ACTIVE is an arbitrary
> > > number I came up with close to 15 years ago. Machine size has
> > > increased by multiple times, if not an order of magnitude, since then.
> > > So, "there can't be a reasonable situation where a 256 concurrency
> > > limit isn't enough" is most likely not true anymore and the limits
> > > need to be pushed upward.
> > >
> > > Thanks.
> > >
>
> Hello Tejun and Roman, is the patch acceptable? Do I need to take any
> further actions?
>

I'm not against merging it. I still find the explanation/commit message a
bit vague and believe that some changes may also need to be made on the
watchdog side to make such lockups impossible.

As I understand it, the two most important pieces are the watchdog, which
tries to run a system work item on every CPU while holding cpu_hotplug_lock
for reading, and the cpuset controller, which tries to grab cpu_hotplug_lock
for writing.

It's indeed a tricky problem, so maybe there is no simple and clear
explanation.

Anyway, thank you for finding the problem and providing a reproducer!

Thanks!
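
P.S. To make the dedicated-workqueue approach concrete, here is a minimal
sketch of the shape such a fix could take, built around the existing
cgroup_bpf_release_fn() path in kernel/bpf/cgroup.c. The workqueue name,
the max_active of 1, and the init function name are illustrative
assumptions and need not match the actual patch:

	#include <linux/workqueue.h>

	/* Illustrative: a workqueue dedicated to cgroup bpf release works. */
	static struct workqueue_struct *cgroup_bpf_destroy_wq;

	static int __init cgroup_bpf_wq_init(void)
	{
		/*
		 * With a dedicated queue, however many cgroup bpf release
		 * works pile up waiting for cgroup_mutex, they can no longer
		 * exhaust system_wq's WQ_DFL_ACTIVE slots and starve
		 * unrelated items such as smp_call_on_cpu()'s work.
		 */
		cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1);
		if (!cgroup_bpf_destroy_wq)
			panic("Failed to alloc workqueue for cgroup bpf destroy\n");
		return 0;
	}
	core_initcall(cgroup_bpf_wq_init);

	/* Called when the cgroup's bpf refcount drops to zero. */
	static void cgroup_bpf_release_fn(struct percpu_ref *ref)
	{
		struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

		INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
		/* Previously: queue_work(system_wq, &cgrp->bpf.release_work); */
		queue_work(cgroup_bpf_destroy_wq, &cgrp->bpf.release_work);
	}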