Hello.

On Sat, Jul 27, 2024 at 06:21:55PM GMT, chenridong <chenridong@xxxxxxxxxx> wrote:
> Yes, I have offered the scripts in Link(V1).

Thanks (and thanks for your patience).
There is no lockdep complaint about a deadlock (i.e. some circular locking
dependencies). (I admit the multiple holders of cgroup_mutex reported there
confuse me; I guess that's an artifact of this lockdep report and they could
also be waiters.)

> > Who'd be the holder of cgroup_mutex preventing cgroup_bpf_release from
> > progress? (That's not clear to me from your diagram.)
> >
> This is a cumulative process. The stress testing deletes a large member of
> cgroups, and cgroup_bpf_release is asynchronous, competing with cgroup
> release works.

Those are different situations:
- waiting for one holder that's stuck for some reason (that's what we're
  after),
- waiting because the mutex is contended (that's slow but progresses
  eventually).

> You know, cgroup_mutex is used in many places. Finally, the number of
> `cgroup_bpf_release` instances in system_wq accumulates up to 256, and
> it leads to this issue.

Reaching max_active doesn't mean that queue_work() would block or that the
items would be lost. They are only queued onto the inactive_works list.
(Remark: cgroup_destroy_wq has only max_active=1 but it apparently doesn't
stop progress should there be more items queued (when cgroup_mutex is not
guarding losing references).)

---

The change on its own (deferred cgroup bpf progs removal via
cgroup_destroy_wq instead of system_wq) is sensible because it collects the
removal of related objects together (and at the same time, sharing the one
cgroup_destroy_wq shouldn't cause problems).

But the reasoning in the commit message doesn't add up to me. There isn't an
obvious deadlock; I'd say that the system is overloaded with repeated calls
of __lockup_detector_reconfigure() and it is not in a deadlock state, i.e.
when you stop the test, it should eventually recover.
Given that, I wouldn't put Fixes: 4bfc0bb2c60e there either.
(One could symmetrically argue to move smp_call_on_cpu() away from system_wq
instead of cgroup_bpf_release_fn().)

Honestly, I'm not sure it's worth the effort if there's no deadlock.

It's possible that I'm misunderstanding or that I've missed a substantial
detail for why this could lead to a deadlock. It'd be best visible in a
sequence diagram with tasks/CPUs left-to-right and time top-down (in the
original scheme it looks like time goes right-to-left and there's the
unclear situation of the initial cgroup_mutex holder).

Thanks,
Michal
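
To illustrate the max_active remark above, here is a toy user-space model
in Python. It is only a sketch, not kernel code: the names ToyWorkqueue,
complete_one() and drain() are made up for illustration, and it assumes
nothing more than what is stated above, namely that queueing past
max_active parks the item on an inactive list instead of blocking or
dropping it.

```python
from collections import deque


class ToyWorkqueue:
    """Toy model of max_active accounting (hypothetical, not kernel code)."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = deque()    # items that are allowed to run
        self.inactive = deque()  # items parked until a slot frees up

    def queue_work(self, item):
        # Never blocks and never drops: over-limit items are only parked.
        if len(self.active) < self.max_active:
            self.active.append(item)
        else:
            self.inactive.append(item)
        return True

    def complete_one(self):
        # Finish the oldest active item, then promote one parked item.
        done = self.active.popleft()
        if self.inactive:
            self.active.append(self.inactive.popleft())
        return done


def drain(wq):
    """Run every item to completion; return them in completion order."""
    finished = []
    while wq.active:
        finished.append(wq.complete_one())
    return finished
```

With max_active=1 and five queued items, one item is active and four are
parked, yet draining still completes all five in order. In this model a
full queue only stalls if an active item never finishes, which is the
"stuck holder" case rather than the "contended" one.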