Hi Chen!

Was this problem found in real life? Do you have a lockdep splat available?

> On Jun 7, 2024, at 4:09 AM, Chen Ridong <chenridong@xxxxxxxxxx> wrote:
>
> We found an AA deadlock problem as shown below:
>
> cgroup_destroy_wq   TaskB                WatchDog             system_wq
>
> ...
> css_killed_work_fn:
> P(cgroup_mutex)
> ...
>                                          ...
>                                          __lockup_detector_reconfigure:
>                                          P(cpu_hotplug_lock.read)
>                                          ...
>                     ...
>                     percpu_down_write:
>                     P(cpu_hotplug_lock.write)
>                                                               ...
>                                                               cgroup_bpf_release:
>                                                               P(cgroup_mutex)
>                                          smp_call_on_cpu:
>                                          Wait system_wq
>
> cpuset_css_offline:
> P(cpu_hotplug_lock.read)
>
> WatchDog is waiting for system_wq, which is in turn waiting for
> cgroup_mutex, to finish its jobs, but the owner of cgroup_mutex is
> waiting for cpu_hotplug_lock. This problem is caused by commit
> 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup
> itself"), which puts the cgroup_bpf release work onto system_wq. As
> cgroup_bpf is a member of cgroup, it is reasonable to put the cgroup
> bpf release work onto cgroup_destroy_wq, which is used only for
> cgroup's release work, and the problem is solved.

I need to think more on this, but at first glance the fix looks a bit
confusing. cgroup_bpf_release() looks quite innocent: it only takes
cgroup_mutex. It’s not obvious why that is not ok and why it requires a
dedicated workqueue. What exactly is achieved by placing it back on the
dedicated cgroup destroy queue?

I’m not trying to say your fix won’t work, but it looks like it might
cover a more serious problem.
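
For concreteness, here is roughly how I read the proposed change (this
is only a sketch, not the actual diff; cgroup_destroy_wq is static to
kernel/cgroup/cgroup.c today, so I'm assuming a small queuing helper is
added there, and the helper name below is made up for illustration):

  /* kernel/cgroup/cgroup.c: let the bpf side reach cgroup_destroy_wq */
  void cgroup_queue_bpf_release_work(struct work_struct *work)
  {
          queue_work(cgroup_destroy_wq, work);
  }

  /* kernel/bpf/cgroup.c */
  static void cgroup_bpf_release_fn(struct percpu_ref *ref)
  {
          struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

          INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
          /* was: queue_work(system_wq, &cgrp->bpf.release_work); */
          cgroup_queue_bpf_release_work(&cgrp->bpf.release_work);
  }

As I understand it, the effect is only that the release work no longer
shares system_wq with other work items (like the smp_call_on_cpu() one
in the diagram above), so a cgroup_bpf_release() stuck on cgroup_mutex
can no longer hold them up.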