Re: [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Chen!

Was this problem found in the real life? Do you have a LOCKDEP splash available?

> On Jun 7, 2024, at 4:09 AM, Chen Ridong <chenridong@xxxxxxxxxx> wrote:
> 
> We found an AA deadlock problem as shown belowed:
> 
> cgroup_destroy_wq        TaskB                WatchDog            system_wq
> 
> ...
> css_killed_work_fn:
> P(cgroup_mutex)
> ...
>                                ...
>                                __lockup_detector_reconfigure:
>                                P(cpu_hotplug_lock.read)
>                                ...
>                ...
>                percpu_down_write:
>                P(cpu_hotplug_lock.write)
>                                                ...
>                                                cgroup_bpf_release:
>                                                P(cgroup_mutex)
>                                smp_call_on_cpu:
>                                Wait system_wq
> 
> cpuset_css_offline:
> P(cpu_hotplug_lock.read)
> 
> WatchDog is waiting for system_wq, who is waiting for cgroup_mutex, to
> finish the jobs, but the owner of the cgroup_mutex is waiting for
> cpu_hotplug_lock. This problem caused by commit 4bfc0bb2c60e ("bpf:
> decouple the lifetime of cgroup_bpf from cgroup itself")
> puts cgroup_bpf release work into system_wq. As cgroup_bpf is a member of
> cgroup, it is reasonable to put cgroup bpf release work into
> cgroup_destroy_wq, which is only used for cgroup's release work, and the
> preblem is solved.

I need to think more on this, but at first glance the fix looks a bit confusing. cgroup_bpf_release() looks quite innocent, it only takes a cgroup_mutex. It’s not obvious why it’s not ok and requires a dedicated work queue. What exactly is achieved by placing it back on the dedicated cgroup destroy queue?

I’m not trying to say your fix won’t work, but it looks like it might cover a more serious problem.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux