Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock

Michal Koutný <mkoutny@xxxxxxxx> · Thu, 26 Sep 2024 14:53:46 +0200

Hello Hillf.

(sorry for later reply)

On Wed, Sep 11, 2024 at 07:15:42PM GMT, Hillf Danton <hdanton@xxxxxxxx> wrote:
> > However, there is no ordering between (I) and (II) so they can also happen
> > in opposite
> > 
> > 	thread T					system_wq worker
> > 
> > 	down(cpu_hotplug_lock.read)
> > 	smp_call_on_cpu
> > 	  queue_work_on(cpu, system_wq, scss) (I)
> > 	  						lock(cgroup_mutex)  (II)
> > 							...
> > 							unlock(cgroup_mutex)
> > 							scss.func
> > 	  wait_for_completion(scss)
> > 	up(cpu_hotplug_lock.read)
> > 
> > And here the thread T + system_wq worker effectively call
> > cpu_hotplug_lock and cgroup_mutex in the wrong order. (And since they're
> > two threads, it won't be caught by lockdep.)
> > 
> Given no workqueue work executed without being dequeued, any queued work,
> regardless if they are more than 2048, that acquires cgroup_mutex could not
> prevent the work queued by thread-T from being executed, so thread-T can
> make safe forward progress, therefore with no chance left for the ABBA 
> deadlock you spotted where lockdep fails to work.

Is there a forgotten negation and did you intend to write: "any queued
work ... that acquired cgroup_mutex could prevent"?

Or if the negation is correct, why do you mean that processed work item
is _not_ preventing thread T from running (in the case I left quoted
above)?

Thanks,
Michal
Attachment:
signature.asc

Description: PGP signature