On Fri, Jan 15, 2021 at 6:54 AM Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> wrote:
>
> Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> writes:
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> >>> The nature of this bug is also described here (with different consequences):
> >>> https://lore.kernel.org/lkml/20200211141554.24181-1-qais.yousef@xxxxxxx/
> >>
> >> Yeah, pesky deadlocks.. someone was going to try again.
> >
> > I dug up the synchronous patch
> >
> > https://lore.kernel.org/lkml/1579878449-10164-1-git-send-email-prsood@xxxxxxxxxxxxxx/
> >
> > but surprisingly wasn't able to reproduce the lockdep splat from
> >
> > https://lore.kernel.org/lkml/F0388D99-84D7-453B-9B6B-EEFF0E7BE4CC@xxxxxx/
> >
> > even though I could hit it a few weeks ago.
>
> oh okay, you need to mount a legacy cpuset hierarchy.
>
> So as the above splat shows, making cpuset_hotplug_workfn() synchronous
> means cpu_hotplug_lock (and "cpuhp_state-down") can be acquired before
> cgroup_mutex.
>
> But there are at least four cgroup paths that take the locks in the
> opposite order. They're all the same: they take cgroup_mutex and then
> cpu_hotplug_lock later on to modify one or more static keys.
>
> cpu_hotplug_lock should probably be ahead of cgroup_mutex because the
> latter is taken in a hotplug callback, and we should keep the static
> branches in cgroup, so the only way out I can think of is moving
> cpu_hotplug_lock to just before cgroup_mutex is taken and switching to
> _cpuslocked flavors of the static key calls.
>
> lockdep quiets down with that change everywhere, but it puts another big
> lock around a lot of cgroup paths. Seems less heavy-handed to go with
> this RFC. What do you all think?

Daniel, thank you for taking a look. I don't mind reviewing and testing
the other approach you described.

> Absent further discussion, Alexey, do you plan to post another version?

I plan to update this patch and re-send it in the next couple of days.
It looks like it might become a series of two patches.

Sorry for the delay.

Best regards,
Alexey
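
P.S. For readers following the locking discussion, here is a minimal
sketch of the reordering Daniel describes: take cpu_hotplug_lock via
cpus_read_lock() before cgroup_mutex, then use the _cpuslocked flavors
of the static-key calls, which expect the hotplug lock to already be
held. This is an illustration, not the actual cgroup code; the
example_* names are hypothetical stand-ins for cgroup_mutex and a
cgroup static key.

	#include <linux/cpu.h>
	#include <linux/jump_label.h>
	#include <linux/mutex.h>

	/* Hypothetical stand-ins for cgroup_mutex and a cgroup static key. */
	static DEFINE_MUTEX(example_cgroup_mutex);
	static DEFINE_STATIC_KEY_FALSE(example_key);

	static void example_cgroup_path(bool enable)
	{
		/*
		 * Before: cgroup_mutex was taken first, and
		 * static_branch_enable()/disable() acquired
		 * cpu_hotplug_lock internally, giving the order
		 * cgroup_mutex -> cpu_hotplug_lock.
		 *
		 * After: take cpu_hotplug_lock up front so the order
		 * becomes cpu_hotplug_lock -> cgroup_mutex, matching
		 * the hotplug callback side.
		 */
		cpus_read_lock();
		mutex_lock(&example_cgroup_mutex);

		if (enable)
			static_branch_enable_cpuslocked(&example_key);
		else
			static_branch_disable_cpuslocked(&example_key);

		mutex_unlock(&example_cgroup_mutex);
		cpus_read_unlock();
	}

With this ordering, both the hotplug callback path and the cgroup paths
acquire cpu_hotplug_lock ahead of cgroup_mutex, which is why lockdep
quiets down — at the cost of holding the hotplug lock around those
cgroup paths.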