On Thu, 5 Oct 2017, Daniel Vetter wrote:
> On Thu, Oct 05, 2017 at 06:19:30PM +0200, Thomas Gleixner wrote:
> > On Thu, 5 Oct 2017, Daniel Vetter wrote:
> > > On Thu, Oct 05, 2017 at 05:23:20PM +0200, Thomas Gleixner wrote:
> > > > Aside of that, is it really required to use stomp_machine() for this
> > > > synchronization? We certainly have less intrusive mechanisms than that.
> > >
> > > Yeah, the stop_machine needs to go, I'm working on something that uses
> > > rcu_read_lock+synchronize_rcu for this case. Probably shouldn't have
> > > merged even.
> > >
> > > Now this one isn't the one I wanted to fix with this patch since there's
> > > clearly something dubious going on on the i915 side too.
> >
> > I already wondered :)
> >
> > > The proper trace, with the same part on the cpu hotplug side, highlights
> > > that you can't allocate a workqueue while holding mmap_sem. That one
> > > matches patch description&diff a bit better :-)
> >
> > > Sorry for misleading you, should have checked to attach the right one. No
> > > stop_machine()/i915_gem_set_wedged() in the below one.
> >
> > Well the problem is more or less the same and what I said about solving it
> > in a different place is still valid. I think about it some more, but don't
> > expect wonders :)
>
> Yeah just want to make you aware there's now new implications in the
> locking maze and that we overall decide to break the loop in the right
> place. Also adding Tejun, since this is about workqueues, I forgot him.
>
> tldr for Tejun: The new cross-release stuff in lockdep seems to indicate
> that we cannot allocate a new workqueue while holding mmap_sem. Full
> details in the thread.

The issue is not restricted to work queues and mmap_sem.
There is the general problem of:

    cpuhotplug -> cpu_up/down -> callback -> device_create/destroy()

which creates a dependency between cpuhotplug_rwsem and devfs locking.

So now any chain which either holds a devfs lock or has a separate
dependency chain on the devfs locks and then calls some function which
tries to take cpuhotplug_rwsem will trigger a splat. Rightfully so ....

So in your case mmap_sem is involved in that, but that's not a
prerequisite. There are a gazillion other ways to achieve that.

The pattern which causes that is device creation in a hotplug callback and
then some other device access (read/write/ioctl) which ends up acquiring
cpuhotplug_rwsem, plus a connection of both chains through arbitrary locks.

I'm trying to move that device_create/remove stuff out of the regular
hotplug states, but I haven't found a solution for that yet which is
neither butt ugly nor creates other hard to solve problems. Maybe a glass
of wine or some sleep or both will help to get over that :)

Surely anyone is welcome to beat me to it.

Thanks,

	tglx
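
For readers following along, the pattern described above can be modelled outside the kernel with a toy lock-order graph. This is only a sketch and not how lockdep is implemented: the LockGraph class, the name "devfs_lock" and the direction of the devfs/mmap_sem edge are assumptions made for illustration. Only the cpuhotplug_rwsem -> devfs dependency and the mmap_sem -> cpuhotplug_rwsem dependency (workqueue allocation under mmap_sem) come from the thread.

```python
class LockGraph:
    """Toy lock-order graph: an edge A -> B means B was taken while A was held."""

    def __init__(self):
        self.edges = {}

    def _find_path(self, src, dst):
        # Iterative depth-first search for an existing path src -> ... -> dst.
        stack = [(src, [src])]
        seen = set()
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in sorted(self.edges.get(node, ())):
                stack.append((nxt, path + [nxt]))
        return None

    def record_chain(self, *locks):
        """Record one acquisition chain, outermost lock first.

        Returns the cycle as a list of lock names if a new dependency would
        close a loop, otherwise None.
        """
        for held, taken in zip(locks, locks[1:]):
            # If 'taken' already leads back to 'held', held -> taken is a loop.
            path = self._find_path(taken, held)
            if path is not None:
                return path + [taken]
            self.edges.setdefault(held, set()).add(taken)
        return None


graph = LockGraph()

# Hotplug callback creates a device: devfs lock taken under cpuhotplug_rwsem.
r1 = graph.record_chain("cpuhotplug_rwsem", "devfs_lock")

# Assumed connecting chain: some device access takes mmap_sem under a devfs lock.
r2 = graph.record_chain("devfs_lock", "mmap_sem")

# Workqueue allocation under mmap_sem ends up taking cpuhotplug_rwsem.
r3 = graph.record_chain("mmap_sem", "cpuhotplug_rwsem")

print(r1, r2, r3)
```

The first two chains record cleanly; the third is reported because it closes the loop. Swapping mmap_sem for any other lock in the middle gives the same result, which is the point made above: mmap_sem is not a prerequisite, any connection between the two chains will do.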