On Tue, Jan 12, 2016 at 05:59:46PM +0200, Imre Deak wrote: > On ti, 2016-01-12 at 16:35 +0100, Daniel Vetter wrote: > > On Tue, Jan 12, 2016 at 02:21:51PM +0000, John Harrison wrote: > > > On 12/01/2016 14:04, Daniel Vetter wrote: > > > > On Tue, Jan 12, 2016 at 11:50:34AM +0000, John Harrison wrote: > > > > > On 12/01/2016 11:28, Chris Wilson wrote: > > > > > > On Tue, Jan 12, 2016 at 11:11:20AM +0000, John Harrison > > > > > > wrote: > > > > > > > On 12/01/2016 00:20, Chris Wilson wrote: > > > > > > > > On Mon, Jan 11, 2016 at 06:42:31PM +0000, John.C.Harrison > > > > > > > > @Intel.com wrote: > > > > > > > > > From: John Harrison <John.C.Harrison@xxxxxxxxx> > > > > > > > > > > > > > > > > > > A later patch in this series re-organises the batch > > > > > > > > > buffer submission > > > > > > > > > code. Part of that is to reduce the scope of a > > > > > > > > > pm_get/put pair. > > > > > > > > > Specifically, they previously wrapped the entire > > > > > > > > > submission path from > > > > > > > > > the very start to the very end, now they only wrap the > > > > > > > > > actual hardware > > > > > > > > > submission part in the back half. > > > > > > > > However, as you haven't fixed the ordering issue that > > > > > > > > requires rpm_get > > > > > > > > before struct_mutex, this is broken. > > > > > > > Why does 'intel_runtime_pm_get' require the struct mutex to > > > > > > > be held? > > > > > > > It has certainly not complained at me about trying to do > > > > > > > stuff > > > > > > > without it. > > > > > > Because it depends upon the struct_mutex and rpm doesn't have > > > > > > sufficient > > > > > > lockdep integration to be able to warn about using rpm from > > > > > > the > > > > > > incorrect contexts. > > > > > Where? What does the 'pm_runtime_get_sync' call turn into? > > > > > There are already > > > > > other places in the driver which call intel_runtime_pm_get() > > > > > immediately > > > > > after grabbing the mutex lock. Also, the description comment > > > > > for _pm_get() > > > > > does not mention anything about mutexes at all. > > > > If you nest rpm_get within dev->struct_mutex that's a bug and > > > > could > > > > deadlock. Where does this happen? And for any such place we need > > > > a new > > > > subtest in pm_rpm. > > > > > > The first two hits when grepping the driver are in > > > 'i915_gem_seqno_info()' > > > and 'i915_interrupt_info()' in i915_debugfs.c. Both say: > > > ret = mutex_lock_interruptible(&dev->struct_mutex); > > > if (ret) > > > return ret; > > > intel_runtime_pm_get(dev_priv); > > > > Yeah that's totally bonkers and will deadlock if the device is > > actually > > suspend. > > It won't deadlock, runtime resume doesn't need struct mutex. Runtime > suspend needs it, but we return -EAGAIN from there if it's already > held. That in turn will delay the suspend. You're right, why do I always mix this up. But this just got me thinking about possible lockdep annotations. We could place a might_lock(rpm_lockdep_key) within rpm_get. And fake-acquire that around the actual ->resume callback. We could even do this in the rpm core ... That would put all my worries on this topic at easy I think, and I could finally stop confusing everyone ;-) Or did I again miss the obvious? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx