Re: [PATCH v4 02/38] drm/i915: Explicit power enable during deferred context initialisation

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 12 Jan 2016 16:59:56 +0000



On Tue, Jan 12, 2016 at 05:11:00PM +0100, Daniel Vetter wrote:
> On Tue, Jan 12, 2016 at 05:59:46PM +0200, Imre Deak wrote:
> > On ti, 2016-01-12 at 16:35 +0100, Daniel Vetter wrote:
> > > On Tue, Jan 12, 2016 at 02:21:51PM +0000, John Harrison wrote:
> > > > On 12/01/2016 14:04, Daniel Vetter wrote:
> > > > > On Tue, Jan 12, 2016 at 11:50:34AM +0000, John Harrison wrote:
> > > > > > On 12/01/2016 11:28, Chris Wilson wrote:
> > > > > > > On Tue, Jan 12, 2016 at 11:11:20AM +0000, John Harrison
> > > > > > > wrote:
> > > > > > > > On 12/01/2016 00:20, Chris Wilson wrote:
> > > > > > > > > On Mon, Jan 11, 2016 at 06:42:31PM +0000, John.C.Harrison
> > > > > > > > > @Intel.com wrote:
> > > > > > > > > > From: John Harrison <John.C.Harrison@xxxxxxxxx>
> > > > > > > > > > 
> > > > > > > > > > A later patch in this series re-organises the batch
> > > > > > > > > > buffer submission
> > > > > > > > > > code. Part of that is to reduce the scope of a
> > > > > > > > > > pm_get/put pair.
> > > > > > > > > > Specifically, they previously wrapped the entire
> > > > > > > > > > submission path from
> > > > > > > > > > the very start to the very end, now they only wrap the
> > > > > > > > > > actual hardware
> > > > > > > > > > submission part in the back half.
> > > > > > > > > However, as you haven't fixed the ordering issue that
> > > > > > > > > requires rpm_get
> > > > > > > > > before struct_mutex, this is broken.
> > > > > > > > Why does 'intel_runtime_pm_get' require the struct mutex to
> > > > > > > > be held?
> > > > > > > > It has certainly not complained at me about trying to do
> > > > > > > > stuff
> > > > > > > > without it.
> > > > > > > Because it depends upon the struct_mutex and rpm doesn't have
> > > > > > > sufficient
> > > > > > > lockdep integration to be able to warn about using rpm from
> > > > > > > the
> > > > > > > incorrect contexts.
> > > > > > Where? What does the 'pm_runtime_get_sync' call turn into?
> > > > > > There are already
> > > > > > other places in the driver which call intel_runtime_pm_get()
> > > > > > immediately
> > > > > > after grabbing the mutex lock. Also, the description comment
> > > > > > for _pm_get()
> > > > > > does not mention anything about mutexes at all.
> > > > > If you nest rpm_get within dev->struct_mutex that's a bug and
> > > > > could
> > > > > deadlock. Where does this happen? And for any such place we need
> > > > > a new
> > > > > subtest in pm_rpm.
> > > > 
> > > > The first two hits when grepping the driver are in
> > > > 'i915_gem_seqno_info()'
> > > > and 'i915_interrupt_info()' in i915_debugfs.c. Both say:
> > > >     ret = mutex_lock_interruptible(&dev->struct_mutex);
> > > >     if (ret)
> > > >         return ret;
> > > >     intel_runtime_pm_get(dev_priv);
> > > 
> > > Yeah that's totally bonkers and will deadlock if the device is
> > > actually
> > > suspend.
> > 
> > It won't deadlock, runtime resume doesn't need struct mutex. Runtime
> > suspend needs it, but we return -EAGAIN from there if it's already
> > held. That in turn will delay the suspend.
> 
> You're right, why do I always mix this up. But this just got me thinking
> about possible lockdep annotations. We could place a
> might_lock(rpm_lockdep_key) within rpm_get. And fake-acquire that around
> the actual ->resume callback. We could even do this in the rpm core ...

Or we could make progress on eliminating the need for struct_mutex in
rpm resume/suspend/whenevner :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx