Re: [PATCH] drm/i915: Cancel outstanding modeset workers before suspend

Daniel Vetter <daniel@xxxxxxxx> · Tue, 6 Aug 2013 16:19:00 +0200

On Tue, Aug 6, 2013 at 3:47 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Aug 06, 2013 at 03:01:09PM +0200, Daniel Vetter wrote:
>> On Tue, Aug 06, 2013 at 01:24:02PM +0100, Chris Wilson wrote:
>> > Upon resume we will do a complete restoration of the mode and so reset
>> > all tasks, but during suspend (and unload) we need to make sure that no
>> > workers run concurrently with our suspend code. Or worse just after.
>> >
>> > The issue was first raised whilst tackling a suspend issue with Takashi
>> > Iwai (http://lists.freedesktop.org/archives/intel-gfx/2012-April/016738.html)
>> > and then it was independently rediscovered by Chuanshen Lui.
>> >
>> > v2: Rebase for the lost year.
>> >
>> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> > Cc: Takashi Iwai <tiwai@xxxxxxx>
>> > Cc: "Liu, Chuansheng" <chuansheng.liu@xxxxxxxxx>
>> > ---
>> >  drivers/gpu/drm/i915/i915_drv.h      |  1 +
>> >  drivers/gpu/drm/i915/intel_display.c | 21 ++++++++++++++++-----
>> >  2 files changed, 17 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> > index 64b5d55..cb2c59d 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -2130,6 +2130,7 @@ extern void intel_modeset_init_hw(struct drm_device *dev);
>> >  extern void intel_modeset_suspend_hw(struct drm_device *dev);
>> >  extern void intel_modeset_init(struct drm_device *dev);
>> >  extern void intel_modeset_gem_init(struct drm_device *dev);
>> > +extern void intel_modeset_quiesce(struct drm_device *dev);
>> >  extern void intel_modeset_cleanup(struct drm_device *dev);
>> >  extern int intel_modeset_vga_set_state(struct drm_device *dev, bool state);
>> >  extern void intel_modeset_setup_hw_state(struct drm_device *dev,
>> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> > index 7c7325e..0584480 100644
>> > --- a/drivers/gpu/drm/i915/intel_display.c
>> > +++ b/drivers/gpu/drm/i915/intel_display.c
>> > @@ -9891,6 +9891,7 @@ void intel_modeset_init_hw(struct drm_device *dev)
>> >
>> >  void intel_modeset_suspend_hw(struct drm_device *dev)
>> >  {
>> > +   intel_modeset_quiesce(dev);
>> >     intel_suspend_hw(dev);
>> >  }
>> >
>> > @@ -10324,9 +10325,19 @@ void intel_modeset_gem_init(struct drm_device *dev)
>> >     intel_modeset_setup_hw_state(dev, false);
>> >  }
>> >
>> > -void intel_modeset_cleanup(struct drm_device *dev)
>> > +void intel_modeset_quiesce(struct drm_device *dev)
>> >  {
>> >     struct drm_i915_private *dev_priv = dev->dev_private;
>> > +
>> > +   cancel_work_sync(&dev_priv->hotplug_work);
>> > +   cancel_work_sync(&dev_priv->rps.work);
>> > +
>> > +   /* catch all required for dev_priv->wq */
>> > +   flush_scheduled_work();
>>
>> This only flushes the system workqueue, so the changed comment about
>> dev_priv->wq is misleading. Can you please change it back?
>
> No, because that would duplicate the comment. It's an entirely new
> comment meant to remind us the use of the global wq. But you are
> right, the dev_priv->wq mention is completely wrong. We need to flush
> miscellaneous work items such as flip and unpin tasks. So
> /* catch all required for everything else (e.g. unpin work) */
>
> Since writing this patch, we've gained
>   cancel_delayed_work_sync(&dev_priv->rps.delayed_resume_work);
>   cancel_delayed_work_sync(&dev_priv->rps.vlv_work);
> which need to be taken in consideration.

Thinking about this some more we already cancel the rps work items
(with the exception of the vlv_work) in intel_disable_gt_powersave,
which is called already from i915_save_state. So I think we just need
to add the cancel_work for vlv_work at the right spot.

For the hotplug work item we also have the
dev_priv->enable_hotplug_processing state changes (required to get the
ordering right on the resume side of things). I'd prefer if the
hotplug cancel_work_sync call is right next to this (maybe as part of
a irq disable function?).

That leaves the catchall for unpin work and similar stuff. I think
that suits best as part of i915_gem_idle next to cancelling the
retire_work handler - with unpin am slightly afraid that if it runs
later on while gem is already quiescent it'll create havoc.

So roughly three patches, with the hotplug one for -fixes/stable since
we have real reports about it. What do you think?

Aside: I'm still looking in vain for a more systematic approach to our
cleanup handling, and especially ensuring that we get the ordering all
right ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx