Re: [PATCH v2 7/8] drm/i915: sanitize rps irq disabling

Paulo Zanoni <przanoni@xxxxxxxxx> · Mon, 17 Nov 2014 16:54:56 -0200

2014-11-10 11:44 GMT-02:00 Imre Deak <imre.deak@xxxxxxxxx>:
> When disabling the RPS interrupts there is a tricky dependency between
> the thread disabling the interrupts, the RPS interrupt handler and the
> corresponding RPS work. The RPS work can reenable the interrupts, so
> there is no straightforward order in the disabling thread to (1) make
> sure that any RPS work is flushed and to (2) disable all RPS
> interrupts. Currently this is solved by masking the interrupts using two
> separate mask registers (first level display IMR and PM IMR) and doing
> the disabling when all first level interrupts are disabled.
>
> This works, but the requirement to run with all first level interrupts
> disabled is unnecessary making the suspend / unload time ordering of RPS
> disabling wrt. other unitialization steps difficult and error prone.
> Removing this restriction allows us to disable RPS early during suspend
> / unload and forget about it for the rest of the sequence. By adding a
> more explicit method for avoiding the above race, it also becomes easier
> to prove its correctness. Finally currently we can hit the WARN in
> snb_update_pm_irq(), when a final RPS work runs with the first level
> interrupts already disabled. This won't lead to any problem (due to the
> separate interrupt masks), but with the change in this and the next
> patch we can get rid of the WARN, while leaving it in place for other
> scenarios.
>
> To address the above points, add a new RPS interrupts_enabled flag and
> use this during RPS disabling to avoid requeuing the RPS work and
> reenabling of the RPS interrupts. Since the interrupt disabling happens
> now in intel_suspend_gt_powersave(), we will disable RPS interrupts
> explicitly during suspend (and not just through the first level mask),
> but there is no problem doing so, it's also more consistent and allows
> us to unify more of the RPS disabling during suspend and unload time in
> the next patch.
>

First of all: yes, this is a tricky problem and I like the general
idea of adding interrupts_enabled to help make things a little more
sane.

Is there any test case to exercise the WARN you mentioned above? Is it
possible to write one? Is there any bug reported on that WARN? It
would be nice to see some "Testcase:" and "Bugzilla:" tags in this
patch.

A little more below:

> v2:
> - rebase on v2 of patch "drm/i915: move rps irq disable one level up"
>
> Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  6 +++++-
>  drivers/gpu/drm/i915/i915_irq.c | 23 ++++++++++++++++-------
>  drivers/gpu/drm/i915/intel_pm.c | 17 +++++++++--------
>  3 files changed, 30 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f830596..14e8f82 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -978,8 +978,12 @@ struct intel_rps_ei {
>  };
>
>  struct intel_gen6_power_mgmt {
> -       /* work and pm_iir are protected by dev_priv->irq_lock */
> +       /*
> +        * work, interrupts_enabled and pm_iir are protected by
> +        * dev_priv->irq_lock
> +        */
>         struct work_struct work;
> +       bool interrupts_enabled;
>         u32 pm_iir;
>
>         /* Frequencies are stored in potentially platform dependent multiples.
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 89a7be1..2677760 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -271,6 +271,7 @@ void gen6_enable_rps_interrupts(struct drm_device *dev)
>
>         spin_lock_irq(&dev_priv->irq_lock);
>         WARN_ON(dev_priv->rps.pm_iir);
> +       dev_priv->rps.interrupts_enabled = true;
>         gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
>         spin_unlock_irq(&dev_priv->irq_lock);
>  }
> @@ -279,14 +280,16 @@ void gen6_disable_rps_interrupts(struct drm_device *dev)
>  {
>         struct drm_i915_private *dev_priv = dev->dev_private;
>
> +       spin_lock_irq(&dev_priv->irq_lock);
> +       dev_priv->rps.interrupts_enabled = false;
> +       spin_unlock_irq(&dev_priv->irq_lock);
> +
> +       cancel_work_sync(&dev_priv->rps.work);
> +
>         I915_WRITE(GEN6_PMINTRMSK, INTEL_INFO(dev_priv)->gen >= 8 ?
>                    ~GEN8_PMINTR_REDIRECT_TO_NON_DISP : ~0);
>         I915_WRITE(gen6_pm_ier(dev_priv), I915_READ(gen6_pm_ier(dev_priv)) &
>                                 ~dev_priv->pm_rps_events);
> -       /* Complete PM interrupt masking here doesn't race with the rps work
> -        * item again unmasking PM interrupts because that is using a different
> -        * register (PMIMR) to mask PM interrupts. The only risk is in leaving
> -        * stale bits in PMIIR and PMIMR which gen6_enable_rps will clean up. */
>
>         spin_lock_irq(&dev_priv->irq_lock);
>         dev_priv->rps.pm_iir = 0;
> @@ -1133,6 +1136,11 @@ static void gen6_pm_rps_work(struct work_struct *work)
>         int new_delay, adj;
>
>         spin_lock_irq(&dev_priv->irq_lock);
> +       /* Speed up work cancelation during disabling rps interrupts. */
> +       if (!dev_priv->rps.interrupts_enabled) {
> +               spin_unlock_irq(&dev_priv->irq_lock);
> +               return;
> +       }

Don't you also need to "dev_priv->rps.pm_iir = 0" before the return
above? Just in case things were canceled after rps.work was launched,
but before it was ran.

>         pm_iir = dev_priv->rps.pm_iir;
>         dev_priv->rps.pm_iir = 0;
>         /* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
> @@ -1706,11 +1714,12 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
>
>         if (pm_iir & dev_priv->pm_rps_events) {
>                 spin_lock(&dev_priv->irq_lock);
> -               dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
>                 gen6_disable_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events);
> +               if (dev_priv->rps.interrupts_enabled) {
> +                       dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
> +                       queue_work(dev_priv->wq, &dev_priv->rps.work);
> +               }
>                 spin_unlock(&dev_priv->irq_lock);
> -
> -               queue_work(dev_priv->wq, &dev_priv->rps.work);
>         }
>
>         if (INTEL_INFO(dev_priv)->gen >= 8)
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index f555810..dcdd269 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6179,9 +6179,17 @@ void intel_suspend_gt_powersave(struct drm_device *dev)
>         /* Interrupts should be disabled already to avoid re-arming. */
>         WARN_ON(intel_irqs_enabled(dev_priv));
>
> +       if (INTEL_INFO(dev)->gen < 6)
> +               return;
> +
>         flush_delayed_work(&dev_priv->rps.delayed_resume_work);
>
> -       cancel_work_sync(&dev_priv->rps.work);
> +       /*
> +        * TODO: disable RPS interrupts on GEN9 too once RPS support
> +        * is added for it.
> +        */
> +       if (INTEL_INFO(dev)->gen != 9)
> +               gen6_disable_rps_interrupts(dev);
>
>         /* Force GPU to min freq during suspend */
>         gen6_rps_idle(dev_priv);
> @@ -6210,13 +6218,6 @@ void intel_disable_gt_powersave(struct drm_device *dev)
>                 else
>                         gen6_disable_rps(dev);
>
> -               /*
> -                * TODO: disable RPS interrupts on GEN9 too once RPS support
> -                * is added for it.
> -                */
> -               if (INTEL_INFO(dev)->gen != 9)
> -                       gen6_disable_rps_interrupts(dev);

But then, what about the users that call intel_disable_gt_powersave()
without calling intel_suspend_gt_powersave()? Don't they get problems
because the interrupts are still enabled?

> -
>                 dev_priv->rps.enabled = false;
>                 mutex_unlock(&dev_priv->rps.hw_lock);
>         }
> --
> 1.8.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Paulo Zanoni
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx