Re: [PATCH] drm/i915: run intel_uncore_early_sanitize earlier on resume on non-VLV

Daniel Vetter <daniel@xxxxxxxx> · Thu, 23 Oct 2014 14:16:04 +0200

On Wed, Oct 22, 2014 at 05:01:54PM -0200, Paulo Zanoni wrote:
> 2014-10-22 9:20 GMT-02:00 Imre Deak <imre.deak@xxxxxxxxx>:
> > On Tue, 2014-10-21 at 19:05 +0200, Daniel Vetter wrote:
> >> On Mon, Oct 20, 2014 at 01:20:50PM +0300, Imre Deak wrote:
> >> > On Fri, 2014-10-17 at 16:01 -0300, Paulo Zanoni wrote:
> >> > > From: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> >> > >
> >> > > As far as I understand, intel_uncore_early_sanitize() was supposed to
> >> > > be run before any register access, but currently
> >> > > intel_resume_prepare() runs earlier, and it does register
> >> > > access. I don't think it is safe to call
> >> > > I915_{READ,WRITE} without calling intel_uncore_early_sanitize() first.
> >> > >
> >> > > One of the problems we currently have is that when we suspend/resume
> >> > > BDW, the FPGA_DBG_RM_NOCLAIM bit becomes 1, so we end up printing an
> >> > > "unclaimed register" message on resume, but this message doesn't
> >> > > really seem to have been triggered by our driver or user space, since
> >> > > the bit was not there before suspending, and gets there just after
> >> > > resuming, before any of our own register accesses. So calling
> >> > > intel_uncore_early_sanitize() as a first thing will allow us to stop
> >> > > printing the error message, fixing the "bug".
> >> > >
> >> > > v2: VLV is an exception to the early_sanitize() rule: it needs to do
> >> > > stuff before calling early_sanitize(), so instead of calling it
> >> > > earlier for every platform, we call it earlier for non-VLV by adding
> >> > > the early_sanitize() call inside intel_resume_prepare(). This doesn't
> >> > > look like the most-beautiful-solution-ever, but, well, at least it
> >> > > fixes the bug. (Imre)
> >> > >
> >> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> > > Cc: Imre Deak <imre.deak@xxxxxxxxx>
> >> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83094
> >> > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> >> > > ---
> >> > >  drivers/gpu/drm/i915/i915_drv.c | 9 ++++++++-
> >> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> > >
> >> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >> > > index a05a1d0..f6d28f2 100644
> >> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> >> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> >> > > @@ -669,7 +669,6 @@ static int i915_drm_thaw_early(struct drm_device *dev)
> >> > >   if (ret)
> >> > >           DRM_ERROR("Resume prepare failed: %d,Continuing resume\n", ret);
> >> > >
> >> > > - intel_uncore_early_sanitize(dev, true);
> >> > >   intel_uncore_sanitize(dev);
> >> > >   intel_power_domains_init_hw(dev_priv);
> >> > >
> >> > > @@ -1049,6 +1048,8 @@ static int snb_resume_prepare(struct drm_i915_private *dev_priv,
> >> > >
> >> > >   if (rpm_resume)
> >> > >           intel_init_pch_refclk(dev);
> >> > > + else
> >> > > +         intel_uncore_early_sanitize(dev, true);
> >> > >
> >> > >   return 0;
> >> > >  }
> >> > > @@ -1056,6 +1057,9 @@ static int snb_resume_prepare(struct drm_i915_private *dev_priv,
> >> > >  static int hsw_resume_prepare(struct drm_i915_private *dev_priv,
> >> > >                           bool rpm_resume)
> >> > >  {
> >> > > + if (!rpm_resume)
> >> > > +         intel_uncore_early_sanitize(dev_priv->dev, true);
> >> > > +
> >> > >   hsw_disable_pc8(dev_priv);
> >> > >
> >> > >   return 0;
> >> > > @@ -1421,6 +1425,9 @@ static int vlv_resume_prepare(struct drm_i915_private *dev_priv,
> >> > >           i915_gem_restore_fences(dev);
> >> > >   }
> >> > >
> >> > > + if (!rpm_resume)
> >> > > +         intel_uncore_early_sanitize(dev, true);
> >> > > +
> >> > >   return ret;
> >> > >  }
> >> > >
> >> >
> >> > You also need to call intel_uncore_early_sanitize() from
> >> > intel_resume_prepare() for the rest of the platforms. With that fixed:
> >> > Reviewed-by: Imre Deak <imre.deak@xxxxxxxxx>
> >> >
> >> > Looking at the result, I agree it's not the nicest, so yet another way
> >> > to reduce the clutter would be to have the following instead in
> >> > i915_drm_thaw_early():
> >> >
> >> > intel_resume_early_prepare()
> >> > intel_uncore_early_sanitize()
> >> > intel_resume_prepare()
> >> >
> >> > and do the early steps for VLV in intel_resume_early_prepare(). I'm ok
> >> > with both solutions.
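
For illustration, the three-step ordering suggested above can be modelled
as a small stand-alone C sketch. It is not i915 code: every model_*
function, the trace structure and the step names are hypothetical
stand-ins, and intel_resume_early_prepare() exists only as the name
proposed in this thread. The sketch assumes that VLV is the only platform
with work to do before the uncore is sanitized.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-alone model of the proposed ordering; nothing here is real i915 code. */
#define MAX_STEPS 8

struct resume_trace {
	const char *steps[MAX_STEPS];	/* names of the steps, in call order */
	int count;
};

static void trace_step(struct resume_trace *t, const char *name)
{
	assert(t->count < MAX_STEPS);
	t->steps[t->count++] = name;
}

/* Hypothetical early hook: assumed to do work only on VLV. */
static void model_resume_early_prepare(struct resume_trace *t, bool is_vlv)
{
	if (is_vlv)
		trace_step(t, "vlv_early_steps");
}

/* Stand-in for intel_uncore_early_sanitize(). */
static void model_uncore_early_sanitize(struct resume_trace *t)
{
	trace_step(t, "uncore_early_sanitize");
}

/* Stand-in for intel_resume_prepare(); it may touch registers from here on. */
static void model_resume_prepare(struct resume_trace *t)
{
	trace_step(t, "resume_prepare");
}

/* Models i915_drm_thaw_early() with the ordering suggested in the review. */
static void model_thaw_early(struct resume_trace *t, bool is_vlv)
{
	t->count = 0;
	model_resume_early_prepare(t, is_vlv);
	model_uncore_early_sanitize(t);
	model_resume_prepare(t);
}
```

With this shape the sanitize call stays in one place for every platform,
and only the VLV-specific early steps move into the extra hook.
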
> >>
> >> This honestly starts to smell like a giant maintenance nightmare. We kinda
> >> started off in the wrong direction with vlv rpm and it seems to get
> >> worse by the day. And it looks like the situation is messy enough that we
> >> can't even lock down the ordering with copious amounts of warnings ...
> >>
> >> But I also don't see any real solution, so just ranting for now. I'd
> >> appreciate though if the revised version comes with a bunch of comments
> >> attached in the code.
> >
> > I blame it on the HW people. :) Seriously, the VLV PM code differs from
> > the rest of PM code in that we save/restore some HW state instead of
> > reinitializing it. That's where the above special casing of the ordering
> > stems from. I agree that it's not ideal, but I think having started with
> > that solution and moving towards the ideal was not that bad. In fact
> > s0ix doesn't yet work in the upstream kernel for reasons independent of
> > i915 (or at least I couldn't make it work), but we would need it to
> > fully validate all the suspend/resume paths.
> 
> On a side note, even igt/pm_rpm/rte (the basic subtest) seems to be
> broken on BYT since forever (at least according to QA, bug #82939), so
> do we even want RPM enabled on BYT?

If it's really broken and not just the test being pedantic about
something, then yes please submit the revert. Submitting reverts is the
best way to make sure PM and engineers are aware that something didn't go
as it should have, and it's the quickest way to rack up good bug team
stats. So please bring them on.

This is btw true for regressions in general: if you don't see action on a
bug, then just send in the revert. Either the original author can come up
with a fix within a few days, or I'll just merge the revert. In either case,
bug stats improve.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch