On Thu, 25 Aug 2016 17:32:41 +0200, Chris Wilson wrote: > > On Thu, Aug 25, 2016 at 03:11:35PM +0200, Takashi Iwai wrote: > > Hi, > > > > while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4 > > resume is broken on many machines with i915 gfx, even on the upstream > > 4.7 kernel. Originally it was reported by Intel about SKL machines, > > but later on, we found that it hits on many other older chips (at > > least HSW), too. > > > > This was hard to identify because there have been other S4 resume bugs > > until recently. But even after these fixes, when the system is tested > > on i915 gfx, the system gets memory corruption or kernel Oops sooner > > or later after a few (usually < 10) S4 cycles. > > > > As the bug happened between 4.2 and 4.3, I bisected and it pointed to > > the commit: > > > > 4c436d55b279bbc6b02aac02e7dc683fc09f884e > > drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT > > > > Indeed, reverting this on top of our 4.4.x kernel seems to make S4 > > working stably (at least on a test machine). > > > > Does this make any sense to you guys? > > It could explain a HSW issue, but not SKL and not BDW by default (using > execlists). Hrm, IIRC, SKL didn't work without the commit. But it doesn't matter if this doesn't has a bad influence on SKL, either. > All machine big core, no Braswell or other !llc device? Yeah, tested only on SKL, HSW and IVY. No small boxes. > > Since the commit message doesn't give a good explanation about this > > change, I wonder what's the purpose of this commit. Was it merely > > optimization? > > It's a step towards enabling the Resource Streamer hardware block in the > GPU, kind of a glorified prefetch. Userspace also has to notify the GPU > to enable it for a batch as well. > > The only doubt in mind my about that patch was whether the hw ignored > the RS_RESTORE_STATE if MI_RESTORE_INHIBIT as expected: > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 3c97f0e7a003..c618bb86aeb9 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags) > > /* These flags are for resource streamer on HSW+ */ > if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8) > - flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN); > + flags |= HSW_MI_RS_SAVE_STATE_EN; > else if (INTEL_GEN(dev_priv) < 8) > - flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN); > - > + flags |= MI_SAVE_EXT_STATE_EN; > + if ((flags & MI_RESTORE_INHIBIT) == 0) { > + if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8) > + flags |= HSW_MI_RS_RESTORE_STATE_EN; > + else if (INTEL_GEN(dev_priv) < 8) > + flags |= MI_RESTORE_EXT_STATE_EN; > + } > > > But this code is only executed for Haswell, and we only inhibit restore > from uninitialised contexts. The biggest fear is that we are restoring > garbage from userspace and making the GPU do strange things. > > That still doesn't explain stray writes into system memory, that is more > likely our GTT being broken. Yeah, that's my expectation. But the bisection didn't reach it. So maybe this casually surfaced the hidden GTT or cache issue. > Could you confirm that bisect has any > impact on the other machines, and of course double check the result? You're asking bisection on all machines from the scratch for such a bug taking so long time to reproduce, and especially for i915 code path, that is known to be deadly difficult due to various merge commits? I sincerely decline the offer :) Yes, the result was double-checked. This has a positive effect on all our tested machines. Maybe But it's hard to tell exactly whether this is the 100% culprit. As said, there have been multiple S4 bugs, so far. IVY worked without this patch (after x86 fixes), but obviously this had no negative effect, either. thanks, Takashi _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx