Re: [PATCH] drm/i915: fix SFC reset flow

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 17 Sep 2019 20:49:19 +0100



Quoting Daniele Ceraolo Spurio (2019-09-17 20:45:02)
> 
> 
> On 9/17/2019 11:57 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-09-17 19:36:35)
> >>
> >> On 9/17/2019 12:57 AM, Chris Wilson wrote:
> >>> Quoting Daniele Ceraolo Spurio (2019-09-16 22:41:04)
> >>>> Our assumption that we can ask the HW to lock the SFC even if not
> >>>> currently in use does not match the HW commitment. The expectation from
> >>>> the HW is that SW will not try to lock the SFC if the engine is not
> >>>> using it and if we do that the behavior is undefined; on ICL the HW
> >>>> ends up returning the ack and ignoring our lock request, but this is
> >>>> not guaranteed and we shouldn't expect it going forward.
> >>>>
> >>>> Reported-by: Owen Zhang <owen.zhang@xxxxxxxxx>
> >>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> >>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> >>>> ---
> >>>> @@ -366,10 +368,13 @@ static u32 gen11_lock_sfc(struct intel_engine_cs *engine)
> >>>>                                            sfc_forced_lock_ack_bit,
> >>>>                                            sfc_forced_lock_ack_bit,
> >>>>                                            1000, 0, NULL)) {
> >>>> -               DRM_DEBUG_DRIVER("Wait for SFC forced lock ack failed\n");
> >>>> +               /* did we race the unlock? */
> >>>> +               if (intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit)
> >>>> +                       DRM_ERROR("Wait for SFC forced lock ack failed\n");
> >>> What's our plan if this *ERROR* is ever triggered?
> >>>
> >>> If the plan remains to do nothing and check the logs on death, then
> >>> this is still just a debug splat. If there is a plan to actually do
> >>> something to handle the error, do it!
> >>> -Chris
> >> AFAIU the only thing we can do is escalate to full gpu reset. However,
> >> the probability of this failing should be next to non-existent (only one
> >> engine can use the SFC at any time so there is no lock contention), so
> >> I'm not convinced the fallback is worth the effort. The error is still
> >> useful IMO to catch unexpected behavior on new platforms, as it happened
> >> in this case with the media team reporting seeing this message on gen12
> >> with the previous behavior. This said, I'm happy to add the extra logic
> >> if you believe it is worth it.
> > We've seen this message on every icl run!
> > -Chris
> 
> I've never noticed it, which tests are hitting it? My understanding from 
> what the HW team said is that on ICL the ack will always come back (even 
> if it is not part of the "official" SW/HW interface) and the HW tweak 
> that stops that is a gen12 change. Something else might be wrong if this
> is firing off in our ICL CI, also because I don't think we have any test
> case that actually uses the SFC, do we?

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6911/fi-icl-u2/igt@i915_selftest@live_hangcheck.html

All icl, live_hangcheck or live_reset, for as long as I can remember.
-Chris
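
For reference, the flow being debated can be modelled in plain C roughly as
below. This is only a sketch under stated assumptions: struct fake_sfc,
wait_for_ack(), lock_sfc() and the result enum are hypothetical stand-ins,
not i915 code, and the up-front check of the usage bit is inferred from the
commit message rather than shown in the quoted hunk. The "escalate" outcome
corresponds to the full GPU reset fallback mentioned above, which the patch
as posted does not implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the SFC state; none of these names exist in i915. */
struct fake_sfc {
	bool in_use;         /* models the sfc_usage bit for this engine */
	bool ack_on_lock;    /* does the HW ack a forced lock request? */
	bool unlock_races;   /* engine releases the SFC while we wait */
	bool lock_requested; /* records that SW asked for the forced lock */
};

enum sfc_result {
	SFC_NOT_IN_USE,   /* engine is not using the SFC: never ask for the lock */
	SFC_LOCKED,       /* ack received: the SFC can be reset with the engine */
	SFC_RACED_UNLOCK, /* no ack, but the engine dropped the SFC meanwhile */
	SFC_ESCALATE,     /* no ack and still in use: candidate for full reset */
};

/* Stand-in for the register poll; optionally simulates the unlock race. */
bool wait_for_ack(struct fake_sfc *hw)
{
	if (hw->unlock_races)
		hw->in_use = false;
	return hw->ack_on_lock;
}

enum sfc_result lock_sfc(struct fake_sfc *hw)
{
	assert(hw != NULL);

	/* Locking an unused SFC is undefined behaviour per the HW team. */
	if (!hw->in_use)
		return SFC_NOT_IN_USE;

	hw->lock_requested = true;
	if (wait_for_ack(hw))
		return SFC_LOCKED;

	/* Did we race the unlock? Re-read the usage bit before complaining. */
	if (!hw->in_use)
		return SFC_RACED_UNLOCK;

	return SFC_ESCALATE;
}
```

In this model only SFC_ESCALATE would justify an error-level message; the
other three outcomes are expected and can proceed with the engine reset.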