Re: [PATCH] drm/i915/gt: Confirm the context survives execution

On 14/10/2020 09:43, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-10-14 09:36:08)

On 13/10/2020 16:35, Chris Wilson wrote:
Repeat our sanity checks from before execution to after execution. One
expects that if we were to see these, the gpu would already be on fire,
but the timing may be informative.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
   drivers/gpu/drm/i915/gt/intel_lrc.c | 10 +++++++---
   1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 287537089c77..3dbdd5d0cb60 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1216,7 +1216,8 @@ static void intel_engine_context_out(struct intel_engine_cs *engine)
 
 static void
 execlists_check_context(const struct intel_context *ce,
-			const struct intel_engine_cs *engine)
+			const struct intel_engine_cs *engine,
+			const char *when)
 {
 	const struct intel_ring *ring = ce->ring;
 	u32 *regs = ce->lrc_reg_state;
@@ -1251,7 +1252,7 @@ execlists_check_context(const struct intel_context *ce,
 		valid = false;
 	}
 
-	WARN_ONCE(!valid, "Invalid lrc state found before submission\n");
+	WARN_ONCE(!valid, "Invalid lrc state found %s submission\n", when);
 }
 
 static void restore_default_state(struct intel_context *ce,
@@ -1347,7 +1348,7 @@ __execlists_schedule_in(struct i915_request *rq)
 		reset_active(rq, engine);
 
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-		execlists_check_context(ce, engine);
+		execlists_check_context(ce, engine, "before");
 
 	if (ce->tag) {
 		/* Use a fixed tag for OA and friends */
@@ -1418,6 +1419,9 @@ __execlists_schedule_out(struct i915_request *rq,
 	 * refrain from doing non-trivial work here.
 	 */
 
+	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
+		execlists_check_context(ce, engine, "after");
+

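For readers following the patch outside the driver tree, the pattern in the diff above (one validation routine, called at two points, with a label saying which call fired) can be sketched in standalone C. This is only an illustration: struct toy_context, its fields, and toy_check_context() are hypothetical names, not the i915 structures, and fprintf() stands in for WARN_ONCE().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a saved context image; not the i915 layout. */
struct toy_context {
	unsigned int ring_start;	/* value recorded in the context image */
	unsigned int expected_start;	/* value the driver expects to find */
};

/*
 * Returns true if the image is consistent. The 'when' string does not
 * change the check itself; it only labels the report so that a failure
 * seen before execution can be told apart from one seen after.
 */
static bool toy_check_context(const struct toy_context *ce, const char *when)
{
	bool valid = ce->ring_start == ce->expected_start;

	if (!valid)
		fprintf(stderr, "Invalid lrc state found %s submission\n", when);

	return valid;
}
```

Calling toy_check_context(&ce, "before") at schedule-in and toy_check_context(&ce, "after") at schedule-out mirrors the two call sites in the patch.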
CI failures here are either something super scary or a simple mistake
which I cannot see. Or is engine retire, possibly queued up before,
racing with the current schedule_out?

It's the unpark while the process_csb is not yet flushed, so we scrub
the kernel_context before it is scheduled-out. It could in theory be a
real problem, with the scrubbing we do to simulate an issue actually
causing one, but the timing window is quite slim.
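
One way to picture the ordering described above is a small standalone C model. It is a sketch of one reading of the explanation, not the driver's code: struct toy_engine, TOY_POISON, toy_unpark(), and toy_process_csb() are all hypothetical, and the model assumes the "after" check runs when the pending schedule-out event is finally processed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON 0x5a	/* hypothetical poison byte, for illustration only */

struct toy_engine {
	unsigned char kernel_ctx_image[16];	/* toy kernel context image */
	bool csb_pending;	/* a schedule-out event is still unprocessed */
};

/* Model of the debug scrub on unpark: poison the kernel context image. */
static void toy_unpark(struct toy_engine *e)
{
	memset(e->kernel_ctx_image, TOY_POISON, sizeof(e->kernel_ctx_image));
}

/* The image counts as valid as long as no byte carries the poison value. */
static bool toy_image_valid(const struct toy_engine *e)
{
	size_t i;

	for (i = 0; i < sizeof(e->kernel_ctx_image); i++)
		if (e->kernel_ctx_image[i] == TOY_POISON)
			return false;
	return true;
}

/*
 * Model of processing the pending schedule-out event. Returns the result
 * of the "after" check made at that point.
 */
static bool toy_process_csb(struct toy_engine *e)
{
	bool valid = toy_image_valid(e);

	e->csb_pending = false;
	return valid;
}
```

In this model, processing the pending event and then unparking passes the check, while unparking first and processing the event afterwards fails it, because the scrub lands before the context has been scheduled out.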

Unpark with unflushed process_csb? I thought maybe you meant park, but poisoning is indeed in unpark. But a pending process_csb means the engine is supposed to be unparked already. Or are you saying it went through the parked-unparked cycle, all with process_csb still pending?

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


