When submitting semaphores in execlist mode the hang checker crashes in this function because it is only runnable in ring submission mode. The reason this is of particular interest to the TDR patch series is because we use semaphores as a mean to induce hangs during testing (which is the recommended way to induce hangs for gen8+). It's not clear how this is supposed to work in execlist mode since: 1. This function requires a ring buffer. 2. Retrieving a ring buffer in execlist mode requires us to retrieve the corresponding context, which we get from a request. 3. Retieving a request from the hang checker is not straight-forward since that requires us to grab the struct_mutex in order to synchronize against the request retirement thread. 4. Grabbing the struct_mutex from the hang checker is nothing that we will do since that puts us at risk of deadlock since a hung thread might be holding the struct_mutex already. Therefore it's not obvious how we're supposed to deal with this. For now, we're doing an early exit from this function, which avoids any kernel panic situation when running our own internal TDR ULT. * v2: (Chris Wilson) Turned the execlist mode check into a ringbuffer NULL check to make it more submission mode agnostic and less of a layering violation. Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/i915_irq.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 46bcbff..4e8e722 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2698,6 +2698,26 @@ semaphore_waits_for(struct intel_engine_cs *ring, u32 *seqno) u64 offset = 0; int i, backwards; + /* + * This function does not support execlist mode - any attempt to + * proceed further into this function will result in a kernel panic + * when dereferencing ring->buffer, which is not set up in execlist + * mode. + * + * The correct way of doing it would be to derive the currently + * executing ring buffer from the current context, which is derived + * from the currently running request. Unfortunately, to get the + * current request we would have to grab the struct_mutex before doing + * anything else, which would be ill-advised since some other thread + * might have grabbed it already and managed to hang itself, causing + * the hang checker to deadlock. + * + * Therefore, this function does not support execlist mode in its + * current form. Just return NULL and move on. + */ + if (ring->buffer == NULL) + return NULL; + ipehr = I915_READ(RING_IPEHR(ring->mmio_base)); if (!ipehr_is_semaphore_wait(ring->dev, ipehr)) return NULL; -- 1.9.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx