Re: [PATCH 2/3] drm/i915: Complete requests in nop_submit_request

Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> · Fri, 18 Nov 2016 15:03:21 +0200

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> writes:

> On 18/11/2016 09:37, Chris Wilson wrote:
>> Since the submit/execute split in commit d55ac5bf97c6 ("drm/i915: Defer
>> transfer onto execution timeline to actual hw submission") the
>> global seqno advance was deferred until the submit_request callback.
>> After wedging the GPU, we were installing a nop_submit_request handler
>> (to avoid waking up the dead hw) but I had missed converting this over
>> to the new scheme. Under the new scheme, we have to explicitly call
>> i915_gem_request_submit() from the submit_request handler to mark the
>> request as on the hardware. If we don't, the request is always pending,
>> and any waiter will continue to wait indefinitely and hangcheck will not
>> be able to resolve the lockup.
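
To make the failure mode above concrete, here is a minimal user-space sketch. It is a toy model and not the i915 code: the toy_* structs and functions are hypothetical stand-ins, and it assumes a request only counts as complete once it has been given a global seqno and the engine's seqno has reached it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the engine and request; not the i915 structs. */
struct toy_engine {
	uint32_t next_seqno;	/* last global seqno handed out */
	uint32_t hw_seqno;	/* last seqno the "hardware" reported complete */
};

struct toy_request {
	struct toy_engine *engine;
	uint32_t global_seqno;	/* 0 until the request has been submitted */
};

/* Models the submit step: the request gets its global seqno here. */
static void toy_request_submit(struct toy_request *rq)
{
	rq->global_seqno = ++rq->engine->next_seqno;
}

/* What a waiter checks: submitted, and the engine has passed our seqno. */
static bool toy_request_completed(const struct toy_request *rq)
{
	return rq->global_seqno != 0 &&
	       (int32_t)(rq->engine->hw_seqno - rq->global_seqno) >= 0;
}

/* Behaviour before the patch: nothing happens, the request stays pending. */
static void toy_nop_submit_old(struct toy_request *rq)
{
	(void)rq;
}

/* Behaviour after the patch: submit, then advance the seqno by hand. */
static void toy_nop_submit_new(struct toy_request *rq)
{
	toy_request_submit(rq);
	rq->engine->hw_seqno = rq->global_seqno;
}
```

In this model a request passed through toy_nop_submit_old() never satisfies toy_request_completed(), so a waiter would spin forever, while one passed through toy_nop_submit_new() completes immediately.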
>>
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
>> Testcase: igt/gem_eio/in-flight
>> Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>> ---
>>  drivers/gpu/drm/i915/i915_gem.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 7b9f5b99b0f3..7037a8b26903 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2762,6 +2762,8 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
>>
>>  static void nop_submit_request(struct drm_i915_gem_request *request)
>>  {
>> +	i915_gem_request_submit(request);
>> +	intel_engine_init_global_seqno(request->engine, request->global_seqno);
>
> Slight deja vu but not sure - we don't have a way of marking these as
> failed so what happens in practice here? This is at the point of no
> return, no replay, or allowing the context to recover or something?
>

I have another question. If we do this, why not also do it when we
prepare to start in the postfix of the failed request?

In other words, write the seqno with mmio, and start the replay not in
the postfix, but at the start of the next whole request?

-Mika

>>  }
>>
>>  static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
>>
>
> Regards,
>
> Tvrtko
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx