Re: [PATCH v4 4/4] drm/i915: Fix premature LRC unpin in GuC mode

On 21/01/16 12:32, Chris Wilson wrote:
On Thu, Jan 21, 2016 at 12:14:10PM +0000, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

In GuC mode LRC pinning lifetime depends exclusively on the
request lifetime. Since that is terminated by the seqno update,
it opens up a race condition between the GPU finishing writing
out the context image and the driver unpinning the LRC.

To extend the LRC lifetime we will employ a similar approach
to what legacy ringbuffer submission does.

We will start tracking the last submitted context per engine
and keep it pinned until it is replaced by another one.
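
The tracking described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from the patch: the toy_context and toy_engine types and all toy_* helpers are hypothetical stand-ins for the i915 structures, and the default-context exclusion mentioned under v3 is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the i915 structures. */
struct toy_context {
	int pin_count;	/* > 0 means the context image must stay resident */
};

struct toy_engine {
	struct toy_context *last_context;	/* last context submitted here */
};

static void toy_pin(struct toy_context *ctx)
{
	ctx->pin_count++;
}

static void toy_unpin(struct toy_context *ctx)
{
	assert(ctx->pin_count > 0);
	ctx->pin_count--;
}

/*
 * Called at submission time: take an extra pin on the context being
 * submitted and only then drop the pin held on the previously tracked
 * one. The previous context therefore stays pinned past the retirement
 * of its own request, until another context replaces it on the engine.
 */
static void toy_track_last_context(struct toy_engine *engine,
				   struct toy_context *ctx)
{
	if (engine->last_context == ctx)
		return;

	toy_pin(ctx);
	if (engine->last_context)
		toy_unpin(engine->last_context);
	engine->last_context = ctx;
}

/* Reset or unload path: drop the tracking pin, if there is one. */
static void toy_release_last_context(struct toy_engine *engine)
{
	if (engine->last_context) {
		toy_unpin(engine->last_context);
		engine->last_context = NULL;
	}
}
```

In this model a context that was pinned once for its request and then tracked still has a pin count of one after the request retires, and drops to zero only when a different context is submitted on the same engine or the engine is reset.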

Note that the driver unload path is a bit fragile and could
benefit greatly from efforts to unify the legacy and exec
list submission code paths.

At the moment i915_gem_context_fini has special casing for the
two, which is potentially not needed, and also depends on
i915_gem_cleanup_ringbuffer running before it.

v2:
  * Move pinning into engine->emit_request and actually fix
    the reference/unreference logic. (Chris Wilson)

  * ring->dev can be NULL on driver unload so use a different
    route towards it.

v3:
  * Rebase.
  * Handle the reset path. (Chris Wilson)
  * Exclude default context from the pinning - it is impossible
    to get it right before default context special casing in
    general is eliminated.

v4:
  * Rebased & moved context tracking to
    intel_logical_ring_advance_and_submit.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Issue: VIZ-4277
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Nick Hoath <nicholas.hoath@xxxxxxxxx>

Whilst it saddens me to see yet another (impossible) special case added
that will just have to be deleted again, the series is
Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

Thanks and sorry, hopefully it will get cleaned up soon. There seems to be a growing number of people who want it done.

And I still need to get back to your VMA rewrite and breadcrumbs would be nice as well.

I wonder if it is possible to poison the context objects before and
after, then do a deferred check for stray writes, and use that mode for
igt/gem_ctx_* (with some tests targeting active->idle vs
context-close). Would still be susceptible to timing as we need to
hit the interval between the seqno being complete and the delayed context
save, but that seems like the most reliable way to detect the error?
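
One possible reading of the poisoning idea, sketched as a userspace model: fill the released context image with a known pattern, then check later that the pattern is still intact. Everything here is an assumption made for illustration (the TOY_POISON value, the toy_* helpers, and the plain byte buffer standing in for a context object); where such a hook would sit in the driver is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON 0x5a	/* arbitrary pattern, chosen for illustration */

/* Fill the buffer with the poison pattern once the driver is done with it. */
static void toy_poison(unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, TOY_POISON, len);
}

/*
 * Deferred check: returns true if every byte still holds the poison
 * pattern, false if something wrote to the buffer after it was poisoned.
 */
static bool toy_poison_intact(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++) {
		if (buf[i] != TOY_POISON)
			return false;
	}
	return true;
}
```

A late context save landing after the poisoning would show up as a failed check. As noted above, such a check can only catch the error if the stray write actually happens between the poisoning and the check, so it remains timing dependent.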

First it needs to be tested with GuC to check that it actually fixes the issue. And pass CI of course.

But I can't really figure out where you would put this poisoning. You could put something in place in exec list mode after context complete and check it before it is used next time, but I did not think we could hit this in exec list mode, only in GuC. Do you think it is possible?

And in GuC mode I have no idea at which point you would put "poisoning" in?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx