On 07/17/2015 03:31 PM, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>
The intended usage model for struct fence is that the signalled status should be
set on demand rather than polled. That is, there should not be a need for a
'signaled' function to be called every time the status is queried. Instead,
'something' should be done to enable a signal callback from the hardware which
will update the state directly. In the case of requests, this is the seqno
update interrupt. The idea is that this callback will only be enabled on demand
when something actually tries to wait on the fence.
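In other words, something along the lines of the sketch below (illustrative
only, against the struct fence API of the day; the i915_fence_* and
engine_enable_seqno_irq names are placeholders rather than the real code):

#include <linux/fence.h>

static const char *i915_fence_get_driver_name(struct fence *fence)
{
        return "i915";
}

static const char *i915_fence_get_timeline_name(struct fence *fence)
{
        return "request";       /* illustrative */
}

static bool i915_fence_enable_signaling(struct fence *fence)
{
        struct drm_i915_gem_request *req =
                container_of(fence, struct drm_i915_gem_request, fence);

        /*
         * Turn on the seqno interrupt for this request's engine. From here
         * on the IRQ handler pushes completion via fence_signal(), so no
         * .signaled polling hook is needed on the query path.
         * engine_enable_seqno_irq() is a placeholder, not a real helper.
         */
        return engine_enable_seqno_irq(req);
}

static const struct fence_ops i915_fence_ops = {
        .get_driver_name        = i915_fence_get_driver_name,
        .get_timeline_name      = i915_fence_get_timeline_name,
        .enable_signaling       = i915_fence_enable_signaling,
        .wait                   = fence_default_wait,
        /* No .signaled hook: state is pushed from the interrupt handler. */
};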
This change removes the polling test and replaces it with the callback scheme.
Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke me' list
when a new seqno pops out and signals any matching fence/request. The fence is
then removed from the list so the entire request stack does not need to be
scanned every time. Note that the fence is added to the list before the commands
to generate the seqno interrupt are added to the ring. Thus the sequence is
guaranteed to be race free if the interrupt is already enabled.
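Conceptually the interrupt-side scan looks like this (sketch only; the list,
lock and accessor names are placeholders, and locking is simplified):

static void engine_notify_requests(struct intel_engine_cs *engine)
{
        u32 seqno = engine_get_seqno(engine);   /* placeholder accessor */
        struct drm_i915_gem_request *req, *next;
        unsigned long flags;

        /* fence_lock guards only the signal list in this sketch. */
        spin_lock_irqsave(&engine->fence_lock, flags);
        list_for_each_entry_safe(req, next, &engine->fence_signal_list,
                                 signal_list) {
                if (!i915_seqno_passed(seqno, req->seqno))
                        continue;

                fence_signal(&req->fence);

                /*
                 * Drop the request from the list so later interrupts do
                 * not rescan the whole request stack.
                 */
                list_del_init(&req->signal_list);
        }
        spin_unlock_irqrestore(&engine->fence_lock, flags);
}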
Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
called). Thus there is still a potential race when enabling the interrupt as the
request may already have completed. However, this is simply solved by calling
the interrupt processing code immediately after enabling the interrupt and
thereby checking for already completed requests.
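I.e. the wait side does roughly the following (again, placeholder names):

static void request_enable_signaling(struct drm_i915_gem_request *req,
                                     struct intel_engine_cs *engine)
{
        engine_enable_seqno_irq(engine);        /* placeholder helper */

        /*
         * The seqno may already have landed before the interrupt was
         * switched on, so run the notify scan once by hand; this closes
         * the enable-vs-complete race described above.
         */
        engine_notify_requests(engine);
}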
Lastly, the ring clean-up code may cancel outstanding requests (e.g. because
TDR has reset the ring). These requests will never be signalled and so must be
removed from the signal list manually. This is done by setting a 'cancelled'
flag and then running the regular notify/retire code path rather than
duplicating the list manipulation and clean-up code in multiple places. This
also avoids any race condition where the cancellation might occur after/during
the arrival of the completion interrupt.
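With that flag, the notify scan can treat cancelled and completed requests
uniformly, roughly as follows (sketch, continuing the placeholder names used
above):

        list_for_each_entry_safe(req, next, &engine->fence_signal_list,
                                 signal_list) {
                if (!req->cancelled &&
                    !i915_seqno_passed(seqno, req->seqno))
                        continue;

                /*
                 * Completed and cancelled requests leave the list through
                 * the same code, so a cancellation racing with the
                 * completion interrupt is harmless.
                 */
                list_del_init(&req->signal_list);
                if (!req->cancelled)
                        fence_signal(&req->fence);
        }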
v2: Updated to take advantage of the request unreference no longer requiring the
mutex lock.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
---
[snip]
@@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
list_del_init(&request->list);
i915_gem_request_remove_from_client(request);
+ /* In case the request is still in the signal pending list */
+ if (!list_empty(&request->signal_list))
+ request->cancelled = true;
+
Another thing I did not see implemented is the sync_fence error state. This is
more about the Android part, but it relates to this cancelled flag so I am
commenting here.
I thought that when TDR kicks in and we set request->cancelled to true, there
should be a code path which somehow makes sync_fence->status negative. As it
stands, because fence_signal will not be called on cancelled requests, I think
waiters will wait until the timeout rather than being woken up and returning
an error status.
For this to work you would somehow need to make sync_fence->status go
negative. With normal fence completion it goes from 1 -> 0, via the
completion callback. I did not immediately see how to make it go
negative using the existing API.
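To make it concrete, what I had in mind is a cancel path roughly like the one
below; note that treating fence->status as a driver-settable error field is an
assumption on my part, and I am not sure the existing core code supports it:

/* Hypothetical cancel path, not something in the patch. */
static void i915_request_cancel_fence(struct drm_i915_gem_request *req)
{
        req->fence.status = -EIO;       /* would need core fence blessing */
        fence_signal(&req->fence);      /* wake waiters with an error */
        list_del_init(&req->signal_list);
}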
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx