On 27/07/2015 12:33, Tvrtko Ursulin wrote:
Hi,
On 07/17/2015 03:31 PM, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>
The intended usage model for struct fence is that the signalled status should be
set on demand rather than polled. That is, there should not be a need for a
'signaled' function to be called every time the status is queried. Instead,
'something' should be done to enable a signal callback from the hardware which
will update the state directly. In the case of requests, this is the seqno
update interrupt. The idea is that this callback will only be enabled on demand
when something actually tries to wait on the fence.
This change removes the polling test and replaces it with the callback scheme.
Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke me' list
when a new seqno pops out and signals any matching fence/request. The fence is
then removed from the list so the entire request stack does not need to be
scanned every time. Note that the fence is added to the list before the commands
to generate the seqno interrupt are added to the ring. Thus the sequence is
guaranteed to be race free if the interrupt is already enabled.
Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
called). Thus there is still a potential race when enabling the interrupt as the
request may already have completed. However, this is simply solved by calling
the interrupt processing code immediately after enabling the interrupt and
thereby checking for already completed requests.
Lastly, the ring clean up code has the possibility to cancel outstanding
requests (e.g. because TDR has reset the ring). These requests will never get
signalled and so must be removed from the signal list manually. This is done by
setting a 'cancelled' flag and then calling the regular notify/retire code path
rather than attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the cancellation
request might occur after/during the completion interrupt actually arriving.
v2: Updated to take advantage of the request unreference no longer requiring
the mutex lock.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_drv.h | 8 ++
drivers/gpu/drm/i915/i915_gem.c | 132 +++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_irq.c | 2 +
drivers/gpu/drm/i915/intel_lrc.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
6 files changed, 136 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 61c3db2..d7f1aa5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
struct drm_i915_gem_request {
/** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+ struct list_head signal_list;
+ struct list_head unsignal_list;
In addition to what Daniel said (one list_head looks enough) it is
customary to call it _link.
struct list_head delay_free_list;
+ bool cancelled;
+ bool irq_enabled;
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct drm_i915_gem_request **req_out);
void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+void i915_gem_request_submit(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
+void i915_gem_request_notify(struct intel_engine_cs *ring);
+
int i915_create_fence_timeline(struct drm_device *dev,
struct intel_context *ctx,
struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 482835a..7c589a9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (list_empty(&req->list))
return 0;
+ /*
+ * Enable interrupt completion of the request.
+ */
+ i915_gem_request_enable_interrupt(req);
+
if (i915_gem_request_completed(req))
return 0;
@@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
list_del_init(&request->list);
i915_gem_request_remove_from_client(request);
+ /* In case the request is still in the signal pending list */
+ if (!list_empty(&request->signal_list))
+ request->cancelled = true;
+
i915_gem_request_unreference(request);
}
@@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
*/
request->postfix = intel_ring_get_tail(ringbuf);
+ /*
+ * Add the fence to the pending list before emitting the commands to
+ * generate a seqno notification interrupt.
+ */
+ i915_gem_request_submit(request);
+
if (i915.enable_execlists)
ret = ring->emit_request(request);
else {
@@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
i915_gem_context_unreference(ctx);
}
+ if (req->irq_enabled)
+ req->ring->irq_put(req->ring);
+
We get here with interrupts still enabled only if userspace is
abandoning a wait on an unsignaled fence, did I get that right?
It implies the request has been abandoned in some manner, yes. E.g. TDR
has killed it, user space has given up, ...
kmem_cache_free(req->i915->requests, req);
}
@@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
return req->ring->name;
}
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request has been submitted to the hardware so add the fence to the
+ * list of signalable fences.
+ *
+ * NB: This does not enable interrupts yet. That only occurs on demand when
+ * the request is actually waited on. However, adding it to the list early
+ * ensures that there is no race condition where the interrupt could pop
+ * out prematurely and thus be completely lost. The race is merely that the
+ * interrupt must be manually checked for after being enabled.
+ */
+void i915_gem_request_submit(struct drm_i915_gem_request *req)
{
- /* Interrupt driven fences are not implemented yet.*/
- WARN(true, "This should not be called!");
- return true;
+ fence_enable_sw_signaling(&req->fence);
}
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
+{
+ if (req->irq_enabled)
+ return;
+
+ WARN_ON(!req->ring->irq_get(req->ring));
+ req->irq_enabled = true;
req->irq_enabled manipulations look racy. Here and in request free it
is protected by struct_mutex, but that is not held in
i915_gem_request_notify. Initial feeling is you should use
ring->fence_lock everyplace you query/manipulate req->irq_enabled.
The only asynchronous access is from _notify() which disables IRQs if
the flag is set and then clears it. That can't race with the enable
because the enable only sets the flag after setting IRQs on. The worst
that can happen on a race is that IRQs are enabled and then immediately
disabled - truly concurrent execution would result in one test or the
other failing and so only one code path would be taken. The only other
usage is in _request_free() but that can only run when the last
reference has been dropped and that means it is no longer on any list
that _notify() can see.
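For what it's worth, the benign-race argument can be modelled in a small userspace sketch. All names here are illustrative stand-ins for the driver code, not the actual i915 functions: the enable path is idempotent, takes the interrupt reference before setting the flag, and then runs the notify check itself so a completion that landed before the interrupt was switched on is still caught.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the on-demand interrupt enable. Hypothetical
 * types standing in for intel_engine_cs / drm_i915_gem_request. */
struct mock_ring {
	uint32_t hw_seqno;	/* what get_seqno() would report */
	int irq_refcount;
};

struct mock_req {
	struct mock_ring *ring;
	uint32_t seqno;
	bool irq_enabled;
	bool signalled;
};

/* Stand-in for i915_gem_request_notify() for a single request. */
static void mock_notify(struct mock_req *req)
{
	if ((int32_t)(req->ring->hw_seqno - req->seqno) >= 0)
		req->signalled = true;
}

static void mock_enable_interrupt(struct mock_req *req)
{
	if (req->irq_enabled)
		return;

	req->ring->irq_refcount++;	/* irq_get() */
	req->irq_enabled = true;

	/* The seqno may already have popped out before the interrupt
	 * was enabled, so do an explicit check for missed completions. */
	mock_notify(req);
}
```

The worst a concurrent notify can do against this path is disable an interrupt that was only just enabled; it can never leave a completed request unsignalled, because the enable path always ends with its own check.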
+
+ /*
+ * Because the interrupt is only enabled on demand, there is a race
+ * where the interrupt can fire before anyone is looking for it. So
+ * do an explicit check for missed interrupts.
+ */
+ i915_gem_request_notify(req->ring);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
{
struct drm_i915_gem_request *req = container_of(req_fence,
typeof(*req), fence);
+
+ i915_gem_request_reference(req);
+ WARN_ON(!list_empty(&req->signal_list));
It looks very unsafe to proceed normally after this WARN_ON. It should
probably return false here to preserve data structure sanity.
This really should be a BUG_ON but Daniel doesn't like those. It should
be an impossible code path and not something that can be hit by the user
being dumb. Anyway, this code has all been changed in the latest
incarnation.
+ list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
+
+ /*
+ * Note that signalling is always enabled for every request before
+ * that request is submitted to the hardware. Therefore there is
+ * no race condition whereby the signal could pop out before the
+ * request has been added to the list. Hence no need to check
+ * for completion, undo the list add and return false.
+ *
+ * NB: Interrupts are only enabled on demand. Thus there is still a
+ * race where the request could complete before the interrupt has
+ * been enabled. Thus care must be taken at that point.
+ */
+
+ return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring)
+{
+ struct drm_i915_gem_request *req, *req_next;
+ unsigned long flags;
u32 seqno;
+ LIST_HEAD(free_list);
- BUG_ON(req == NULL);
+ if (list_empty(&ring->fence_signal_list))
+ return;
+
+ seqno = ring->get_seqno(ring, false);
+
+ spin_lock_irqsave(&ring->fence_lock, flags);
+ list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
+ if (!req->cancelled) {
+ if (!i915_seqno_passed(seqno, req->seqno))
+ continue;
- seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+ fence_signal_locked(&req->fence);
+ }
+
+ list_del_init(&req->signal_list);
I haven't managed to figure out why this is apparently removing requests
which have not been signalled from the signal_list. Shouldn't they be
moved to free_list only if i915_seqno_passed?
Requests are only removed from the signal list if either a) the seqno
has passed or b) the request has been cancelled and thus will never
actually complete. Not sure what other scenario you are seeing.
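The removal conditions can be sketched as a small userspace model of the scan loop (the struct and function names here are illustrative, not the driver's): a request stays on the list only while it is neither completed nor cancelled, and only completed requests get signalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of a request on the signal list. */
struct mock_request {
	uint32_t seqno;
	bool cancelled;
	bool signalled;
	bool on_signal_list;
};

/* Wraparound-safe comparison in the style of i915_seqno_passed(). */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Model of the notify scan; returns how many requests left the list. */
static int notify_scan(struct mock_request *reqs, int n, uint32_t hw_seqno)
{
	int removed = 0;

	for (int i = 0; i < n; i++) {
		struct mock_request *req = &reqs[i];

		if (!req->on_signal_list)
			continue;

		if (!req->cancelled) {
			if (!seqno_passed(hw_seqno, req->seqno))
				continue;	/* not done yet, keep it */
			req->signalled = true;	/* fence_signal_locked() */
		}

		/* Completed or cancelled: drop it so later scans do not
		 * walk the entire request stack again. */
		req->on_signal_list = false;
		removed++;
	}
	return removed;
}
```

The `continue` on an unpassed seqno is what keeps in-flight requests on the list; only the signalled-or-cancelled cases fall through to the removal.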
+ if (req->irq_enabled) {
+ req->ring->irq_put(req->ring);
+ req->irq_enabled = false;
+ }
- return i915_seqno_passed(seqno, req->seqno);
+ /* Can't unreference here because that might grab fence_lock */
+ list_add_tail(&req->unsignal_list, &free_list);
+ }
+ spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+ /* It should now be safe to actually free the requests */
+ while (!list_empty(&free_list)) {
+ req = list_first_entry(&free_list,
+ struct drm_i915_gem_request, unsignal_list);
+ list_del(&req->unsignal_list);
+
+ i915_gem_request_unreference(req);
+ }
}
static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
@@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
.get_driver_name = i915_gem_request_get_driver_name,
.get_timeline_name = i915_gem_request_get_timeline_name,
.enable_signaling = i915_gem_request_enable_signaling,
- .signaled = i915_gem_request_is_completed,
.wait = fence_default_wait,
.release = i915_gem_request_release,
.fence_value_str = i915_fence_value_str,
@@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
goto err;
}
+ INIT_LIST_HEAD(&req->signal_list);
fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
ctx->engine[ring->id].fence_timeline.fence_context,
i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
@@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
i915_gem_request_retire(request);
}
+
+ /*
+ * Make sure any requests that were on the signal pending list get
+ * cleaned up.
+ */
+ i915_gem_request_notify(ring);
+ i915_gem_retire_requests_ring(ring);
Would i915_gem_retire_requests_ring be enough given how it calls
i915_gem_request_notify itself as the first thing below?
Oops, left over from before retire called notify explicitly.
}
void i915_gem_restore_fences(struct drm_device *dev)
@@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
{
WARN_ON(i915_verify_lists(ring->dev));
+ /*
+ * If no-one has waited on a request recently then interrupts will
+ * not have been enabled and thus no requests will ever be marked as
+ * completed. So do an interrupt check now.
+ */
+ i915_gem_request_notify(ring);
+
/* Retire requests first as we use it above for the early return.
* If we retire requests last, we may use a later seqno and so clear
* the requests lists without clearing the active list, leading to
@@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
{
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ INIT_LIST_HEAD(&ring->fence_signal_list);
INIT_LIST_HEAD(&ring->delayed_free_list);
}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d87f173..e446509 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
trace_i915_gem_request_notify(ring);
+ i915_gem_request_notify(ring);
+
How many requests are typically on signal_list on some typical
workloads? This could be a significant performance change since on
every user interrupt it would talk it all potentially only removing
one request at a time.
Obviously, some of the IGT tests can produce very large request lists
(e.g. gem_exec_nop) but running 'normal' stuff rarely seems to generate
a long list. E.g. running GLBench + GLXGears on an Ubuntu desktop I get
95% of the time there are ten or fewer requests in the list but the loop
iterates only once (49%) or twice (46%) because the first request gets
signalled and the second (if present) aborts the loop.
The biggest problem seems to be that the hardware is brain dead with
respect to generating interrupts. So if two requests complete in quick
succession and the ISR only gets to see the second seqno, it still gets
called a second time. Thus we actually get a ridiculous figure of 60% of
ISR calls being no-ops because the seqno has not actually advanced. The
code now checks for duplicate seqnos and early exits. Not sure how to
get rid of the call completely.
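The duplicate-seqno early exit described above amounts to two small helpers: the wraparound-safe comparison (in the style of i915_seqno_passed() from i915_drv.h) plus a last-seen check. The seqno_advanced() helper below is a hypothetical illustration of the early-exit idea, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe seqno comparison: casting the unsigned difference
 * to signed keeps the test correct when the 32-bit counter wraps. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Hypothetical duplicate-interrupt filter: an ISR that remembers the
 * last seqno it acted on can skip the scan when the hardware raises
 * the interrupt again without the seqno having advanced. */
static bool seqno_advanced(uint32_t *last_seen, uint32_t seqno)
{
	if (seqno == *last_seen)
		return false;	/* no-op interrupt, skip the list walk */
	*last_seen = seqno;
	return true;
}
```

With roughly 60% of ISR calls reported as no-ops, filtering on an unchanged seqno before touching the signal list avoids most of the wasted lock acquisition and list traversal.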
These are just review comments on this particular patch without
thinking yet of the bigger design questions Daniel has raised.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx