Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences

On 13/06/16 16:51, John Harrison wrote:

[snip]

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 07edaed..f8f60bb 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
       if (dev_priv->wq == NULL)
           goto out_err;

+    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
+    if (dev_priv->req_wq == NULL)
+        goto out_free_wq;
+
A single (per-device) ordered workqueue will serialize interrupt
processing across all engines into one thread. Together with the fact
that the request worker does not seem to need a sleeping context, I am
thinking that a tasklet per engine would be much better (see
engine->irq_tasklet for an example).
Other conversations have stated that tasklets are not the best option. I
did think about having a workqueue per engine but that seemed
excessive. Plus, any subsequent work triggered by the fence completion
will almost certainly require grabbing the driver mutex lock (because
everything requires the mutex lock), so serialisation of engines doesn't
sound like much of an issue.

In this patch, AFAICS, i915_gem_request_worker calls i915_gem_request_notify on the engine while holding only engine->fence_lock. So if you had a per-engine wq, all engines could do that processing in parallel.

It is only once fence_signal_locked wakes up the waiters that they might go and hammer on struct_mutex; but for any work they want to do between waking up and submitting new work, serializing via a single wq will be bad for latency.
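
For illustration, a per-engine allocation could look something like this (purely a sketch; i915_request_wqs_init is a made-up name, engine->request_wq would be a new field, and the error unwind is left to the caller):

static int i915_request_wqs_init(struct drm_i915_private *dev_priv)
{
    struct intel_engine_cs *engine;

    /*
     * One ordered queue per engine: notify work for different
     * engines can then run in parallel, while remaining ordered
     * within each engine.
     */
    for_each_engine(engine, dev_priv) {
        engine->request_wq =
            alloc_ordered_workqueue("i915-rq/%s", 0, engine->name);
        if (engine->request_wq == NULL)
            return -ENOMEM; /* caller unwinds already allocated queues */
    }

    return 0;
}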


+        ret = __i915_spin_request(req, state);
+        if (ret == 0)
+            goto out;
       }

+    /*
+     * Enable interrupt completion of the request.
+     */
+    fence_enable_sw_signaling(&req->fence);
+
       for (;;) {
           struct timer_list timer;

@@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
               break;
           }

+        if (req->seqno) {
+            /*
+             * There is quite a lot of latency in the user interrupt
+             * path. So do an explicit seqno check and potentially
+             * remove all that delay.
+             */
+            if (req->engine->irq_seqno_barrier)
+                req->engine->irq_seqno_barrier(req->engine);
+            seqno = engine->get_seqno(engine);
+            if (i915_seqno_passed(seqno, req->seqno)) {
+                ret = 0;
+                break;
+            }
+        }
+
           if (signal_pending_state(state, current)) {
               ret = -ERESTARTSYS;
               break;
@@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
               destroy_timer_on_stack(&timer);
           }
       }
-    if (!irq_test_in_progress)
-        engine->irq_put(engine);

       finish_wait(&engine->irq_queue, &wait);
Hm I don't understand why our custom waiting remains? Shouldn't
fence_wait just be called after the optimistic spin, more or less?
That would solve the 'thundering herd' problem if we could. In theory, the
entire wait function should just be a call to 'fence_wait(&req->fence)'.
Unfortunately, the wait function goes to sleep holding the mutex lock
and requires a bail-out option on the wait, which is not currently
part of the fence API. I have a work-in-progress patch that almost
solves the issues, but it isn't quite there yet (and I haven't had much
chance to work on it for a while).
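
Roughly, the aim is for the body of the wait to collapse to something like this (sketch only, assuming the mutex can be dropped around the sleep; timeout_jiffies stands in for the converted *timeout value, and fence_wait_timeout() is the existing fence API wait with interruptible and timeout support):

    long remaining;

    /* Interruptible sleep until the fence signals or time runs out. */
    remaining = fence_wait_timeout(&req->fence, true, timeout_jiffies);
    if (remaining < 0)
        return remaining;   /* e.g. -ERESTARTSYS from a signal */
    if (remaining == 0)
        return -ETIME;      /* timed out */
    /* else: signalled, with 'remaining' jiffies to spare */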

[Btw does it also mean i915 cannot handle incoming third-party fences?]

So userspace waiters can either wait on a dma-buf or on the i915 wq, depending on which ioctl they have called. Then on interrupt, i915 waiters are woken directly, while the fence API ones go via a worker.

i915 waiters also do a second wake-up of the fence API waiters by...


   out:
       trace_i915_gem_request_wait_end(req);

+    if ((ret == 0) && (req->seqno)) {
+        if (req->engine->irq_seqno_barrier)
+            req->engine->irq_seqno_barrier(req->engine);
+        seqno = engine->get_seqno(engine);
+        if (i915_seqno_passed(seqno, req->seqno) &&
+            !i915_gem_request_completed(req)) {
+            /*
+             * Make sure the request is marked as completed before
+             * returning. NB: Need to acquire the spinlock around
+             * the whole call to avoid a race condition with an
+             * interrupt handler running concurrently, which could
+             * cause this invocation to exit early even though the
+             * request has not actually been fully processed yet.
+             */
+            spin_lock_irq(&req->engine->fence_lock);
+            i915_gem_request_notify(req->engine, true);

... this call here.

If there are both classes of waiters, one of those calls will be a waste of cycles (spin lock, irq toggle, list iteration, coherent seqno read), correct?

+            spin_unlock_irq(&req->engine->fence_lock);
+        }
+    }
+
       if (timeout) {
           s64 tres = *timeout - (ktime_get_raw_ns() - before);

@@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
   {
       trace_i915_gem_request_retire(request);

+    if (request->irq_enabled) {
+        request->engine->irq_put(request->engine);
+        request->irq_enabled = false;
What protects request->irq_enabled, here versus the enable_signalling
side? The latter can be called from external fence users, which would
take the fence_lock, but here it is not taken.
The flag can only be set when enabling interrupt-driven completion
(which can only happen once, and only if the fence is not already
signalled). The flag can only be cleared when the fence is signalled or
when the request is retired. And retire without signal can only happen
if the request is being cancelled in some way (e.g. GPU reset) and thus
will never be signalled. So if we get here, none of the other paths
are possible anymore.

Couldn't retire without signal happen via execbuf paths calling i915_gem_retire_requests_ring?

+        i915_gem_request_enable_interrupt(req, true);
+
+    return true;
+}
+
+/**
+ * i915_gem_request_worker - request work handler callback.
+ * @work: Work structure
+ * Called in response to a seqno interrupt to process the completed requests.
+ */
+void i915_gem_request_worker(struct work_struct *work)
+{
+    struct intel_engine_cs *engine;
+
+    engine = container_of(work, struct intel_engine_cs, request_work);
+    i915_gem_request_notify(engine, false);
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
+{
+    struct drm_i915_gem_request *req, *req_next;
+    unsigned long flags;
       u32 seqno;

-    seqno = req->engine->get_seqno(req->engine);
+    if (list_empty(&engine->fence_signal_list))
Okay, this without the lock still makes me nervous. I'd rather not
have to think about why it is safe and can't miss a wakeup.
I don't see how list_empty() can return a false negative. Even if the
implementation was such that it could see a partially updated state
across multiple memory accesses, that will just lead to it thinking
not-empty which is fine. Any update which takes it from empty to
not-empty is guaranteed to occur before the act of enabling interrupts
and thus before notify() can be called. So while it could potentially do
the full processing when an early exit was fine, it can never early exit
when it needs to do something.

Something like that sounds like a good comment to put above then! :)
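
E.g. something along these lines, above the unlocked check (wording is just a first stab based on your explanation):

    /*
     * Unlocked peek at the list. Any transition from empty to
     * non-empty happens before interrupts are enabled for the
     * request, and therefore before this function can run for it,
     * so we can never see 'empty' when there is work to do. A
     * stale 'non-empty' is harmless: we take the lock and find
     * nothing to signal.
     */
    if (list_empty(&engine->fence_signal_list))
        return;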

Regards,

Tvrtko
