On 12-05-16 at 23:06, John.C.Harrison@xxxxxxxxx wrote:
> From: John Harrison <John.C.Harrison@xxxxxxxxx>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Moved the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signalling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Updated list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit() static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can
> be significant, at least for the wait request micro-benchmark. This
> can be greatly improved by explicitly checking for seqno updates in
> the pre-wait busy poll loop. Also added some documentation comments to
> the busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non-IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to the request clean up code
> mean none of the deferred free mess is needed any more.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   7 +
>  drivers/gpu/drm/i915/i915_gem.c         | 235 +++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>  6 files changed, 225 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 433d99a..3adac59 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
>  	 */
>  	struct fence fence;
>  	struct rcu_head rcu_head;
> +	struct list_head signal_link;
> +	bool cancelled;
> +	bool irq_enabled;
> +	bool signal_requested;
>
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
>  struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  			struct intel_context *ctx);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>
>  static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 958edcb..6669508 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -39,6 +39,8 @@
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>  static void
> @@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>  	struct drm_device *dev = engine->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>  	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	uint32_t seqno;
>  	DEFINE_WAIT(wait);
>  	unsigned long timeout_expire;
>  	s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
>  	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> -	if (list_empty(&req->list))
> -		return 0;
> -
>  	if (i915_gem_request_completed(req))
>  		return 0;
>
> @@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	trace_i915_gem_request_wait_begin(req);
>
>  	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> -
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> +	if (req->seqno) {
> +		ret = __i915_spin_request(req, state);
> +		if (ret == 0)
> +			goto out;
>  	}
>
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	fence_enable_sw_signaling(&req->fence);
> +
>  	for (;;) {
>  		struct timer_list timer;
>
> @@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>
> +		if (req->seqno) {
> +			/*
> +			 * There is quite a lot of latency in the user interrupt
> +			 * path. So do an explicit seqno check and potentially
> +			 * remove all that delay.
> +			 */
> +			if (req->engine->irq_seqno_barrier)
> +				req->engine->irq_seqno_barrier(req->engine);
> +			seqno = engine->get_seqno(engine);
> +			if (i915_seqno_passed(seqno, req->seqno)) {
> +				ret = 0;
> +				break;
> +			}
> +		}
> +
>  		if (signal_pending_state(state, current)) {
>  			ret = -ERESTARTSYS;
>  			break;
> @@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			destroy_timer_on_stack(&timer);
>  		}
>  	}
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
>
>  	finish_wait(&engine->irq_queue, &wait);
>
>  out:
>  	trace_i915_gem_request_wait_end(req);
>
> +	if ((ret == 0) && (req->seqno)) {
> +		if (req->engine->irq_seqno_barrier)
> +			req->engine->irq_seqno_barrier(req->engine);
> +		seqno = engine->get_seqno(engine);
> +		if (i915_seqno_passed(seqno, req->seqno) &&
> +		    !i915_gem_request_completed(req)) {
> +			/*
> +			 * Make sure the request is marked as completed before
> +			 * returning. NB: Need to acquire the spinlock around
> +			 * the whole call to avoid a race condition where the
> +			 * interrupt handler is running concurrently and could
> +			 * cause this invocation to exit early even though the
> +			 * request has not actually been fully processed yet.
> +			 */
> +			spin_lock_irq(&req->engine->fence_lock);
> +			i915_gem_request_notify(req->engine, true);
> +			spin_unlock_irq(&req->engine->fence_lock);
> +		}
> +	}
> +
>  	if (timeout) {
>  		s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  {
>  	trace_i915_gem_request_retire(request);
>
> +	if (request->irq_enabled) {
> +		request->engine->irq_put(request->engine);
> +		request->irq_enabled = false;
> +	}
> +
>  	/* We know the GPU must have read the request to have
>  	 * sent us the seqno + interrupt, so use the position
>  	 * of tail of the request to update the last known position
> @@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	list_del_init(&request->list);
>  	i915_gem_request_remove_from_client(request);
>
> +	/*
> +	 * In case the request is still in the signal pending list,
> +	 * e.g. due to being cancelled by TDR, preemption, etc.
> +	 */
> +	if (!list_empty(&request->signal_link)) {
> +		/*
> +		 * The request must be marked as cancelled and the underlying
> +		 * fence as failed. NB: There is no explicit fence fail API,
> +		 * there is only a manual poke and signal.
> +		 */
> +		request->cancelled = true;
> +		/* How to propagate to any associated sync_fence??? */
> +		request->fence.status = -EIO;
> +		fence_signal_locked(&request->fence);
> +	}
> +
>  	if (request->previous_context) {
>  		if (i915.enable_execlists)
>  			intel_lr_context_unpin(request->previous_context,
> @@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  	 */
>  	request->postfix = intel_ring_get_tail(ringbuf);
>
> +	/*
> +	 * Add the fence to the pending list before emitting the commands to
> +	 * generate a seqno notification interrupt.
> +	 */
> +	i915_gem_request_submit(request);
> +
>  	if (i915.enable_execlists)
>  		ret = engine->emit_request(request);
>  	else {
> @@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
>  	struct drm_i915_gem_request *req;
>
>  	req = container_of(req_fence, typeof(*req), fence);
> +
> +	WARN_ON(req->irq_enabled);
> +
>  	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>  }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request is about to be submitted to the hardware so add the fence to
> + * the list of signalable fences.
> + *
> + * NB: This does not necessarily enable interrupts yet. That only occurs on
> + * demand when the request is actually waited on. However, adding it to the
> + * list early ensures that there is no race condition where the interrupt
> + * could pop out prematurely and thus be completely lost. The race is merely
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>  {
> -	/* Interrupt driven fences are not implemented yet.*/
> -	WARN(true, "This should not be called!");
> -	return true;
> +	/*
> +	 * Always enable signal processing for the request's fence object
> +	 * before that request is submitted to the hardware. Thus there is no
> +	 * race condition whereby the interrupt could pop out before the
> +	 * request has been added to the signal list. Hence no need to check
> +	 * for completion, undo the list add and return false.
> +	 */
> +	i915_gem_request_reference(req);
> +	spin_lock_irq(&req->engine->fence_lock);
> +	WARN_ON(!list_empty(&req->signal_link));
> +	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
> +	spin_unlock_irq(&req->engine->fence_lock);
> +
> +	/*
> +	 * NB: Interrupts are only enabled on demand. Thus there is still a
> +	 * race where the request could complete before the interrupt has
> +	 * been enabled. Thus care must be taken at that point.
> +	 */
> +
> +	/* Have interrupts already been requested? */
> +	if (req->signal_requested)
> +		i915_gem_request_enable_interrupt(req, false);
> +}
> +
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
> +	const bool irq_test_in_progress =
> +		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
> +		intel_engine_flag(req->engine);
> +
> +	if (req->irq_enabled)
> +		return;
> +
> +	if (irq_test_in_progress)
> +		return;
> +
> +	if (!WARN_ON(!req->engine->irq_get(req->engine)))
> +		req->irq_enabled = true;
> +
> +	/*
> +	 * Because the interrupt is only enabled on demand, there is a race
> +	 * where the interrupt can fire before anyone is looking for it. So
> +	 * do an explicit check for missed interrupts.
> +	 */
> +	i915_gem_request_notify(req->engine, fence_locked);
>  }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>  {
>  	struct drm_i915_gem_request *req = container_of(req_fence,
>  						 typeof(*req), fence);
> +
> +	/*
> +	 * No need to actually enable interrupt based processing until the
> +	 * request has been submitted to the hardware. At which point
> +	 * 'i915_gem_request_submit()' is called. So only really enable
> +	 * signalling in there. Just set a flag to say that interrupts are
> +	 * wanted when the request is eventually submitted. On the other hand
> +	 * if the request has already been submitted then interrupts do need
> +	 * to be enabled now.
> +	 */
> +
> +	req->signal_requested = true;
> +
> +	if (!list_empty(&req->signal_link))
> +		i915_gem_request_enable_interrupt(req, true);
> +
> +	return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
>
> -	seqno = req->engine->get_seqno(req->engine);
> +	if (list_empty(&engine->fence_signal_list))
> +		return;
> +

If you don't hold the lock then you should probably use list_empty_careful.

~Maarten
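
For illustration, a rough sketch of what the unlocked early-out above might look like with that suggestion applied. This is an editorial sketch against the quoted patch, not code from the series: the !fence_locked guard is an assumption about when the lockless peek is wanted, while i915_gem_request_notify(), fence_signal_list and fence_lock are taken from the patch itself.

void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
{
	struct drm_i915_gem_request *req, *req_next;
	unsigned long flags;
	u32 seqno;

	/*
	 * Lockless peek, opportunistic early-out only. list_empty_careful()
	 * checks both the next and prev pointers, so it copes better with a
	 * concurrent list_del_init() than plain list_empty() when fence_lock
	 * is not yet held. A false "non-empty" answer is harmless: the lock
	 * is taken below and the list is simply found empty there.
	 */
	if (!fence_locked && list_empty_careful(&engine->fence_signal_list))
		return;

	/*
	 * ... remainder as in the posted patch: take engine->fence_lock
	 * (unless fence_locked), read the engine seqno and signal/unlink
	 * any completed requests on engine->fence_signal_list ...
	 */
}

Either way the peek is only a hint; any actual list manipulation still happens under engine->fence_lock.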