On Wed, Feb 22, 2017 at 04:53:35PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/02/2017 15:51, Chris Wilson wrote:
> >@@ -4034,9 +4038,8 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > 	 * is woken.
> > 	 */
> > 	if (engine->irq_seqno_barrier &&
> >-	    rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh) == current &&
> > 	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >-		struct task_struct *tsk;
> >+		unsigned long flags;
> >
> > 		/* The ordering of irq_posted versus applying the barrier
> > 		 * is crucial. The clearing of the current irq_posted must
> >@@ -4058,17 +4061,17 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > 		 * the seqno before we believe it coherent since they see
> > 		 * irq_posted == false but we are still running).
> > 		 */
> >-		rcu_read_lock();
> >-		tsk = rcu_dereference(engine->breadcrumbs.irq_seqno_bh);
> >-		if (tsk && tsk != current)
> >+		spin_lock_irqsave(&engine->breadcrumbs.lock, flags);
> >+		if (engine->breadcrumbs.first_wait &&
> >+		    engine->breadcrumbs.first_wait->tsk != current)
> > 			/* Note that if the bottom-half is changed as we
> > 			 * are sending the wake-up, the new bottom-half will
> > 			 * be woken by whomever made the change. We only have
> > 			 * to worry about when we steal the irq-posted for
> > 			 * ourself.
> > 			 */
> >-			wake_up_process(tsk);
> >-		rcu_read_unlock();
> >+			wake_up_process(engine->breadcrumbs.first_wait->tsk);
> >+		spin_unlock_irqrestore(&engine->breadcrumbs.lock, flags);
> 
> Worth caching &engine->breadcrumbs maybe?

I'll ask gcc.
> >
> > 	if (__i915_gem_request_completed(req, seqno))
> > 		return true;
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index e22eacec022d..2e7bdb0cf069 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -1083,6 +1083,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> > 	}
> >
> > 	wait.tsk = current;
> >+	wait.request = req;
> 
> I guess this was the preemption prep tree already. If you decide to
> keep the intel_wait_init helper could add req to it.

Challenge being not all users of intel_wait_init have a request :)

> > restart:
> > 	do {
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >index 5f73d8c0a38a..0efee879df23 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >@@ -32,10 +32,12 @@
> >
> > struct drm_file;
> > struct drm_i915_gem_object;
> >+struct drm_i915_gem_request;
> >
> > struct intel_wait {
> > 	struct rb_node node;
> > 	struct task_struct *tsk;
> >+	struct drm_i915_gem_request *request;
> > 	u32 seqno;
> > };
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 57fa1bf78a85..0c370c687c2a 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -1033,10 +1033,44 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
> >
> > static void notify_ring(struct intel_engine_cs *engine)
> > {
> >+	struct drm_i915_gem_request *rq = NULL;
> >+	struct intel_wait *wait;
> >+
> >+	if (!intel_engine_has_waiter(engine))
> >+		return;
> >+
> >+	trace_i915_gem_request_notify(engine);
> > 	atomic_inc(&engine->irq_count);
> > 	set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
> >-	if (intel_engine_wakeup(engine))
> >-		trace_i915_gem_request_notify(engine);
> >+
> >+	rcu_read_lock();
> 
> Not sure this is required from an irq handler.
> But I don't think it harms either, so maybe has an usefulness in
> documenting things.

That's what I said! It's not required, but it's cheap enough that I
think it is worth keeping to document the game being played with rq.

> Looks OK.
> 
> Last time I tested it was a definitive latency improvement but also
> a slight throughput regression on crazy microbenchmarks like
> gem_latency IIRC. But I think overall a good thing for more
> realistic workloads.

Yup. It's a big win for gem_exec_whisper, which is latency sensitive
(and for gem_exec_latency). I didn't see gem_latency (using
tests/gem_latency) suffer too much. To be precise, which gem_latency
are you thinking of? :) I presume ezbench -b gem:latency?

I did look at trying to reduce the cost of the spinlocks (by trying to
share them between requests, or between the engine->timeline and
breadcrumbs) but came away bruised and battered.

> I leave to you the ordering wrt/ the preemption prep series. I can
> apply my r-b at that point, not sure if it makes sense now?

It's in the queue, and I'll resend it once the earlier patches land so
that CI has a run over it, and to catch any more ideas.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx