On Mon, Nov 30, 2015 at 01:32:16PM +0000, Tvrtko Ursulin wrote:
> 
> On 30/11/15 12:30, Chris Wilson wrote:
> >On Mon, Nov 30, 2015 at 12:05:50PM +0000, Tvrtko Ursulin wrote:
> >>>+	struct intel_breadcrumbs {
> >>>+		spinlock_t lock; /* protects the per-engine request list */
> >>
> >>s/list/tree/
> >
> >list. We use the tree as a list!
> 
> Maybe, but that confuses a reader who then goes looking for some list.
> 
> Related, when I was playing with this I was piggybacking on the
> existing per-engine request list.

A separate list, with separate locking, only used when waiting: that's
the appeal for me.

> If they are seqno-ordered there, and I think they are, then the waker
> thread just traverses it in order until hitting the first not yet
> completed request, waking up every req->wait_queue as it goes.
> 
> In this case it doesn't matter that the waiters arrive in
> nondeterministic order, so we don't need a (new) tree.
> 
> But the downside is that the thread needs struct mutex. And the
> retirement from that list is so lazy...

And we can't even take struct_mutex!

> >>>+		struct task_struct *task; /* bh for all user interrupts */
> >>>+		struct intel_breadcrumbs_engine {
> >>>+			struct rb_root requests; /* sorted by retirement */
> >>>+			struct drm_i915_gem_request *first;
> >>
> >>/* First request in the tree to be completed next by the hw. */
> >>
> >>?
> >
> >/* first */ ?
> 
> Funny? :) First what? Why make the reader reverse-engineer it?

struct drm_i915_gem_request *first_waiter; ?

Please don't make me add /* ^^^ */

> >>For a later extension:
> >>
> >>How would you view, some time in the future, adding a bool return to
> >>ring->put_irq() which would say to the thread that it failed to
> >>disable interrupts?
> >>
> >>In this case the thread would exit and would need to be restarted when
> >>the next waiter queued up, possibly extending the
> >>idle->put_irq->exit durations as well.
> >
> >Honestly, not keen on the idea that the lowlevel put_irq can fail (makes
> >cleanup much more fraught). I don't see what improvement you intend
> >here; maybe clarifying that would help explain what you mean.
> 
> Don't know, maybe reflecting the fact that it wasn't the last put_irq
> call, so let the caller handle it if they want. We can leave this
> discussion for the future.

Hmm. The only other user is the trace_irq. It would be nice to eliminate
that and the ring->irq_count.

> >>>@@ -2986,16 +2981,16 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >>> 		if (ring_idle(ring, seqno)) {
> >>> 			ring->hangcheck.action = HANGCHECK_IDLE;
> >>> 
> >>>-			if (waitqueue_active(&ring->irq_queue)) {
> >>>+			if (READ_ONCE(dev_priv->breadcrumbs.engine[ring->id].first)) {
> >>
> >>READ_ONCE is again strange, but other than that I don't know the
> >>hangcheck code well enough to really evaluate it.
> >>
> >>Perhaps this also means the line needs a comment describing what
> >>condition/state the check is actually approximating.
> >
> >All we are doing is asking if anyone is waiting on the GPU, because the
> >GPU has finished all of its work. If so, the IRQ handling is suspect
> >(modulo the pre-existing race condition clearly earmarked by READ_ONCE).
> 
> Cool, /* Check if someone is waiting on the GPU */ then would make me
> happy.
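Ok. With that comment and a small helper the hangcheck hunk becomes
self-documenting (a sketch only; any_waiters_on_engine() is the helper
added in the diff below):

			/* Check if someone is waiting on the GPU */
			if (any_waiters_on_engine(ring)) {

and the same helper then replaces the open-coded check in any_waiters():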
+static bool any_waiters_on_engine(struct intel_engine_cs *engine)
+{
+	return READ_ONCE(engine->i915->breadcrumbs.engine[engine->id].first_waiter);
+}
+
 static bool any_waiters(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *ring;
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		if (ring->irq_refcount)
+		if (any_waiters_on_engine(ring))
 			return true;
 
 	return false;

> >>>+	while (!kthread_should_stop()) {
> >>>+		unsigned reset_counter = i915_reset_counter(&i915->gpu_error);
> >>>+		unsigned long timeout_jiffies;
> >>>+		bool idling = false;
> >>>+
> >>>+		/* On every tick, walk the seqno-ordered list of requests
> >>>+		 * and for each retired request wakeup its clients. If we
> >>>+		 * find an unfinished request, go back to sleep.
> >>>+		 */
> >>
> >>s/tick/loop/ ?
> >
> >s/tick/iteration/

I'm sticking with tick.

> Tick makes me think of a scheduler tick, so to me it is the worst of
> the three. Iteration sounds really good. Whether that turns out to be
> a tick/jiffy/something else is defined a bit lower in the code.

Hmm, that was the connection I liked with tick.

> >>And if we don't find an unfinished request, still go back to sleep. :)
> >
> >Ok.
> 
> >>>+		set_current_state(TASK_INTERRUPTIBLE);
> >>>+
> >>>+		/* Note carefully that we do not hold a reference for the
> >>>+		 * requests on this list. Therefore we only inspect them
> >>>+		 * whilst holding the spinlock to ensure that they are not
> >>>+		 * freed in the meantime, and the client must remove the
> >>>+		 * request from the list if it is interrupted (before it
> >>>+		 * itself releases its reference).
> >>>+		 */
> >>>+		spin_lock(&b->lock);
> >>
> >>This lock is more per-engine than global in nature, unless you think
> >>it is more efficient to do fewer lock atomics vs the potential
> >>overlap of waiter activity across engines?
> >
> >Indeed. I decided not to care about making it per-engine.
> >
> >>Would need a new lock for req->irq_count, per request. So
> >>req->lock/req->irq_count and be->lock for the tree operations.
> >
> >Why? A request is tied to an individual engine, therefore we can use
> >that breadcrumb_engine.lock for the request.
> 
> Just so 2nd+ waiters would not touch the per-engine lock.

Ah. I am not expecting that much contention, and even fewer multiple
waiters on a request.

> >>>+		for (i = 0; i < I915_NUM_RINGS; i++) {
> >>>+			struct intel_engine_cs *engine = &i915->ring[i];
> >>>+			struct intel_breadcrumbs_engine *be = &b->engine[i];
> >>>+			struct drm_i915_gem_request *request = be->first;
> >>>+
> >>>+			if (request == NULL) {
> >>>+				if ((irq_get & (1 << i))) {
> >>>+					if (irq_enabled & (1 << i)) {
> >>>+						engine->irq_put(engine);
> >>>+						irq_enabled &= ~(1 << i);
> >>>+					}
> >>>+					intel_runtime_pm_put(i915);
> >>>+					irq_get &= ~(1 << i);
> >>>+				}
> >>>+				continue;
> >>>+			}
> >>>+
> >>>+			if ((irq_get & (1 << i)) == 0) {
> >>>+				intel_runtime_pm_get(i915);
> >>>+				irq_enabled |= __irq_enable(engine) << i;
> >>>+				irq_get |= 1 << i;
> >>>+			}
> >>
> >>irq_get and irq_enabled are creating a little bit of a headache for
> >>me, which probably means a comment is in order to explain what they
> >>are for and how they work.
> >>
> >>Like this, I don't see the need for irq_get to persist across loops?
> >
> >Because getting the irq is quite slow, we add a jiffie shadow.
> 
> But it is not a shadow in the sense of the same bitmask from the
> previous iteration. The bits are different depending on special
> cases in __irq_enable.
> 
> So it needs some comments I think.

Didn't you get that from idling and its comments?
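To spell out the invariant the two masks maintain: a bit set in irq_get
means we hold the runtime-pm wakelock for that engine and have attempted
to unmask its user interrupt; a bit set in irq_enabled means that
attempt succeeded, so we owe an irq_put() later. irq_enabled is always a
subset of irq_get. As a sketch only (not the actual patch, and
update_waiter_irq() is just a name for illustration), the bookkeeping
could live in one helper:

static void update_waiter_irq(struct drm_i915_private *i915,
			      struct intel_engine_cs *engine,
			      bool has_waiters,
			      unsigned *irq_get, unsigned *irq_enabled)
{
	const unsigned bit = 1 << engine->id;

	if (has_waiters && (*irq_get & bit) == 0) {
		/* First waiter: take the wakelock, then try to unmask
		 * the user interrupt. __irq_enable() may fail (the
		 * special cases you mention), leaving the bit set in
		 * irq_get but clear in irq_enabled.
		 */
		intel_runtime_pm_get(i915);
		*irq_enabled |= __irq_enable(engine) << engine->id;
		*irq_get |= bit;
	} else if (!has_waiters && (*irq_get & bit)) {
		/* Last waiter gone: undo only what we managed to do. */
		if (*irq_enabled & bit) {
			engine->irq_put(engine);
			*irq_enabled &= ~bit;
		}
		intel_runtime_pm_put(i915);
		*irq_get &= ~bit;
	}
}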
First, how about engines_active and irqs_enabled?

...
			/* When enabling waiting upon an engine, we need
			 * to unmask the user interrupt, and before we
			 * do so we need to hold a wakelock for our
			 * hardware access (and the interrupts thereafter).
			 */
			if ((engines_active & (1 << i)) == 0) {
...
			if (request == NULL) {
				/* No more waiters, mask the user interrupt
				 * (if we were able to enable it!) and
				 * release the wakelock.
				 */
				if (engines_active & (1 << i)) {

With a bit of work, I can then get the irq and rpm references out of
the spinlock.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre