On Fri, Nov 27, 2015 at 01:53:34PM +0000, Tvrtko Ursulin wrote:
> 
> On 27/11/15 13:01, Chris Wilson wrote:
> >On Fri, Nov 27, 2015 at 12:11:32PM +0000, Tvrtko Ursulin wrote:
> >>
> >>Hi,
> >>
> >>On 31/10/15 10:34, Chris Wilson wrote:
> >>>One particularly stressful scenario consists of many independent tasks
> >>>all competing for GPU time and waiting upon the results (e.g. realtime
> >>>transcoding of many, many streams). One bottleneck in particular is that
> >>>each client waits on its own results, but every client is woken up after
> >>>every batchbuffer - hence the thunder of hooves as then every client must
> >>>do its heavyweight dance to read a coherent seqno to see if it is the
> >>>lucky one. Alternatively, we can have one worker responsible for waking
> >>>after an interrupt, checking the seqno and only waking up the clients
> >>>who are complete. The disadvantage is that in the uncontended scenario
> >>>(i.e. only one waiter) we incur an extra context switch in the wakeup
> >>>path - though that should be mitigated somewhat by the busywait we do
> >>>first before sleeping.
> >>
> >>Could you explain the design in a bit more detail in the commit message?
> >
> >One worker responsible for waking up after an interrupt and waking
> >clients who are complete?
> 
> I think more is required. At least explain the request tree, how it
> is built and used, and the flow a bit.
> 
> >>And add some more comments in the code, where key things are
> >>happening, new struct members, etc?
> >
> >I did.
> 
> Maybe in some patch I don't see? The posted version has zero comments on
> the new members added to intel_engine_cs and incomplete comments for the
> same in drm_i915_gem_request.

/* Used for irq processing of waiting requests */

> And the code throughout has really too few comments relative to
> the complexity.

Where's the complexity? If you tell me as a reader what is not obvious,
that helps. I have tried to make the code/variables reasonably
descriptive, and my goal in comments is to explain why, not what.

> There is actually only one, about schedule(), which is not really the
> main point! :)

But that tells me a lot about the why (why this and not something else).

> Nothing about the tree handling, state variables like
> irq_first and similar.

But they were simple? It's just a sorted list (using the available
rbtree implementation) with an optimisation to store the oldest seqno.
There's very little "why?" I felt I needed to explain there.

> >>>+
> >>>+	/* Unlike the individual clients, we do not want this
> >>>+	 * background thread to contribute to the system load,
> >>>+	 * i.e. we do not want to use io_schedule() here.
> >>>+	 */
> >>>+	schedule();
> >>
> >>I am also thinking about whether busy spin makes sense more in wait
> >>request or here. Hm.. With that optimisation, which makes only the
> >>waiter on the next request in the queue spin, it is OK to do it
> >>there.
> >
> >No. I don't think that makes sense. To get here we have demonstrated that
> >we are in a long-wait sequence, and the code already has the inclination
> >to prevent busy-spins when we are waiting. Now we do introduce yet
> >another context switch between the irq and client, but most of the
> >latency is introduced by setting up the irq in the first place.
> 
> What doesn't make sense? To keep the busy spin, even optimized, in
> wait request after this patch?

Oh, I thought you were talking about adding a spin here, i.e. after we
are woken up to complete a request, busywait on the next one before
sleeping.
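[For illustration only, here is a minimal userspace sketch of the scheme
discussed above: waiters are kept sorted by the seqno they wait for, and
after each interrupt a single worker reads the hardware seqno once and
wakes only the clients whose requests have completed. The real patch uses
the kernel rbtree and caches the oldest waiter (irq_first); the plain
sorted list and all identifiers below (struct waiter, wake_completed(),
etc.) are hypothetical stand-ins, not code from the series.]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct waiter {
	uint32_t seqno;        /* request this client is waiting for */
	struct waiter *next;   /* next waiter in seqno order */
};

struct waitqueue {
	struct waiter *first;  /* oldest (smallest) outstanding seqno */
};

/* seqno comparison with wrap handling, in the spirit of i915_seqno_passed() */
static bool seqno_passed(uint32_t hw_seqno, uint32_t seqno)
{
	return (int32_t)(hw_seqno - seqno) >= 0;
}

/* Insert in seqno order (wrap handling omitted here for brevity). */
static void add_waiter(struct waitqueue *wq, struct waiter *w)
{
	struct waiter **p = &wq->first;

	while (*p && (*p)->seqno <= w->seqno)
		p = &(*p)->next;
	w->next = *p;
	*p = w;
}

/* Called by the single irq worker: wake only the clients that are complete. */
static void wake_completed(struct waitqueue *wq, uint32_t hw_seqno)
{
	while (wq->first && seqno_passed(hw_seqno, wq->first->seqno)) {
		struct waiter *w = wq->first;

		wq->first = w->next;
		printf("waking client waiting on seqno %u\n", (unsigned)w->seqno);
		free(w);
	}
	/* everyone left in the queue stays asleep */
}

int main(void)
{
	struct waitqueue wq = { NULL };
	uint32_t seqnos[] = { 12, 10, 15 };

	for (size_t i = 0; i < 3; i++) {
		struct waiter *w = malloc(sizeof(*w));
		w->seqno = seqnos[i];
		add_waiter(&wq, w);
	}

	wake_completed(&wq, 12);  /* wakes 10 and 12; 15 keeps sleeping */
	wake_completed(&wq, 16);  /* wakes 15 */
	return 0;
}

[The point of caching the oldest waiter is that the common check after an
interrupt touches only the head of the queue rather than waking and
rechecking every sleeper.]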
> >>(So please respin that patch series as well.)
> >
> >In review there was only a quick rename, but I also want to change it to
> >a 10us timeout as that is favourable to more synchronous igt loads.
> 
> I thought I found a reversal of logic in "drm/i915: Only spin whilst
> waiting on the current request" which needs fixing?

That was trivial; it was rebasing a patch from using this kworker to
vanilla that introduced the error.

> >>Worth thinking about a dedicated, maybe even CPU local work queue
> >>for maximum responsiveness?
> >
> >Considered a kthread per ring. Downsides are that it consumes a slot for
> >mostly idle threads, which are what the kworkers are for. (I am not keen
> >on the complaints we would get for 6 kirq-i915/* threads.)
> 
> Also, I like the worker since when there is no one waiting it does
> not get woken up from notify_ring().

A kthread would also keep the irq enabled to a minimum. I've seen the
misery of having the IRQ fire after every breadcrumb and do not want it
back.

> So I was just thinking, wasn't sure, whether using the system wq makes
> it compete with other jobs. If so, a dedicated worker wouldn't harm.
> Could make it high priority as well. Not sure it would make sense to
> make it bound any more. Probably the period from sleep to wakeup is too
> long for locality effects.

Right, this was why I thought a kthread would be better, as we wouldn't
monopolise a kworker thread by sleeping for long periods of time. Not
sure what the answer is, but with a more complicated single kirq-i915
design perhaps the tradeoff is towards the kthread.

> P.S. And just realised this work is competing with the scheduler,
> which changes all this again.

On the other hand, there are regressions to be solved before more
features.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
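[As a rough reference for the tradeoff debated above, and not code from
this series: the two alternatives look something like the following in
kernel terms, a dedicated high-priority unbound workqueue instead of the
system wq, versus a per-ring kthread that can sleep for long stretches
without tying up a kworker. All identifiers here (i915_irq_wq,
i915_irq_thread, etc.) are made up for illustration.]

#include <linux/errno.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

/* Option A: a dedicated, high-priority, unbound workqueue, so the wakeup
 * work does not compete with other jobs on the system workqueue. */
static struct workqueue_struct *i915_irq_wq;

static int setup_dedicated_wq(struct work_struct *irq_work)
{
	i915_irq_wq = alloc_workqueue("i915-irq", WQ_HIGHPRI | WQ_UNBOUND, 0);
	if (!i915_irq_wq)
		return -ENOMEM;
	queue_work(i915_irq_wq, irq_work);
	return 0;
}

/* Option B: a kthread per ring, which can sleep for long periods without
 * monopolising a kworker, at the cost of extra mostly-idle threads. */
static int i915_irq_thread(void *data)
{
	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}
		schedule();	/* woken from the interrupt handler */

		/* in the real design: read the seqno once here and wake
		 * only the waiters whose requests have completed */
	}
	return 0;
}

static struct task_struct *start_irq_kthread(void *engine, int id)
{
	return kthread_run(i915_irq_thread, engine, "kirq-i915/%d", id);
}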