On Mon, Apr 25, 2016 at 09:29:42AM +0100, Chris Wilson wrote:
> On Mon, Apr 25, 2016 at 08:31:07AM +0100, Dave Gordon wrote:
> > On 22/04/16 19:51, Chris Wilson wrote:
> > >On Fri, Apr 22, 2016 at 07:45:15PM +0100, Chris Wilson wrote:
> > >>On Fri, Apr 22, 2016 at 07:22:55PM +0100, Dave Gordon wrote:
> > >>>This patch simply changes the default value of "enable_guc_submission"
> > >>>from 0 (never) to -1 (auto). This means that GuC submission will be
> > >>>used if the platform has a GuC, the GuC supports the request submission
> > >>>protocol, and any required GuC firmware was successfully loaded. If any
> > >>>of these conditions are not met, the driver will fall back to using
> > >>>execlist mode.
> > >
> > >I just remembered something else.
> > >
> > > * Work Items:
> > > * There are several types of work items that the host may place into a
> > > * workqueue, each with its own requirements and limitations. Currently only
> > > * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which
> > > * represents an in-order queue. The kernel driver packs the ring tail
> > > * pointer and an ELSP context descriptor dword into the Work Item.
> > >
> > >Is this right? You only allocate a single client covering all engines and
> > >specify INORDER. We expect parallel execution between engines; is this
> > >supported? Empirically it seems like the GuC is only executing commands
> > >in series across engines and not in parallel.
> > >-Chris
> >
> > AFAIK, INORDER represents in-order execution of elements in the
> > GuC's (internal) submission queue, which is per-engine; i.e. this
> > option bypasses the GuC's internal scheduling algorithms and makes
> > the GuC behave as a simple dispatcher. It demultiplexes work queue
> > items into the multiple submission queues, then executes them in
> > order from there.
> >
> > Alex can probably confirm this in the GuC code, but I really think
> > we'd have noticed if execution were serialised across engines. For a
> > start, the validation tests that have one engine busy-spin while
> > waiting for a batch on a different engine to update a buffer
> > wouldn't ever finish.
> 
> That doesn't seem to be the issue; we can run in parallel, it seems
> (a busy-spin on one engine doesn't prevent a write on the second). It's
> just the latency. Overall, execution latency goes up substantially with
> the GuC, and in this case it does not seem to execute the second execbuf
> on the second ring until after the first completes.

That sounds like a sizeable bug in the GuC code, and it defeats the point
of all the work going on right now to speed up execlist submission. Can we
have a non-slow GuC somehow? Do we need to escalate this to the firmware
folks and first make sure they release a firmware that doesn't like to
twiddle its thumbs (assuming it is indeed a GuC issue and not in how we
submit things)?

AFAIUI the point of the GuC was to reduce submission latency by again
having a proper queue to submit to, instead of the 1.5 submit ports we get
with execlists. There are other reasons on top, but if the firmware
engineers butchered that, it doesn't look good.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
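
[For readers following along: the -1 (auto) default under discussion boils
down to roughly the decision below, sketched in plain C. The helper name
use_guc_submission() and its predicate arguments are made up for
illustration; only the tri-state semantics and the three conditions come
from the patch description quoted above.]

    /*
     * Tri-state module parameter, per the quoted patch description:
     *   -1 = auto, 0 = never, 1 = always.
     * In auto mode, GuC submission is used only when the platform has a
     * GuC, the GuC supports the request submission protocol, and the
     * firmware was successfully loaded; otherwise fall back to execlists.
     */
    static int use_guc_submission(int enable_guc_submission, int has_guc,
                                  int protocol_ok, int fw_loaded)
    {
            if (enable_guc_submission < 0)  /* auto */
                    return has_guc && protocol_ok && fw_loaded;
            return enable_guc_submission > 0;
    }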
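
[And to make the Work Item comment Chris quotes a bit more concrete, here
is a minimal, self-contained sketch of packing a WQ_TYPE_INORDER work item
with the ring tail pointer and the low ELSP context descriptor dword. The
struct layout, shift values, and type encoding are illustrative
assumptions; only the general shape (header + context descriptor dword +
ring tail) comes from the quoted comment.]

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed 4-dword work item layout; the general shape follows the
     * quoted comment, the exact fields and widths are illustrative. */
    struct guc_wq_item {
            uint32_t header;        /* type, length, target engine */
            uint32_t context_desc;  /* low dword of ELSP context descriptor */
            uint32_t ring_tail;     /* tail of the context's ring buffer */
            uint32_t fence_id;      /* e.g. the request seqno */
    };

    /* Assumed header encoding. */
    #define WQ_TYPE_INORDER   (0x2 << 0)  /* in-order queue, as discussed */
    #define WQ_LEN_SHIFT      16
    #define WQ_TARGET_SHIFT   10

    static void pack_wq_item(struct guc_wq_item *wqi, uint32_t engine_id,
                             uint64_t ctx_desc, uint32_t tail, uint32_t seqno)
    {
            /* length in dwords, not counting the header dword itself */
            uint32_t len = sizeof(*wqi) / sizeof(uint32_t) - 1;

            wqi->header = WQ_TYPE_INORDER |
                          (len << WQ_LEN_SHIFT) |
                          (engine_id << WQ_TARGET_SHIFT);
            wqi->context_desc = (uint32_t)ctx_desc;  /* low dword only */
            wqi->ring_tail = tail;
            wqi->fence_id = seqno;
    }

    int main(void)
    {
            struct guc_wq_item wqi;

            pack_wq_item(&wqi, 1, 0x123456789abcdef0ull, 0x40, 1234);
            printf("header=%08x ctx=%08x tail=%08x fence=%08x\n",
                   wqi.header, wqi.context_desc, wqi.ring_tail, wqi.fence_id);
            return 0;
    }

[Note that a per-item target-engine field like this is why a single
INORDER client can still fan work out across engines: the GuC can
demultiplex the shared work queue into per-engine submission queues, as
Dave describes above. Whether it then dispatches them with acceptable
latency is the open question in this thread.]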