Re: [PATCH 2/2] drm/i915/guc: default to using GuC submission where possible

On 26/04/16 15:00, Daniel Vetter wrote:
On Mon, Apr 25, 2016 at 09:29:42AM +0100, Chris Wilson wrote:
On Mon, Apr 25, 2016 at 08:31:07AM +0100, Dave Gordon wrote:
On 22/04/16 19:51, Chris Wilson wrote:
On Fri, Apr 22, 2016 at 07:45:15PM +0100, Chris Wilson wrote:
On Fri, Apr 22, 2016 at 07:22:55PM +0100, Dave Gordon wrote:
This patch simply changes the default value of "enable_guc_submission"
from 0 (never) to -1 (auto). This means that GuC submission will be
used if the platform has a GuC, the GuC supports the request submission
protocol, and any required GuC firmware was successfully loaded. If any
of these conditions are not met, the driver will fall back to using
execlist mode.
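
In other words, with the default at -1 the decision reduces to something
like the sketch below (the helper names are purely illustrative, not the
actual i915 code; only the i915.enable_guc_submission parameter is real):

    /* Illustrative sketch only -- these helpers are hypothetical,
     * not the real driver functions. */
    static bool use_guc_submission(struct drm_i915_private *dev_priv)
    {
            if (i915.enable_guc_submission < 0)     /* -1 == auto */
                    return has_guc(dev_priv) &&              /* platform has a GuC */
                           guc_supports_submission(dev_priv) && /* protocol supported */
                           guc_firmware_loaded(dev_priv);    /* fw load succeeded */

            return i915.enable_guc_submission > 0;  /* 0 == never, >0 == always */
    }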

I just remembered something else.

  * Work Items:
  * There are several types of work items that the host may place into a
  * workqueue, each with its own requirements and limitations. Currently only
  * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which
  * represents in-order queue. The kernel driver packs ring tail pointer and an
  * ELSP context descriptor dword into Work Item.
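
(That work item is roughly the following shape; the fields below paraphrase
the guc_wq_item definition in intel_guc_fwif.h rather than quoting it exactly:)

    /* Approximate layout of one WQ_TYPE_INORDER work item. */
    struct wq_item_sketch {
            u32 header;        /* WQ_TYPE_INORDER, item length, target engine */
            u32 context_desc;  /* ELSP context descriptor dword for the context */
            u32 ring_tail;     /* new ring tail pointer for that context */
            u32 fence_id;      /* fence id */
    };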

Is this right? You only allocate a single client covering all engines and
specify INORDER. We expect parallel execution between engines; is this
supported? Empirically it seems like the GuC is only executing commands in
series across engines, not in parallel.
-Chris

AFAIK, INORDER represents in-order executions of elements in the
GuC's (internal) submission queue, which is per-engine; i.e. this
option bypasses the GuC's internal scheduling algorithms and makes
the GuC behave as a simple dispatcher. It demultiplexes work queue
items into the multiple submission queues, then executes them in
order from there.
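
Something like this toy model of the behaviour described above (purely
illustrative C; nothing here is GuC firmware or i915 code):

    #include <stdint.h>

    #define NUM_ENGINES 4
    #define QUEUE_DEPTH 16

    struct wq_item {
            uint32_t target_engine;  /* which engine this item is aimed at */
            uint32_t context_desc;   /* ELSP context descriptor dword */
            uint32_t ring_tail;      /* new ring tail for that context */
    };

    struct engine_queue {
            struct wq_item items[QUEUE_DEPTH];
            unsigned int head;       /* advanced by the consuming engine */
            unsigned int tail;       /* advanced by dispatch() below */
    };

    static struct engine_queue submit_q[NUM_ENGINES];

    /* WQ_TYPE_INORDER behaviour: no scheduling decisions, just route each
     * work queue item to its engine's submission queue; each engine then
     * drains its own queue strictly in order, independently of the others. */
    static void dispatch(const struct wq_item *item)
    {
            struct engine_queue *q = &submit_q[item->target_engine % NUM_ENGINES];

            q->items[q->tail++ % QUEUE_DEPTH] = *item;
    }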

Alex can probably confirm this in the GuC code, but I really think
we'd have noticed if execution were serialised across engines. For a
start, the validation tests that have one engine busy-spin while
waiting for a batch on a different engine to update a buffer
wouldn't ever finish.

That doesn't seem to be the issue; we can apparently run in parallel (a
busy-spin on one engine doesn't prevent a write on the second). It looks
like a latency problem instead: overall execution latency goes up
substantially with the GuC, and in this case it does not seem to execute
the second execbuf on the second ring until after the first completes.

That sounds like a decent bug in guc code, and defeats the point of all
the work to speed up execlist submission going on right now.

Can we have non-slow GuC somehow? Do we need to escalate this to the
firmware folks and first make sure they have a firmware release which
doesn't like to twiddle thumbs (assuming it's a GuC issue indeed and not
in how we submit things)?

According to the numbers I was getting yesterday, GuC submission is now slightly faster than execlists on the render engine (because execlists is slower on that engine), but still a bit slower on the others. See

http://www.spinics.net/lists/intel-gfx/msg94140.html

AFAIUI the point of the GuC was to reduce submission latency by again having a
queue to submit to, instead of the 1.5 submit ports with execlist. There are
other reasons on top, but if the firmware engineers butchered that it doesn't
look good.
-Daniel

I don't think it was ever about latency. I think the GuC was added to reduce the overhead of fielding context-switch interrupts on the CPU.

.Dave.