Re: [PATCH 13/15] drm/i915: Integrate GuC-based command submission

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/06/15 10:22, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@xxxxxxxxx>
>>
>> GuC-based submission is mostly the same as execlist mode, up to
>> intel_logical_ring_advance_and_submit(), where the context being
>> dispatched would be added to the execlist queue; at this point
>> we submit the context to the GuC backend instead.
>>
>> There are, however, a few other changes also required, notably:
>> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC
>>     i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>>     PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
>>
>> 2.  The GuC's TLB must be invalidated after a context is pinned at
>>     a new GGTT address.
>>
>> 3.  GuC firmware uses the one page before Ring Context as shared data.
>>     Therefore, whenever driver wants to get base address of LRC, we
>>     will offset one page for it. LRC_PPHWSP_PN is defined as the page
>>     number of LRCA.
>>
>> 4.  In the work queue used to pass requests to the GuC, the GuC
>>     firmware requires the ring-tail-offset to be represented as an
>>     11-bit value, expressed in QWords. Therefore, the ringbuffer
>>     size must be reduced to the representable range (4 pages).
> 
> I don't like how this sabotages the existing execlists implementation
> in order for i915_guc_submission (an interesting choice of file name,
> since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
> i915_guc_submission (HW), not fitting into our, admittedly loose, naming
> convention very well) to share a few functions. Even a couple of which
> are already vfunc.
> -Chris

Not really "sabotages"; big ringbuffers are a waste of space anyway.
Four pages is enough to have at least 64 batchbuffers queued up for the
engine to process (1 second of video, or 0.00012 of a bitcoin). When the
scheduler lands, it will generally reduce the number of batches in the
h/w rings anyway, mostly to improve responsiveness and fair-sharing
among different applications.

I quite agree that the execlists implementation, which is mostly in a
file called intel_lrc.c, should really be split into LRC-related code
(common to execlists and GuC modes) with the rest moved into
intel_execlist_submission.c. Or is that i915_execlist_submission.c?

If we took contexts as primary, rather than "rings" (engines) we could
also share a lot more between GuC/execlists and legacy ringbuffer mode.
At least we should get some improvement with AntiOLR, so we'll have

    request->context<->ringbuffer->engine->submission mechanism :)

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux