Re: [PATCH] drm/i915: Make wa_tail_dwords flexible for future platforms.

Dave Gordon <david.s.gordon@xxxxxxxxx> · Wed, 27 Jan 2016 12:27:16 +0000

On 26/01/16 14:06, Chris Wilson wrote:
On Tue, Jan 26, 2016 at 01:51:19PM +0000, Rodrigo Vivi wrote:
    On Tue, Jan 26, 2016 at 12:30 AM Chris Wilson
    <chris@xxxxxxxxxxxxxxxxxx> wrote:

      On Mon, Jan 25, 2016 at 09:17:15PM +0000, Chris Wilson wrote:
      > On Mon, Jan 25, 2016 at 11:29:19AM -0800, Rodrigo Vivi wrote:
      > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
      > > @@ -764,18 +764,18 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
      > >  {
      > >     struct intel_ringbuffer *ringbuf = request->ringbuf;
      > >     struct drm_i915_private *dev_priv = request->i915;
      > > +   int i;
      > >
      > >     intel_logical_ring_advance(ringbuf);
      > >     request->tail = ringbuf->tail;
      > >
      > >     /*
      > > -    * Here we add two extra NOOPs as padding to avoid
      > > +    * Here we add extra NOOPs as padding to avoid
      > >      * lite restore of a context with HEAD==TAIL.
      > > -    *
      > > -    * Caller must reserve WA_TAIL_DWORDS for us!
      > >      */
      > > -   intel_logical_ring_emit(ringbuf, MI_NOOP);
      > > -   intel_logical_ring_emit(ringbuf, MI_NOOP);
      > > +   for (i = 0; i < ringbuf->wa_tail_dwords; i++)
      > > +           intel_logical_ring_emit(ringbuf, MI_NOOP);
      > > +
      > >     intel_logical_ring_advance(ringbuf);
      > >
      > >     if (intel_ring_stopped(request->ring))
      > > @@ -876,6 +876,16 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
      > >     if (ret)
      > >             return ret;
      > >
      > > +   if (IS_GEN8(req->ring->dev) || IS_GEN9(req->ring->dev))
      >
      > req->i915
      >
      > This is atrocious. Just allocate the extra space when required.

    By this logic, I should just emit the MI_NOOPs when required as well,
    right?

Yes, I didn't like the placement of the wa_tail but I went with that to
avoid the code duplication.

      Slightly less grumpy this morning.

    thanks

      1. This is duplicating the reserved-space mechanism, by open-coding the
      requirements for execlists. Fine-tuning the reserved space per ring may
      be worth it, but probably not. Over-reserving space is not a huge issue
      (it just effectively reduces the size of the ring), and the granularity
      is the size of the average request.
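To put point 1 in numbers, here is a rough user-space model (not i915 code; the function name and every figure used with it are made up for illustration). It treats the ring as a pool from which a fixed reservation is always held back and counts how many average-sized requests still fit. Because space is reclaimed roughly a request at a time, a couple of extra reserved dwords usually change nothing.

```c
#include <assert.h>

/*
 * Hypothetical model, not i915 code: ring_dwords of ring space, of which
 * reserved_dwords are always held back for the final request emission.
 * Returns how many requests of avg_request_dwords fit in what is left.
 */
static unsigned int requests_that_fit(unsigned int ring_dwords,
                                      unsigned int reserved_dwords,
                                      unsigned int avg_request_dwords)
{
    assert(avg_request_dwords > 0);
    if (reserved_dwords >= ring_dwords)
        return 0;
    return (ring_dwords - reserved_dwords) / avg_request_dwords;
}
```

With a hypothetical 4096-dword ring and 200-dword requests, reserving 160 or 162 dwords both leave room for 19 requests.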

    Forgive this clueless mind here, but I don't see how I'm duplicating the
    reserved-space mechanism...

You are extending every begin by the overallocation required to emit
the tail dwords. We already extend every begin by the overallocation
required to emit the request (until we come to emit the request, where
there is no more overallocation applied).
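A minimal sketch of the reserved-space behaviour described here, assuming a simplified ring with no wrap-around; `ring_model`, `ring_begin()` and `ring_use_reserved()` are hypothetical names, not the driver's API. Ordinary begin() calls must leave `reserved_size` dwords free; once the request itself is being emitted, that surcharge is no longer applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the reserved-space idea (not i915 code). */
struct ring_model {
    unsigned int space;          /* free dwords currently available */
    unsigned int reserved_size;  /* dwords held back for add_request */
    bool reserved_in_use;        /* true once add_request starts emitting */
};

/* Returns 0 if num_dwords may be emitted, -1 where a driver would wait. */
static int ring_begin(struct ring_model *ring, unsigned int num_dwords)
{
    unsigned int needed = num_dwords;

    /* Every ordinary begin() is extended by the request reservation. */
    if (!ring->reserved_in_use)
        needed += ring->reserved_size;

    if (needed > ring->space)
        return -1;

    ring->space -= num_dwords;
    return 0;
}

/* Called when the request itself is emitted: no more overallocation. */
static void ring_use_reserved(struct ring_model *ring)
{
    ring->reserved_in_use = true;
}
```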

      2. You are hiding how much space is actually used during request
      emission. This makes review impossible, and we depend upon review to
      verify that the intel_ring_begin() matches the number of dwords emitted.

    But the MI_NOOPs are hidden in the submit and advance... shouldn't we move
    them back to the places that allocate them?

That is why I stressed it in the comments; but it is a tail call, so just
read it as one function. The important sequence is that

intel_ring_begin(count)
...
count x intel_ring_emit
...
intel_ring_advance()

is clear to the reader. Yes, this breaks that rule by replacing
intel_ring_advance() with a custom lr_ring_advance_and_submit() and
perhaps it would be clearer to add lr_ring_begin_for_submit() or
something to stress the slight discrepancy, but still make the pairing
clear.
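The begin/emit/advance contract can also be checked mechanically. The sketch below is a hypothetical user-space model (none of these names are i915 functions); it asserts that each advance() follows exactly the number of emits its begin() promised, which is the property a reviewer otherwise has to verify by counting.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DWORDS 64u  /* hypothetical ring size; no wrap-around modelled */

/* Hypothetical user-space model, not the i915 API. */
struct ring_model {
    uint32_t buf[RING_DWORDS];
    unsigned int tail;     /* next dword slot to write */
    unsigned int pending;  /* dwords claimed by begin() but not yet emitted */
};

static void ring_begin(struct ring_model *ring, unsigned int count)
{
    assert(ring->pending == 0);              /* previous begin() was advanced */
    assert(ring->tail + count <= RING_DWORDS);
    ring->pending = count;
}

static void ring_emit(struct ring_model *ring, uint32_t dword)
{
    assert(ring->pending > 0);               /* no more than begin() claimed */
    ring->buf[ring->tail++] = dword;
    ring->pending--;
}

static void ring_advance(struct ring_model *ring)
{
    assert(ring->pending == 0);              /* exactly what begin() claimed */
}
```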

      3. Is this even the right mechanism considering the number of other ways
      of automatically emitting instructions between batches and contexts? We
      cannot answer that as this patch is out of context.

    Yeap, sorry again; I was just taking the easiest path to avoid the nulls
    per platform without adding three ifs.
    But in comment 1, do you mean that we can live with WA_TAIL_DWORDS = 2
    and only skip the NULLs on platforms that don't need them? Is this the case?

If you want more dwords in the add_request callback, we need to add
those to the MIN_SPACE_FOR_ADD_REQUEST. If we need to add a lot, then
making it variable seems fine - but it should just hook into the common
mechanism i.e. the minimum space should be computed during engine
initialisation and the reservation applied at i915_gem_request_alloc().
-Chris
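A sketch of the approach described above, assuming hypothetical structures and a made-up base reservation (`BASE_RESERVE_DWORDS` is not the real `MIN_SPACE_FOR_ADD_REQUEST` value): the padding requirement is folded into the minimum add_request space once, at engine initialisation, and each request simply picks up that reservation when it is allocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical base reservation, in dwords, for everything add_request
 * emits apart from the padding; not the driver's real value. */
#define BASE_RESERVE_DWORDS 40u

/* Hypothetical models, not the i915 structures. */
struct engine_model {
    unsigned int wa_tail_dwords;            /* 0 where no padding is needed */
    unsigned int min_space_for_add_request; /* computed once, at init */
};

struct request_model {
    unsigned int reserved_size;             /* applied at request allocation */
};

static void engine_init(struct engine_model *engine, bool needs_wa_tail,
                        unsigned int wa_tail_dwords)
{
    /* The platform decision is made once here, not in every begin(). */
    engine->wa_tail_dwords = needs_wa_tail ? wa_tail_dwords : 0u;
    engine->min_space_for_add_request =
        BASE_RESERVE_DWORDS + engine->wa_tail_dwords;
}

static void request_alloc(const struct engine_model *engine,
                          struct request_model *req)
{
    req->reserved_size = engine->min_space_for_add_request;
    assert(req->reserved_size >= BASE_RESERVE_DWORDS); /* sanity check */
}
```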

I think the cleanest partitioning of the functionality would be:
    1. The space for the NOOPs should be accounted for in the reserved
       space, because it's just part of the total space required to
       complete an add_request/emit_request(). Since the amount
       reserved is determined in intel_{logical_}ring_reserve_space()
       it could be added only in the LRC path, if we were concerned
       about the extra space (which I don't think we should be).

    2. callers do begin(N), N*emit(), advance(), add_request(). They
       don't bother about extra NOOPs.

    3. gen8_emit_request() shouldn't have to bother with them either, or
       even with claiming the space for them.

    4. advance_and_submit() (which is execlist specific) can do an extra
       begin() just to keep begin/advance balanced -- it can't fail or
       wait, 'cos it's in the reserved space -- and emits the extra
       NOOPs. This is where it can be made conditional on specific GENs,
       if you want that to be explicit, though since the overhead is so
       small I'd be inclined to always enable it here, and only check
       whether to actually apply the TAIL-bump in the ELSP-poking code.
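The four points above can be modelled roughly as follows. This is a hypothetical user-space sketch, not driver code: the ring does not wrap, `MI_NOOP` is assumed to encode as a zero dword, and `advance_and_submit()` and `elsp_tail()` only stand in for the execlist submit path and the ELSP-writing code. The padding is counted in the reserved space, the submit path claims it with its own begin(), and the TAIL bump is decided separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MI_NOOP 0u           /* assumption: MI_NOOP encodes as a zero dword */
#define RING_DWORDS 64u      /* hypothetical ring size; wrap-around not modelled */
#define WA_TAIL_DWORDS 2u    /* padding count, as in the original patch */

/* Hypothetical user-space model, not the i915 structures. */
struct ring_model {
    uint32_t buf[RING_DWORDS];
    unsigned int tail;          /* next dword slot to write */
    unsigned int space;         /* free dwords */
    unsigned int reserved_size; /* held back for request emission, NOOPs included */
    bool reserved_in_use;       /* set once the request itself is being emitted */
    unsigned int pending;       /* dwords claimed by begin() but not yet emitted */
};

/* Returns 0 on success, -1 where a real driver would wait for space. */
static int ring_begin(struct ring_model *ring, unsigned int n)
{
    unsigned int needed = n + (ring->reserved_in_use ? 0u : ring->reserved_size);

    assert(ring->pending == 0);
    if (needed > ring->space)
        return -1;
    ring->space -= n;
    ring->pending = n;
    return 0;
}

static void ring_emit(struct ring_model *ring, uint32_t dword)
{
    assert(ring->pending > 0);
    ring->buf[ring->tail++] = dword;
    ring->pending--;
}

static void ring_advance(struct ring_model *ring)
{
    assert(ring->pending == 0);
}

/*
 * Point 4: the execlist-only submit path does its own begin() for the
 * padding so that begin/advance stay balanced. The NOOPs were counted in
 * reserved_size, so provided the request stayed within its reservation,
 * this begin() cannot fail. Returns the request tail, excluding padding.
 */
static unsigned int advance_and_submit(struct ring_model *ring)
{
    unsigned int i, request_tail;
    int ret;

    ring_advance(ring);
    request_tail = ring->tail;

    ret = ring_begin(ring, WA_TAIL_DWORDS);
    assert(ret == 0);
    (void)ret;
    for (i = 0; i < WA_TAIL_DWORDS; i++)
        ring_emit(ring, MI_NOOP);
    ring_advance(ring);

    return request_tail;
}

/* The NOOPs are always emitted; only platforms needing the workaround
 * point the context TAIL past them when the ELSP is written. */
static unsigned int elsp_tail(unsigned int request_tail, bool needs_tail_bump)
{
    return needs_tail_bump ? request_tail + WA_TAIL_DWORDS : request_tail;
}
```

In this model an ordinary caller can be refused space that the request emission, including the padding, is still guaranteed to get.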

In summary: mostly as Chris had it, but without the extra space being 
added to the begin() call in gen8_emit_request() (as Rodrigo has it).

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx