Re: [RFC] drm/i915: Emit to ringbuffer directly

On 08/09/16 17:40, Chris Wilson wrote:
On Thu, Sep 08, 2016 at 04:12:55PM +0100, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

This removes the usage of intel_ring_emit in favour of
directly writing to the ring buffer.

I have the same patch! But I called it out, for historical reasons.

Yes, I know we talked about it in the past, but I did not think you would find time to actually write it amongst all the other things.

Oh, except mine uses out[0]...out[N] because gcc prefers that over
*out++ = ...

It copes just fine with the latter here, for example:

	*rbuf++ = cmd;
	*rbuf++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
	*rbuf++ = 0; /* upper addr */
	*rbuf++ = 0; /* value */

Is:

     3e9:       89 10                   mov    %edx,(%rax)
     3eb:       c7 40 04 04 01 00 00    movl   $0x104,0x4(%rax)
     3f2:       c7 40 08 00 00 00 00    movl   $0x0,0x8(%rax)
     3f9:       c7 40 0c 00 00 00 00    movl   $0x0,0xc(%rax)

And for the record, before this patch, with intel_ring_emit:

     53a:       8b 53 3c                mov    0x3c(%rbx),%edx
     53d:       48 8b 4b 08             mov    0x8(%rbx),%rcx
     541:       89 04 11                mov    %eax,(%rcx,%rdx,1)
     544:       8b 43 3c                mov    0x3c(%rbx),%eax
     547:       48 8b 53 08             mov    0x8(%rbx),%rdx
     54b:       83 c0 04                add    $0x4,%eax
     54e:       89 43 3c                mov    %eax,0x3c(%rbx)
     551:       c7 04 02 04 01 00 00    movl   $0x104,(%rdx,%rax,1)
     558:       8b 43 3c                mov    0x3c(%rbx),%eax
     55b:       48 8b 53 08             mov    0x8(%rbx),%rdx
     55f:       83 c0 04                add    $0x4,%eax
     562:       89 43 3c                mov    %eax,0x3c(%rbx)
     565:       c7 04 02 00 00 00 00    movl   $0x0,(%rdx,%rax,1)
     56c:       8b 43 3c                mov    0x3c(%rbx),%eax
     56f:       48 8b 53 08             mov    0x8(%rbx),%rdx
     573:       83 c0 04                add    $0x4,%eax
     576:       89 43 3c                mov    %eax,0x3c(%rbx)
     579:       c7 04 02 00 00 00 00    movl   $0x0,(%rdx,%rax,1)

Yuck :) At least they are not function calls to iowrite any more. :)

intel_ring_emit was preventing the compiler from optimising
fetch and increment of the current ring buffer pointer and
therefore generating very verbose code for every write.

It had no useful purpose since all ringbuffer operations
are started and ended with intel_ring_begin and
intel_ring_advance respectively, with no bail out in the
middle possible, so it is fine to increment the tail in
intel_ring_begin and let the code manage the pointer
itself.
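
To illustrate the two styles side by side, here is a minimal standalone
sketch. It is not the driver code: struct toy_ring and every toy_* helper
are hypothetical stand-ins, and the real intel_ring_begin also handles
waiting and wrapping, which is left out here. A likely reason for the
reloads in the emit-style version is that the store through the buffer
pointer may alias the tail field, so the compiler cannot keep either in a
register across writes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ring structure; not the real i915 layout. */
struct toy_ring {
	uint32_t *vaddr;	/* CPU mapping of the ring */
	uint32_t tail;		/* byte offset of the next free dword */
	uint32_t size;		/* ring size in bytes */
};

/* Emit style: each call loads vaddr and tail, stores data, stores tail. */
static inline void toy_ring_emit(struct toy_ring *ring, uint32_t data)
{
	ring->vaddr[ring->tail / sizeof(uint32_t)] = data;
	ring->tail += sizeof(uint32_t);
}

/*
 * Direct style: reserve the space and advance the tail once, then hand
 * the caller a plain pointer to write through.
 */
static inline uint32_t *toy_ring_begin(struct toy_ring *ring,
				       unsigned int num_dwords)
{
	uint32_t bytes = num_dwords * sizeof(uint32_t);
	uint32_t *out;

	if (bytes > ring->size - ring->tail)
		return NULL;	/* a real driver would wait or wrap here */

	out = ring->vaddr + ring->tail / sizeof(uint32_t);
	ring->tail += bytes;
	return out;
}

/* Four dwords written via the emit helper. */
static void toy_flush_via_emit(struct toy_ring *ring, uint32_t cmd)
{
	toy_ring_emit(ring, cmd);
	toy_ring_emit(ring, 0x104);	/* address | flags, value from the listing */
	toy_ring_emit(ring, 0);		/* upper addr */
	toy_ring_emit(ring, 0);		/* value */
}

/* The same four dwords written through a local pointer. */
static int toy_flush_direct(struct toy_ring *ring, uint32_t cmd)
{
	uint32_t *rbuf = toy_ring_begin(ring, 4);

	if (!rbuf)
		return -1;

	*rbuf++ = cmd;
	*rbuf++ = 0x104;	/* address | flags, value from the listing */
	*rbuf++ = 0;		/* upper addr */
	*rbuf++ = 0;		/* value */
	return 0;
}
```

Both functions leave the same contents and the same tail in the ring; only
the bookkeeping differs. In the direct version rbuf is a local, so the
compiler is free to fold the increments into constant offsets.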

Useless instruction removal amounts to approximately
2384 bytes of saved text on my build.

Not sure if this has any measurable performance
implications but executing a ton of useless instructions
on fast paths cannot be good.

It does show up in perf.

Cool.

Patch is not fully polished, but it compiles and runs
on Gen9 at least.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem_context.c    |  62 ++--
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  27 +-
  drivers/gpu/drm/i915/i915_gem_gtt.c        |  57 ++--
  drivers/gpu/drm/i915/intel_display.c       | 113 ++++---
  drivers/gpu/drm/i915/intel_lrc.c           | 223 +++++++-------
  drivers/gpu/drm/i915/intel_mocs.c          |  43 +--
  drivers/gpu/drm/i915/intel_overlay.c       |  69 ++---
  drivers/gpu/drm/i915/intel_ringbuffer.c    | 480 +++++++++++++++--------------
  drivers/gpu/drm/i915/intel_ringbuffer.h    |  19 +-
  9 files changed, 555 insertions(+), 538 deletions(-)

Hmm, mine is bigger.

  drivers/gpu/drm/i915/i915_gem_context.c    |  85 ++--
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  37 +-
  drivers/gpu/drm/i915/i915_gem_gtt.c        |  62 +--
  drivers/gpu/drm/i915/i915_gem_request.c    | 135 ++++-
  drivers/gpu/drm/i915/i915_gem_request.h    |   2 +
  drivers/gpu/drm/i915/intel_display.c       | 133 +++--
  drivers/gpu/drm/i915/intel_lrc.c           | 188 ++++---
  drivers/gpu/drm/i915/intel_lrc.h           |   2 -
  drivers/gpu/drm/i915/intel_mocs.c          |  50 +-
  drivers/gpu/drm/i915/intel_overlay.c       |  77 ++-
  drivers/gpu/drm/i915/intel_ringbuffer.c    | 762 ++++++++++++-----------------
  drivers/gpu/drm/i915/intel_ringbuffer.h    |  36 +-
  12 files changed, 721 insertions(+), 848 deletions(-)

(this includes moving the intel_ring_begin to i915_gem_request)

plus an earlier

  drivers/gpu/drm/i915/i915_gem_request.c |  26 ++---
  drivers/gpu/drm/i915/intel_lrc.c        | 121 ++++++++---------------
  drivers/gpu/drm/i915/intel_ringbuffer.c | 168 +++++++++++---------------------
  drivers/gpu/drm/i915/intel_ringbuffer.h |  10 +-
  4 files changed, 112 insertions(+), 213 deletions(-)

since I wanted parts of it for emitting timelines.

Ok what do you want to do?

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



