Re: [PATCH] drm/i915: Optimise VMA lookup slightly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 13/12/2016 13:41, Chris Wilson wrote:
On Tue, Dec 13, 2016 at 01:30:19PM +0000, Tvrtko Ursulin wrote:

On 13/12/2016 12:41, Chris Wilson wrote:
On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

A few details to hopefully make a very hot function a tiny bit
more efficient:

1. Cast VM pointers before substraction to save the compiler
   doing a smart one which includes multiplication.

Indeed. Not pretty though.

static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b)
{
	return a - b;
}

cmp = ptrdiff(vma->vm, vm);
if (cmp)
	return cmp;

Pffifle as you would say. :)

2. Use smaller type for comparison since we only care about
   the sign.

Should be a no-op since the compiler also should only care about the
sign and not be moving the registers about, just the cc and we should be
inlining... Is gcc not smart enough? :(

I am pretty sure there was a significant difference in the size of
generated code so apparently it isn't.

If you have two available, can you attach the asm?

http://paste.debian.net/901995/ if you can figure it out.

Dropped it from v2 though.

3. Prefer the ppgtt lookup branch and inline it, allowing the
   compiler to optimise out the second part of i915_vma_compare
   and save one call indirection.

This runs counter to a better optimisation that completely avoids
calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight
>from handle to vma, skipping the handle to obj intermediate lookup).

Primary caller for this function should be ggtt users, with single
negative lookups before creating the ppgtt vma.

Empirically ppgtt users call this much much more on any benchmarks
I've tried since with the change all I see is lookup_and_create in
the profiles and not obj_to_vma.

Exactly, hence why we should tackle the root of the problem... Replace a
slow idr lookup and rbtree walk with a direct lookup (hashtable) from
the fpriv. Since it doesn't just affect full-ppgtt but the introduction
of the second lookup regressed everyone,
https://bugs.freedesktop.org/show_bug.cgi?id=87131 and the myriad of
igt/benchmark/gem_exec_reloc benchmarks.

Dropped this from v2 as well, only kept ptrdiff.

Will have a look around for the things you mentioned here.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux