Re: [PATCH] drm/i915: Optimise VMA lookup slightly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 13/12/2016 12:41, Chris Wilson wrote:
On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

A few details to hopefully make a very hot function a tiny bit
more efficient:

 1. Cast VM pointers before substraction to save the compiler
    doing a smart one which includes multiplication.

Indeed. Not pretty though.

static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b)
{
	return a - b;
}

cmp = ptrdiff(vma->vm, vm);
if (cmp)
	return cmp;

Pffifle as you would say. :)

 2. Use smaller type for comparison since we only care about
    the sign.

Should be a no-op since the compiler also should only care about the
sign and not be moving the registers about, just the cc and we should be
inlining... Is gcc not smart enough? :(

I am pretty sure there was a significant difference in the size of generated code so apparently it isn't.

 3. Prefer the ppgtt lookup branch and inline it, allowing the
    compiler to optimise out the second part of i915_vma_compare
    and save one call indirection.

This runs counter to a better optimisation that completely avoids
calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight
from handle to vma, skipping the handle to obj intermediate lookup).

Primary caller for this function should be ggtt users, with single
negative lookups before creating the ppgtt vma.

Empirically ppgtt users call this much much more on any benchmarks I've tried since with the change all I see is lookup_and_create in the profiles and not obj_to_vma.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux