On Tue, Dec 13, 2016 at 01:30:19PM +0000, Tvrtko Ursulin wrote: > > On 13/12/2016 12:41, Chris Wilson wrote: > >On Tue, Dec 13, 2016 at 12:22:18PM +0000, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> > >>A few details to hopefully make a very hot function a tiny bit > >>more efficient: > >> > >> 1. Cast VM pointers before substraction to save the compiler > >> doing a smart one which includes multiplication. > > > >Indeed. Not pretty though. > > > >static always_inline __kernel_ptrdiff_t ptrdiff(const void *a, const void *b) > >{ > > return a - b; > >} > > > >cmp = ptrdiff(vma->vm, vm); > >if (cmp) > > return cmp; > > Pffifle as you would say. :) > > >> 2. Use smaller type for comparison since we only care about > >> the sign. > > > >Should be a no-op since the compiler also should only care about the > >sign and not be moving the registers about, just the cc and we should be > >inlining... Is gcc not smart enough? :( > > I am pretty sure there was a significant difference in the size of > generated code so apparently it isn't. If you have two available, can you attach the asm? > >> 3. Prefer the ppgtt lookup branch and inline it, allowing the > >> compiler to optimise out the second part of i915_vma_compare > >> and save one call indirection. > > > >This runs counter to a better optimisation that completely avoids > >calling obj_to_vma for ppgtt lookups (i.e. in execbuffer we go straight > >from handle to vma, skipping the handle to obj intermediate lookup). > > > >Primary caller for this function should be ggtt users, with single > >negative lookups before creating the ppgtt vma. > > Empirically ppgtt users call this much much more on any benchmarks > I've tried since with the change all I see is lookup_and_create in > the profiles and not obj_to_vma. Exactly, hence why we should tackle the root of the problem... Replace a slow idr lookup and rbtree walk with a direct lookup (hashtable) from the fpriv. Since it doesn't just affect full-ppgtt but the introduction of the second lookup regressed everyone, https://bugs.freedesktop.org/show_bug.cgi?id=87131 and the myriad of igt/benchmark/gem_exec_reloc benchmarks. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx