Re: [PATCH v2] drm/i915: Optimise VMA lookup slightly

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Thu, 15 Dec 2016 16:49:49 +0000

On 13/12/2016 14:47, Chris Wilson wrote:
On Tue, Dec 13, 2016 at 02:37:27PM +0000, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Cast VM pointers before subtraction to save the compiler
doing a smart one which includes multiplication.

v2: Only keep the first optimisation and prettify it. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

Step 1, ok.
Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

(I wasn't against the others, just curious as to what gcc was doing for
#2 and #3. I'd just like to pursue a different path altogether :)

Thanks.

Yes, I know. Longer VMA lists are not something I've tested yet. I've just
noticed that even where lookups are predominantly on short lists, up to 1%
of CPU time can still be spent in the lookup. It averages around
0.7% AFAIR.

More precisely, in that test (which is simply running a vsync-limited
neverball intro screen :)), 65% of all lookups are on single-VMA objects!
29% on objects with two VMAs and 29% on objects with three VMAs.
That's it, no longer lists at all.

I was not sure how much benefit a smarter lookup would bring for this case,
so I simply wanted to tighten up the existing search as much as possible.
Even for that I am not sure it makes a difference, but if we can at least
remove pointless instructions, why not.
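
For anyone following along, here is a minimal sketch of the idea from the commit message. It is not the actual i915 code: struct vm, cmp_elements() and cmp_bytes() are hypothetical names made up for illustration. Subtracting two struct pointers yields an element count, so the compiler has to divide the byte distance by the struct size (typically a shift plus a multiply by an inverse constant). Casting to an integer type first gives the raw byte distance, which is enough when only the sign or zero-ness of the result is used for ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a large structure such as an address space. */
struct vm {
	char payload[1000];	/* deliberately not a power-of-two size */
};

/*
 * Plain pointer subtraction: the result is a number of elements, so the
 * byte distance must be scaled down by sizeof(struct vm).
 */
static long cmp_elements(const struct vm *a, const struct vm *b)
{
	return (long)(a - b);
}

/*
 * Integer-cast subtraction: the result is the byte distance itself.
 * The sign and the zero case match cmp_elements(), with no scaling.
 */
static long cmp_bytes(const struct vm *a, const struct vm *b)
{
	return (long)((intptr_t)a - (intptr_t)b);
}
```

Both helpers order two pointers the same way; only the magnitude of the returned value differs, which a comparison-style lookup does not care about.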

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx