On 13/12/2016 14:47, Chris Wilson wrote:
On Tue, Dec 13, 2016 at 02:37:27PM +0000, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cast VM pointers before subtraction to save the compiler
doing a smart one which includes multiplication.
v2: Only keep the first optimisation and prettify it. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Step 1, ok.
Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
(I wasn't against the others, just curious as to what gcc was doing for
#2, and for #3 I'd just like to pursue a different path altogether :)
Thanks.
Yes I know. Longer VMA lists are not something I've tested yet. I've just
noticed that even where lookups are predominantly on short lists it can
still be up to 1% of CPU time spent in the lookup. It averages around
0.7% AFAIR.
More precisely, in that test (which is simply running a vsync-limited
neverball intro screen :)), 65% of all lookups are on single-VMA objects!
29% on objects with two VMAs and 29% on objects with three VMAs.
That's it, no longer lists at all.
I was not sure how much benefit a smarter lookup would bring for this
case, so I simply wanted to tighten up the existing search as much as
possible. Even for that I am not sure it makes a difference, but at
least if we can drop pointless instructions, why not.
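
For anyone following along, here is a rough sketch of the cast-before-subtract idea from the commit message. It is not the code from the patch: struct vm_example and both helper names are made up for illustration, and the 24-byte size is chosen only so that sizeof is not a power of two. Subtracting typed pointers yields a count of elements, so the compiler has to divide the byte distance by the struct size, which it typically does with a shift plus a multiply by a constant. Casting to an integer type first leaves a plain subtraction, and the result is still zero for equal pointers and correctly signed for ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an address-space struct (not the i915 one).
 * 24 bytes, so sizeof is deliberately not a power of two. */
struct vm_example {
	char pad[24];
};

/* Typed pointer subtraction: the result is in elements, so the byte
 * distance must be divided by sizeof(struct vm_example). */
static inline ptrdiff_t diff_elements(const struct vm_example *a,
				      const struct vm_example *b)
{
	return a - b;
}

/* Cast before subtracting: a plain integer subtraction, result in bytes.
 * Zero still means "same pointer" and the sign still gives an ordering,
 * which is all a comparison helper needs. */
static inline intptr_t diff_bytes(const struct vm_example *a,
				  const struct vm_example *b)
{
	return (intptr_t)((uintptr_t)a - (uintptr_t)b);
}
```

For two entries that sit two slots apart in an array of struct vm_example, diff_elements() returns 2 and diff_bytes() returns 48. Whether the saved instructions are measurable is a separate question, as said above.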
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx