On 12/02/16 12:00, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Assorted changes, most likely without any practical effect
apart from a tiny reduction in generated code for the interrupt
handler and request submission.
* Remove needless initialization.
* Improve cache locality by reorganizing code and/or using
  branch hints to keep unexpected or error conditions out
  of line (see the sketch after this list).
* Favor the busy submit path over the empty queue.
* Less branching in hot paths.
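
To illustrate the branch-hint point, this is roughly the idea (a
made-up sketch, not code from the patch; all the names here are
invented for illustration):

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct engine { int queue_busy; };
struct request { int payload; };

static void queue_append(struct engine *e, struct request *rq) { /* ... */ }
static void hw_kick(struct engine *e, struct request *rq) { /* ... */ }

static void submit_request(struct engine *e, struct request *rq)
{
	if (unlikely(!rq))		/* cold error path, placed out of line */
		return;

	if (likely(e->queue_busy))	/* common case: queue already busy */
		queue_append(e, rq);
	else				/* rarer case: kick idle hardware */
		hw_kick(e, rq);
}

With __builtin_expect() the compiler keeps the expected branch as the
straight-line fall-through, so the hot path stays dense in the
instruction cache while error handling is emitted out of line.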
v2:
* Avoid mmio reads when possible. (Chris Wilson)
* Use natural integer size for csb indices.
* Remove useless return value from execlists_update_context.
* Extract 32-bit ppgtt PDPs update so it is out of line and
shared with two callers.
* Grab forcewake across all mmio operations to ease the
  load on the uncore lock and use cheaper mmio ops (see
  the sketch after this list).
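
For reference, the forcewake pattern looks roughly like this (a
sketch of the idea using the i915 uncore helpers; engine and desc
stand in for the real arguments, this is not a quote of the patch):

	/* One explicit forcewake reference for the whole sequence... */
	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);

	/* ...then the raw _FW accessors, which skip the per-access
	 * forcewake handling and the uncore spinlock. */
	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc));
	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc));

	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);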
Version 2 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit smaller elsewhere. Hot
paths are mostly inline now, hammering on the uncore spinlock is
greatly reduced, and mmio traffic is cut down as well.
Is gem_latency an interesting benchmark for this?
Five runs on vanilla:
747693/1: 9.080us 2.000us 2.000us 121.840us
742108/1: 9.060us 2.520us 2.520us 122.645us
744097/1: 9.060us 2.000us 2.000us 122.372us
744056/1: 9.180us 1.980us 1.980us 122.394us
742610/1: 9.040us 2.560us 2.560us 122.525us
Five runs with this patch series:
786532/1: 10.760us 1.520us 1.520us 115.705us
780735/1: 10.740us 1.580us 1.580us 116.558us
783706/1: 10.800us 1.460us 1.460us 116.280us
784135/1: 10.800us 1.520us 1.520us 116.151us
784037/1: 10.740us 1.520us 1.520us 116.250us
So it looks like everything improved apart from dispatch latency:
roughly 5% more throughput (first column), 30% better consumer and
producer latencies (third and fourth columns), and 5% less CPU usage
(last column), but 18% worse dispatch latency (second column).
Comments?
Regards,
Tvrtko