On 18/08/16 16:36, Dave Gordon wrote:
On 18/08/16 16:27, Dave Gordon wrote:
[snip]
Note that SKL GuC firmware 6.1 didn't support dual submission or lite
restore, whereas the next version (8.11) does. Therefore, with v8.11
we don't see the same slowdown when going to 1-at-a-time round-robin
submission. I have a different (new) test that shows this more clearly.
This is with GuC version 6.1:
skylake# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS
Time to exec 8-byte batch: 3.428µs (ring=render)
Time to exec 8-byte batch: 2.444µs (ring=bsd)
Time to exec 8-byte batch: 2.394µs (ring=blt)
Time to exec 8-byte batch: 2.615µs (ring=vebox)
Time to exec 8-byte batch: 2.625µs (ring=all, sequential)
Time to exec 8-byte batch: 12.701µs (ring=all, parallel/1) ***
Time to exec 8-byte batch: 7.259µs (ring=all, parallel/2)
Time to exec 8-byte batch: 4.336µs (ring=all, parallel/4)
Time to exec 8-byte batch: 2.937µs (ring=all, parallel/8)
Time to exec 8-byte batch: 2.661µs (ring=all, parallel/16)
Time to exec 8-byte batch: 2.245µs (ring=all, parallel/32)
Time to exec 8-byte batch: 1.626µs (ring=all, parallel/64)
Time to exec 8-byte batch: 2.170µs (ring=all, parallel/128)
Time to exec 8-byte batch: 1.804µs (ring=all, parallel/256)
Time to exec 8-byte batch: 2.602µs (ring=all, parallel/512)
Time to exec 8-byte batch: 2.602µs (ring=all, parallel/1024)
Time to exec 8-byte batch: 2.607µs (ring=all, parallel/2048)
And for comparison, here are the figures with v8.11:
# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS
Time to exec 8-byte batch: 3.458µs (ring=render)
Time to exec 8-byte batch: 2.154µs (ring=bsd)
Time to exec 8-byte batch: 2.156µs (ring=blt)
Time to exec 8-byte batch: 2.156µs (ring=vebox)
Time to exec 8-byte batch: 2.388µs (ring=all, sequential)
Time to exec 8-byte batch: 5.897µs (ring=all, parallel/1)
Time to exec 8-byte batch: 4.669µs (ring=all, parallel/2)
Time to exec 8-byte batch: 4.278µs (ring=all, parallel/4)
Time to exec 8-byte batch: 2.410µs (ring=all, parallel/8)
Time to exec 8-byte batch: 2.165µs (ring=all, parallel/16)
Time to exec 8-byte batch: 2.158µs (ring=all, parallel/32)
Time to exec 8-byte batch: 1.594µs (ring=all, parallel/64)
Time to exec 8-byte batch: 1.583µs (ring=all, parallel/128)
Time to exec 8-byte batch: 2.473µs (ring=all, parallel/256)
Time to exec 8-byte batch: 2.264µs (ring=all, parallel/512)
Time to exec 8-byte batch: 2.357µs (ring=all, parallel/1024)
Time to exec 8-byte batch: 2.382µs (ring=all, parallel/2048)
All timings are generally slightly faster, but parallel/1 is approximately
twice as fast, while parallel/64 is virtually unchanged, as are the
timings for the larger parallel/N values.
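As a rough check on the comparison above, here is a small Python sketch. It tabulates the per-row speedup of v8.11 over v6.1 (old time divided by new time). It also computes the parallel/1 penalty relative to sequential submission for each firmware. The figures are copied from the two runs quoted above. The dictionary and function names are hypothetical, chosen only for illustration, and are not part of intel-gpu-tools.

```python
"""Compare gem_exec_paranop timings between two GuC firmware versions.

Sketch only: the figures are copied from the runs quoted in this mail;
the names below are hypothetical and not part of intel-gpu-tools.
"""

# Microseconds per 8-byte batch, keyed by the "ring=..." label.
GUC_6_1 = {
    "render": 3.428,
    "bsd": 2.444,
    "blt": 2.394,
    "vebox": 2.615,
    "all, sequential": 2.625,
    "all, parallel/1": 12.701,
    "all, parallel/2": 7.259,
    "all, parallel/4": 4.336,
    "all, parallel/8": 2.937,
    "all, parallel/16": 2.661,
    "all, parallel/32": 2.245,
    "all, parallel/64": 1.626,
    "all, parallel/128": 2.170,
    "all, parallel/256": 1.804,
    "all, parallel/512": 2.602,
    "all, parallel/1024": 2.602,
    "all, parallel/2048": 2.607,
}

GUC_8_11 = {
    "render": 3.458,
    "bsd": 2.154,
    "blt": 2.156,
    "vebox": 2.156,
    "all, sequential": 2.388,
    "all, parallel/1": 5.897,
    "all, parallel/2": 4.669,
    "all, parallel/4": 4.278,
    "all, parallel/8": 2.410,
    "all, parallel/16": 2.165,
    "all, parallel/32": 2.158,
    "all, parallel/64": 1.594,
    "all, parallel/128": 1.583,
    "all, parallel/256": 2.473,
    "all, parallel/512": 2.264,
    "all, parallel/1024": 2.357,
    "all, parallel/2048": 2.382,
}


def speedups(old, new):
    """Return {label: old / new} for labels present in both runs.

    A ratio above 1.0 means the newer firmware was faster for that row.
    """
    return {label: old[label] / new[label] for label in old if label in new}


def round_robin_penalty(times):
    """Cost of 1-at-a-time round-robin (parallel/1) relative to sequential."""
    return times["all, parallel/1"] / times["all, sequential"]


if __name__ == "__main__":
    for label, ratio in speedups(GUC_6_1, GUC_8_11).items():
        print(f"{label:>18}: {ratio:.2f}x")
    print(f"parallel/1 penalty: v6.1 {round_robin_penalty(GUC_6_1):.2f}x, "
          f"v8.11 {round_robin_penalty(GUC_8_11):.2f}x")
```

With these figures, parallel/1 comes out at roughly 2.15x and parallel/64 at about 1.02x. The parallel/1 penalty drops from about 4.8x the sequential time on v6.1 to about 2.5x on v8.11.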
.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx