Re: [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test

On 18/08/16 16:27, Dave Gordon wrote:

[snip]

Note that SKL GuC firmware 6.1 didn't support dual submission or lite
restore, whereas the next version (8.11) does. Therefore, with the newer
firmware we don't see the same slowdown when going to 1-at-a-time
round-robin. I have a different (new) test that shows this more clearly.
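
To make "1-at-a-time round-robin" concrete, here is a minimal Python sketch
of the submission order for a given burst size. It assumes that "parallel/N"
means N batches are queued to one engine before moving on to the next; that
reading, the function names and the engine list are illustrative and are not
taken from the test source.

```python
def burst_round_robin(engines, burst, total):
    """Return the engine targeted by each of `total` submissions.

    Assumed model: `burst` consecutive batches go to one engine, then the
    submitter moves on to the next engine, wrapping around as needed.
    """
    if not engines or burst < 1:
        raise ValueError("need at least one engine and burst >= 1")
    order = []
    while len(order) < total:
        for engine in engines:
            for _ in range(burst):
                if len(order) == total:
                    return order
                order.append(engine)
    return order


def engine_switches(order):
    """Count how often consecutive submissions target different engines."""
    return sum(1 for a, b in zip(order, order[1:]) if a != b)


if __name__ == "__main__":
    engines = ["render", "bsd", "blt", "vebox"]  # names as in the output below
    for burst in (1, 2, 4):
        order = burst_round_robin(engines, burst, 16)
        print(f"parallel/{burst}: {engine_switches(order)} engine switches")
```

Under this model a burst size of 1 changes engine on every submission, and
larger bursts change engine proportionally less often.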

This is with GuC version 6.1:

skylake# ./intel-gpu-tools/tests/gem_exec_paranop | fgrep -v SUCCESS

Time to exec 8-byte batch:	  3.428µs (ring=render)
Time to exec 8-byte batch:	  2.444µs (ring=bsd)
Time to exec 8-byte batch:	  2.394µs (ring=blt)
Time to exec 8-byte batch:	  2.615µs (ring=vebox)
Time to exec 8-byte batch:	  2.625µs (ring=all, sequential)
Time to exec 8-byte batch:	 12.701µs (ring=all, parallel/1) ***
Time to exec 8-byte batch:	  7.259µs (ring=all, parallel/2)
Time to exec 8-byte batch:	  4.336µs (ring=all, parallel/4)
Time to exec 8-byte batch:	  2.937µs (ring=all, parallel/8)
Time to exec 8-byte batch:	  2.661µs (ring=all, parallel/16)
Time to exec 8-byte batch:	  2.245µs (ring=all, parallel/32)
Time to exec 8-byte batch:	  1.626µs (ring=all, parallel/64)
Time to exec 8-byte batch:	  2.170µs (ring=all, parallel/128)
Time to exec 8-byte batch:	  1.804µs (ring=all, parallel/256)
Time to exec 8-byte batch:	  2.602µs (ring=all, parallel/512)
Time to exec 8-byte batch:	  2.602µs (ring=all, parallel/1024)
Time to exec 8-byte batch:	  2.607µs (ring=all, parallel/2048)

Time to exec 4Kbyte batch:	 14.835µs (ring=render)
Time to exec 4Kbyte batch:	 11.787µs (ring=bsd)
Time to exec 4Kbyte batch:	 11.533µs (ring=blt)
Time to exec 4Kbyte batch:	 11.991µs (ring=vebox)
Time to exec 4Kbyte batch:	 12.444µs (ring=all, sequential)
Time to exec 4Kbyte batch:	 16.211µs (ring=all, parallel/1)
Time to exec 4Kbyte batch:	 13.943µs (ring=all, parallel/2)
Time to exec 4Kbyte batch:	 13.878µs (ring=all, parallel/4)
Time to exec 4Kbyte batch:	 13.841µs (ring=all, parallel/8)
Time to exec 4Kbyte batch:	 14.188µs (ring=all, parallel/16)
Time to exec 4Kbyte batch:	 13.747µs (ring=all, parallel/32)
Time to exec 4Kbyte batch:	 13.734µs (ring=all, parallel/64)
Time to exec 4Kbyte batch:	 13.727µs (ring=all, parallel/128)
Time to exec 4Kbyte batch:	 13.947µs (ring=all, parallel/256)
Time to exec 4Kbyte batch:	 12.230µs (ring=all, parallel/512)
Time to exec 4Kbyte batch:	 12.147µs (ring=all, parallel/1024)
Time to exec 4Kbyte batch:	 12.617µs (ring=all, parallel/2048)

What this shows is that the submission overhead is ~3µs, which is
comparable with the execution time of a trivial (8-byte) batch but
insignificant compared with the time to execute the 4Kbyte batch. The
burst size therefore makes very little difference to the larger batches.
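
As a rough check of that comparison, the following Python sketch computes
the parallel/1 slowdown relative to the sequential case for both batch
sizes. The figures are copied from the GuC 6.1 run above; the dictionary
layout and function names are illustrative only.

```python
# Per-batch times in microseconds, copied from the GuC 6.1 output above.
TIMES_US = {
    "8-byte": {"sequential": 2.625, "parallel/1": 12.701},
    "4Kbyte": {"sequential": 12.444, "parallel/1": 16.211},
}


def slowdown(batch):
    """Ratio of the parallel/1 time to the sequential time for one batch size."""
    t = TIMES_US[batch]
    return t["parallel/1"] / t["sequential"]


def extra_us(batch):
    """Absolute extra time per batch (µs) for parallel/1 over sequential."""
    t = TIMES_US[batch]
    return t["parallel/1"] - t["sequential"]


if __name__ == "__main__":
    for batch in TIMES_US:
        print(f"{batch}: x{slowdown(batch):.2f}, +{extra_us(batch):.3f}µs")
```

With these numbers the 8-byte batch slows down by a factor of about 4.8 at
parallel/1, while the 4Kbyte batch slows down by a factor of about 1.3.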

.Dave.



