Re: [PATCH 1/2] igt/gem_exec_nop: add burst submission to parallel execution test

On 03/08/2016 17:05, Dave Gordon wrote:
On 03/08/16 16:45, Chris Wilson wrote:
On Wed, Aug 03, 2016 at 04:36:46PM +0100, Dave Gordon wrote:
The parallel execution test in gem_exec_nop chooses a pessimal
distribution of work to multiple engines; specifically, it
round-robins one batch to each engine in turn. As the workloads
are trivial (NOPs), this results in each engine becoming idle
between batches. Hence parallel submission is seen to take LONGER
than the same number of batches executed sequentially.

If on the other hand we send enough work to each engine to keep
it busy until the next time we add to its queue, (i.e. round-robin
some larger number of batches to each engine in turn) then we can
get true parallel execution and should find that it is FASTER than
sequential execution.

By experiment, burst sizes of between 8 and 256 are sufficient to
keep multiple engines loaded, with the optimum (for this trivial
workload) being around 64. This is expected to be lower (possibly
as low as one) for more realistic (heavier) workloads.
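
To make the two distributions concrete, here is a minimal sketch of the submission order only. It is not the patch itself and does not touch the driver; the function name, the engine labels "A" and "B", and the parameters are all hypothetical, chosen for illustration.

```python
def submission_order(engines, batches_per_engine, burst):
    """Yield engine labels in the order batches would be submitted.

    Hypothetical illustration: burst=1 gives the one-batch round-robin,
    burst>1 hands each engine several batches before moving on.
    """
    if burst < 1:
        raise ValueError("burst must be at least 1")
    remaining = {engine: batches_per_engine for engine in engines}
    while any(remaining.values()):
        for engine in engines:
            # Hand this engine up to `burst` batches, then move to the next.
            count = min(burst, remaining[engine])
            for _ in range(count):
                yield engine
            remaining[engine] -= count


# One batch per engine per turn, then bursts of two.
print("".join(submission_order("AB", 4, 1)))
print("".join(submission_order("AB", 4, 2)))
```

With burst=1 the labels alternate on every batch; with a larger burst each engine receives a run of batches before the next engine gets any.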

Quite funny. The driver submission overhead of A...A vs ABAB... engines
is nearly identical, at least as far as the analysis presented here.
-Chris

Correct; but because the workloads are so trivial, if we hand out jobs one at a time to each engine, the first will have finished the one batch it's been given before we get round to giving it a second one (even in execlist mode). If there are N engines, submitting a single batch takes S seconds, and the workload takes W seconds to execute, then if W < N*S each engine will be idle between batches. For example, if N is 4, W is 2us, and S is 1us, then each engine will be idle some 50% of the time.

This wouldn't be an issue for more realistic workloads, where W >> S.
It only looks problematic because of the trivial nature of the work.
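
The arithmetic above can be checked with a small sketch. It assumes the simple model described here (one submitter, a fixed S per batch, a fixed W per batch, one batch per engine per turn) and ignores any other overhead; the function name and the choice of microseconds as the unit are illustrative, not part of the test.

```python
def idle_fraction(n_engines, submit_us, work_us):
    """Fraction of time one engine sits idle under one-batch round-robin.

    Simplified model: a new batch reaches a given engine every
    n_engines * submit_us, and keeps it busy for work_us.
    """
    period = n_engines * submit_us  # time between batches for one engine
    if work_us >= period:
        return 0.0  # the engine never drains its queue
    return (period - work_us) / period


# N = 4, S = 1us, W = 2us: a batch arrives every 4us and runs for 2us.
print(idle_fraction(4, 1, 2))
# A heavier workload with W >> S leaves no idle gap.
print(idle_fraction(4, 1, 100))
```

In this model the idle gap disappears once W >= N*S, which is why heavier workloads would not show the effect.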

Can you post the numbers that you get?

I seem to get massive variability on my BDW. The render ring always gives me around 2.9us/batch, but the other rings sometimes give me in the region of 1.2us and sometimes 7-8us.



.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
