The parallel execution test in gem_exec_nop chooses a pessimal
distribution of work to multiple engines; specifically, it round-robins
one batch to each engine in turn. As the workloads are trivial (NOPs),
this results in each engine becoming idle between batches. Hence
parallel submission is seen to take LONGER than the same number of
batches executed sequentially.

If on the other hand we send enough work to each engine to keep it busy
until the next time we add to its queue (i.e. round-robin some larger
number of batches to each engine in turn), then we can get true
parallel execution and should find that it is FASTER than sequential
execution.

By experiment, burst sizes of between 8 and 256 are sufficient to keep
multiple engines loaded, with the optimum (for this trivial workload)
being around 64. This is expected to be lower (possibly as low as one)
for more realistic (heavier) workloads.

Signed-off-by: Dave Gordon <david.s.gordon@xxxxxxxxx>
---
 tests/gem_exec_nop.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
index 9b89260..c2bd472 100644
--- a/tests/gem_exec_nop.c
+++ b/tests/gem_exec_nop.c
@@ -166,14 +166,17 @@ static void all(int fd, uint32_t handle, int timeout)
 	gem_sync(fd, handle);
 	intel_detect_and_clear_missed_interrupts(fd);
 
+#define BURST 64
+
 	count = 0;
 	clock_gettime(CLOCK_MONOTONIC, &start);
 	do {
-		for (int loop = 0; loop < 1024; loop++) {
+		for (int loop = 0; loop < 1024/BURST; loop++) {
 			for (int n = 0; n < nengine; n++) {
 				execbuf.flags &= ~ENGINE_FLAGS;
 				execbuf.flags |= engines[n];
-				gem_execbuf(fd, &execbuf);
+				for (int b = 0; b < BURST; ++b)
+					gem_execbuf(fd, &execbuf);
 			}
 		}
 		count += nengine * 1024;
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx