On Wed, Aug 03, 2016 at 04:36:46PM +0100, Dave Gordon wrote:
> The parallel execution test in gem_exec_nop chooses a pessimal
> distribution of work to multiple engines; specifically, it
> round-robins one batch to each engine in turn. As the workloads
> are trivial (NOPs), this results in each engine becoming idle
> between batches. Hence parallel submission is seen to take LONGER
> than the same number of batches executed sequentially.
>
> If on the other hand we send enough work to each engine to keep
> it busy until the next time we add to its queue (i.e. round-robin
> some larger number of batches to each engine in turn), then we can
> get true parallel execution and should find that it is FASTER than
> sequential execution.
>
> By experiment, burst sizes of between 8 and 256 are sufficient to
> keep multiple engines loaded, with the optimum (for this trivial
> workload) being around 64. This is expected to be lower (possibly
> as low as one) for more realistic (heavier) workloads.

Quite funny. The driver submission overhead of A...A vs ABAB... engines
is nearly identical, at least as far as the analysis presented here.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
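
For concreteness, the two submission orders being compared (one batch per
engine per pass vs a burst of batches per engine per pass) look roughly
like the minimal C sketch below. submit_batch() and NUM_ENGINES are
placeholder names, not the actual gem_exec_nop/execbuf code; this only
illustrates the ordering, not the real i915 submission path.

    #include <stdio.h>

    #define NUM_ENGINES 4

    /* Stand-in for queuing one trivial (NOP) batch on an engine. */
    static void submit_batch(int engine)
    {
            printf("batch -> engine %d\n", engine);
    }

    /* Original pattern: one batch per engine per pass (ABAB...).
     * With NOP batches, each engine goes idle before its next
     * batch arrives. */
    static void round_robin_single(int nbatches)
    {
            for (int i = 0; i < nbatches; i++)
                    submit_batch(i % NUM_ENGINES);
    }

    /* Proposed pattern: a burst of batches per engine per pass
     * (AA..ABB..B...), large enough to keep each engine busy until
     * its queue is next refilled. */
    static void round_robin_burst(int nbatches, int burst)
    {
            for (int i = 0; i < nbatches; i += burst) {
                    int engine = (i / burst) % NUM_ENGINES;
                    for (int j = 0; j < burst && i + j < nbatches; j++)
                            submit_batch(engine);
            }
    }

    int main(void)
    {
            round_robin_single(256);
            round_robin_burst(256, 64); /* ~64 per burst was the reported optimum */
            return 0;
    }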