Quoting Tvrtko Ursulin (2017-09-21 16:12:21)
>
> On 21/09/2017 14:54, Chris Wilson wrote:
> > Since we inherited the context image setup from gen8 which needed a
> > per-bb workaround (for GPGPU), we are submitting an empty per-bb buffer
> > on gen9. Now that we can skip adding the buffer to the context image,
> > remove the dangling per-bb. This slightly improves execution latency,
> > most notably on an idle engine.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
>
> How much of the 7% we get back? :)

Not enough. The difference in execution latency between ringbuffer
submission and execlists for this type of workload is roughly an order
of magnitude (~5us to ~30us, using gem_sync as a reasonable proxy). The
per-bb accounts for around 6us of that on bdw, so it is a big chunk, but
execlists are still a few times slower even without it.

Not that we actually move the GPGPU workaround on bdw just yet; I left
that for when we play with preemption and MI_ARB_ON_OFF.

(Side note: the remaining difference between ringbuffer and execlists
seems to be related to MI arbitration...)
-Chris