Quoting Francisco Jerez (2018-07-28 21:18:50)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > Quoting Francisco Jerez (2018-07-28 06:20:12)
> >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> >>
> >> > A recent trend for cpufreq is to boost the CPU frequencies for
> >> > iowaiters, in particularly to benefit high frequency I/O. We do the same
> >> > and boost the GPU clocks to try and minimise time spent waiting for the
> >> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> >> > frequency will result in the GPU being throttled and its frequency being
> >> > reduced. Thus declaring iowait negatively impacts on GPU throughput.
> >> >
> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
> >>
> >> This patch causes up to ~13% performance regressions (with significance
> >> 5%) on several latency-sensitive tests on my BXT:
> >>
> >> jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128: XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
> >
>
> The jxrendermark Linear Gradient Blend test-case had probably the
> smallest effect size of all the regressions I noticed... Can you take a
> look at any of the other ones instead?

It was the biggest in the list, was it not? I didn't observe anything of
note in a quick look at x11perf, but didn't let it run for a good sample
size. They didn't seem to be as relevant as jxrendermark so I went and
dug that out.

> > Curious, as this is just a bunch of composites and as with the others,
> > should never be latency sensitive (at least under bare X11).
>
> They are largely latency-sensitive due to the poor pipelining they seem
> to achieve between their GPU rendering work and the X11 thread.

Only the X11 thread is touching the GPU, and in the cases I looked at,
we were either waiting for the ring to drain or on throttling.
Synchronisation with the GPU was only for draining the queue on timing,
and the CPU was able to stay ahead during the benchmark.

Off the top of my head, for X to be latency sensitive you need to mix
client and Xserver rendering, along the lines of Paint; GetImage, in the
extreme becoming gem_sync. Adding a compositor is also interesting, for
the context switching will prevent us merging requests (but that all
depends on the frequency of compositor updates, of course), and we would
need more CPU and require reasonably low latency (less than the next
request) to keep the GPU busy. However, that is driven directly off
interrupts, so iowait isn't a factor, but your hook could still be useful
to provide pm_qos.
-Chris