Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Quoting Francisco Jerez (2018-07-29 20:29:42)
>> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>>
>> > Quoting Francisco Jerez (2018-07-28 21:18:50)
>> >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>> >>
>> >> > Quoting Francisco Jerez (2018-07-28 06:20:12)
>> >> >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>> >> >>
>> >> >> > A recent trend for cpufreq is to boost the CPU frequencies for
>> >> >> > iowaiters, in particular to benefit high frequency I/O. We do the
>> >> >> > same and boost the GPU clocks to try and minimise time spent
>> >> >> > waiting for the GPU. However, as the igfx and CPU share the same
>> >> >> > TDP, boosting the CPU frequency will result in the GPU being
>> >> >> > throttled and its frequency being reduced. Thus declaring iowait
>> >> >> > negatively impacts on GPU throughput.
>> >> >> >
>> >> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> >> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>> >> >>
>> >> >> This patch causes up to ~13% performance regressions (with
>> >> >> significance 5%) on several latency-sensitive tests on my BXT:
>> >> >>
>> >> >> jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128: XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
>> >> >
>> >>
>> >> The jxrendermark Linear Gradient Blend test-case had probably the
>> >> smallest effect size of all the regressions I noticed... Can you take a
>> >> look at any of the other ones instead?
>> >
>> > It was the biggest in the list, was it not? I didn't observe anything of
>> > note in a quick look at x11perf, but didn't let it run for a good sample
>> > size. They didn't seem to be as relevant as jxrendermark so I went and
>> > dug that out.
>> >
>>
>> That was the biggest regression in absolute value, but the smallest in
>> effect size (roughly 0.4 standard deviations).
>
> d=-13.52% wasn't the delta between the two runs?

It is, less than half of 31.88%, which is the pooled standard deviation.

> Sorry, but it appears to be redacted beyond my comprehension.
>
>> >> > Curious, as this is just a bunch of composites and as with the others,
>> >> > should never be latency sensitive (at least under bare X11).
>>
>> >> They are largely latency-sensitive due to the poor pipelining they seem
>> >> to achieve between their GPU rendering work and the X11 thread.
>> >
>> > Only the X11 thread is touching the GPU, and in the cases I looked at
>> > it, we were either waiting for the ring to drain or on throttling.
>> > Synchronisation with the GPU was only for draining the queue on timing,
>> > and the cpu was able to stay ahead during the benchmark.
>> >
>>
>> Apparently the CPU doesn't get ahead enough for the GPU to be
>> consistently loaded, which prevents us from hiding the latency of the
>> CPU computation even in those cases.
>
> The curse of reproducibility. On my bxt, I don't see the issue, so we
> have a significant difference in setup.
> -Chris
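For readers following the statistics here: the effect size under discussion is the delta between the two runs expressed in units of the pooled standard deviation (i.e. Cohen's d). A minimal sketch of that arithmetic, using the figures quoted in the thread (delta = -13.52%, pooled SD = 31.88%); the helper names are illustrative, not from any tool used in the thread, and the pooled figure is taken at face value since the exact definition of the ± percentages in the benchmark line isn't given:

```python
import math

def pooled_sd(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Pooled standard deviation of two samples of sizes n_a and n_b."""
    return math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                     / (n_a + n_b - 2))

def effect_size(delta: float, sd_pooled: float) -> float:
    """Cohen's d: difference in means in pooled-SD units."""
    return delta / sd_pooled

# The thread's numbers: a -13.52% delta against a 31.88% pooled SD
# comes out at roughly -0.42 standard deviations, i.e. "less than
# half" of a standard deviation despite being the largest regression
# in absolute terms.
print(effect_size(-13.52, 31.88))
```

This is why a regression can be the biggest in the list by absolute delta while still having the smallest effect size: the effect size normalises the delta by the spread of the measurements.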
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx