Quoting Francisco Jerez (2018-07-28 06:20:12)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > A recent trend for cpufreq is to boost the CPU frequencies for
> > iowaiters, in particular to benefit high frequency I/O. We do the same
> > and boost the GPU clocks to try and minimise time spent waiting for the
> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> > frequency will result in the GPU being throttled and its frequency being
> > reduced. Thus declaring iowait negatively impacts on GPU throughput.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>
> This patch causes up to ~13% performance regressions (with significance
> 5%) on several latency-sensitive tests on my BXT:
>
> jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128: XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
> jxrendermark/rendering-test=Transformed Blit Bilinear/rendering-size=128x128: XXX ±3.51% x21 -> XXX ±3.77% x21 d=-12.08% ±3.41% p=0.00%
> gtkperf/gtk-test=GtkComboBox: XXX ±1.90% x19 -> XXX ±1.59% x20 d=-4.74% ±1.71% p=0.00%
> x11perf/test=500px Compositing From Pixmap To Window: XXX ±2.35% x21 -> XXX ±1.73% x21 d=-2.69% ±2.04% p=0.01%
> qgears2/render-backend=XRender Extension/test-mode=Text: XXX ±0.38% x21 -> XXX ±0.40% x25 d=-2.20% ±0.38% p=0.00%
> x11perf/test=500px Compositing From Pixmap To Window: XXX ±2.78% x53 -> XXX ±2.27% x61 d=-1.77% ±2.50% p=0.03%
>
> It's unsurprising to see latency-sensitive workloads relying on the
> lower latency offered by io_schedule_timeout(), since the CPUFREQ
> governor will have substantial downward bias without it, in response to
> the intermittent CPU usage pattern of those benchmarks.

Fwiw, I have a better example, gem_sync --run store-default.
This test waits on a short batch,
with io_schedule_timeout:
	Completed 987136 cycles: 152.092 us
with schedule_timeout:
	Completed 157696 cycles: 956.403 us

Though note that for a no-op batch, we see no difference as the sleep is
short enough; both take 52us on average.

But microbenchmarks be micro.
-Chris
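
For scale, the two gem_sync results quoted in this message can be compared with a little arithmetic. This is only a rough sketch in Python using the figures reported here; the variable names are illustrative and are not part of gem_sync or the kernel.

```python
# Figures as reported for `gem_sync --run store-default` in this message.
# Variable names are illustrative only.
io_cycles, io_latency_us = 987136, 152.092        # with io_schedule_timeout
plain_cycles, plain_latency_us = 157696, 956.403  # with schedule_timeout

# How much longer each wait takes without the iowait hint.
latency_ratio = plain_latency_us / io_latency_us

# How many more cycles complete in the same run with the iowait hint.
throughput_ratio = io_cycles / plain_cycles

print(f"latency per cycle: {latency_ratio:.2f}x longer with schedule_timeout")
print(f"cycles completed:  {throughput_ratio:.2f}x more with io_schedule_timeout")
```

Both ratios come out a little above 6x, which is far larger than the ~13% worst case in the benchmark list quoted above.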