On 6/21/22 21:40, Gordon Messmer wrote:
> On 6/21/22 13:10, Matthew Miller wrote:
>> Phoronix credits this to those distros shipping with P-state
>> Performance by default.
>
> Yes, but I doubt that for several reasons: First, it's a claim without
> evidence. That setting isn't the only difference between any two
> systems tested. Second, the claim doesn't make any *sense*. Systems
> with intel_pstate balanced aren't supposed to be noticeably slower for
> sustained CPU intensive workloads. The intel_pstate driver is supposed
> to scale the frequency up under load in the "balanced" configuration,
> delivering performance when it is needed and power saving when it
> isn't. Third, I can run their tests on my own system in an intel_pstate
> performance mode and an intel_pstate balanced mode, and the test
> results are nearly identical, which is the expected outcome.
I did some work this week to see if I could learn anything from
Phoronix's article [1], and came up pretty much dry. I cannot replicate
any of the differences that I would expect to be able to. More than
anything else, their results look like evidence of a bug in the Xeon
Platinum 8380.
In retrospect, the first thing that should have stood out to me when I
looked at this ~3 weeks ago (but which I missed at the time) was that if
I pull the phoronix/pts container image and "run pts/compress-zstd" with
"Compression Level: 3, Long Mode", I get better results on my XPS 13
laptop than they did on their Xeon. And, while cpubenchmark.net does
suggest that my i5 CPU [2] has better single-core test results than the
Xeon [3], the zstd test should not be limited to a single core: on my
laptop, top reported the zstd process typically using ~400% CPU.
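For reference, reproducing that run looks roughly like this (a sketch
assuming podman, and assuming the image's default entrypoint drops you
into the interactive PTS shell; docker should work the same way):

  $ podman pull docker.io/phoronix/pts
  $ podman run -it docker.io/phoronix/pts
  # then, at the PTS prompt:
  run pts/compress-zstd
  # and pick "Compression Level: 3, Long Mode" from the test options menu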
The first thing I tried to reproduce was a difference between
"performance" and "powersave" settings in the intel_pstate cpufreq
driver. I used the zstd compression test on my only Intel CPU, which is
in my XPS 13 laptop. In the default Fedora WS configuration,
scaling_driver is intel_pstate, scaling_governor is powersave, and (I
believe) energy_performance_preference is balance_performance. In that
configuration, typical values for scaling_cur_freq were significantly
lower than typical values after changing energy_performance_preference
to performance, and scaling_governor to performance. So on this laptop,
I'm confident that the governor and EPP settings are behaving as
expected. But zstd benchmark results are essentially indistinguishable
when running in one mode vs the other, because the powersave mode for
the intel_pstate driver will scale CPU speed up on demand.
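For anyone who wants to poke at the same knobs, everything lives under
sysfs. A minimal sketch (cpu0 shown; the values are what I saw in the
default Fedora WS configuration, and writes need root):

  $ cd /sys/devices/system/cpu/cpu0/cpufreq
  $ cat scaling_driver scaling_governor energy_performance_preference
  intel_pstate
  powersave
  balance_performance
  # flip both knobs to performance on every CPU:
  $ for p in /sys/devices/system/cpu/cpu*/cpufreq; do
      echo performance | sudo tee $p/scaling_governor $p/energy_performance_preference
    done
  # watch the current frequency while a benchmark runs:
  $ watch 'grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq'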
In addition, Phoronix has benchmarked Intel systems in the past [4] to
determine the effective difference between the intel_pstate powersave
and performance modes, and found minimal differences on an Intel i9 CPU.
I also tested the svt-av1 benchmark on this system in both modes, since
it was another CPU-bound test where Phoronix reported a significant
difference and attributed it to the P-State governor setting. Again, I
saw no significant difference between the performance and powersave
results.
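The A/B comparison itself was just the same test run twice with the
governor flipped in between. Roughly, assuming the phoronix-test-suite
CLI is installed on the host:

  $ for g in performance powersave; do
      echo $g | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
      phoronix-test-suite benchmark pts/svt-av1
    done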
All of this suggests that the Xeon was simply not scaling up for these
tests. Given its large number of cores, perhaps the benchmarks weren't
putting enough load on the system to trigger scaling up. Or (as a
matter of *wild* speculation) maybe it was scaling up some cores, but
Linux was shuffling tasks between cores and "missing" the fast ones.
Whatever the case, the big differences between distributions reported by
Phoronix are probably limited to this class of CPUs.
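If anyone with access to one of these Xeons wants to test the shuffling
hypothesis, pinning the workload so the scheduler can't migrate it would
be one way to rule it in or out. A hypothetical invocation (the file
name is a placeholder; -T4 matches the ~4 threads I saw zstd use):

  $ taskset -c 0-3 zstd -3 --long -T4 some-large-file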
If this is *normal* behavior for those CPUs, then maybe the Fedora
Server group would want to change the default governor, or emphasize the
importance of the CPU governor selection in their documentation.
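For what it's worth, the change itself is easy, either at runtime with
cpupower (from the kernel-tools package) or persistently on the kernel
command line (kernels 5.9 and later):

  $ sudo cpupower frequency-set -g performance
  # or add to the kernel command line:
  cpufreq.default_governor=performance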
I also ran benchmarks on CentOS Stream 9 and Fedora Server 36, each
installed in a VM under CentOS Stream 9 libvirt, running on a host with
an AMD Ryzen 5 CPU [5], with the host CPU configuration copied to the
guests. As VMs, these would not apply any cpufreq management of their
own, and if there were any differences resulting from the CPU
architecture target, they should be apparent in these tests. Test
results for these VMs were 20-40% better than the Xeon's best results,
but results under the CentOS Stream 9 VM were essentially the same as
results under the Fedora Server 36 VM. It's probably still interesting
to run the full suite and see if any other tests show significant
differences, and I'll try to do that later.
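An easy way to confirm that the guests do no frequency management of
their own: KVM guests normally don't expose a cpufreq policy at all, so
inside the VM the directory simply isn't there.

  $ ls /sys/devices/system/cpu/cpu0/cpufreq
  ls: cannot access '/sys/devices/system/cpu/cpu0/cpufreq': No such file or directory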
I think that's enough to convince me that I was wrong to doubt that the
intel_pstate configuration was the reason that these results differed,
although I still believe that if that is the case, then the CPU's
internal pstate selection is broken.
1: https://www.phoronix.com/scan.php?page=article&item=h1-2022-linux&num=1
2: https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-1135G7+%40+2.40GHz&id=3830
3: https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Platinum+8380+%40+2.30GHz&id=4483
4: https://www.phoronix.com/scan.php?page=article&item=linux50-pstate-cpufreq&num=1
5: https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+5+5600X&id=3859