The following commit has been merged into the sched/core branch of tip: Commit-ID: 46609527577d1def0af29ca5b56cffeeea771ada Gitweb: https://git.kernel.org/tip/46609527577d1def0af29ca5b56cffeeea771ada Author: Giovanni Gherdovich <ggherdovich@xxxxxxx> AuthorDate: Thu, 12 Nov 2020 19:26:13 +01:00 Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx> CommitterDate: Thu, 03 Dec 2020 10:00:35 +01:00 x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC Frequency invariant accounting calculations need the ratio freq_curr/freq_max, but freq_max is unknown as it depends on dynamic power allocation between cores: AMD EPYC CPUs implement "Core Performance Boost". Three candidates are considered to estimate this value: - maximum non-boost frequency - maximum boost frequency - the mid point between the above two Experimental data on an AMD EPYC Zen2 machine slightly favors the third option, which is applied with this patch. The analysis uses the ondemand cpufreq governor as baseline, and compares it with schedutil in a number of configurations. Using the freq_max value described above offers a moderate advantage in performance and efficiency: sugov-max (freq_max=max_boost) performs the worst on tbench: less throughput and reduced efficiency than the other invariant-schedutil options (see "Data Overview" below). Consider that tbench is generally a problematic case as no schedutil version currently is better than ondemand. sugov-P0 (freq_max=max_P) is the worst on dbench, while the other sugov's can surpass ondemand with less filesystem latency and slightly increased efficiency. 1. DATA OVERVIEW 2. DETAILED PERFORMANCE TABLES 3. POWER CONSUMPTION TABLE 1. DATA OVERVIEW ================ sugov-noinv : non-invariant schedutil governor sugov-max : invariant schedutil, freq_max=max_boost sugov-mid : invariant schedutil, freq_max=midpoint sugov-P0 : invariant schedutil, freq_max=max_P perfgov : performance governor driver : acpi_cpufreq machine : AMD EPYC 7742 (Zen2, aka "Rome"), dual socket, 128 cores / 256 threads, SATA SSD storage, 250G of memory, XFS filesystem Benchmarks are described in the next section. Tilde (~) means the value is the same as baseline. Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> Link: https://lkml.kernel.org/r/20201112182614.10700-3-ggherdovich@xxxxxxx --- arch/x86/kernel/smpboot.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index a4ab5cf..c5dd5f6 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -2054,6 +2054,8 @@ static bool amd_set_max_freq_ratio(void) } perf_ratio = div_u64(highest_perf * SCHED_CAPACITY_SCALE, nominal_perf); + /* midpoint between max_boost and max_P */ + perf_ratio = (perf_ratio + SCHED_CAPACITY_SCALE) >> 1; if (!perf_ratio) { pr_debug("Non-zero highest/nominal perf values led to a 0 ratio\n"); return false;
![]() |