On Tue, Feb 2, 2021 at 7:45 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Tue, Jan 26, 2021 at 5:19 PM Giovanni Gherdovich <ggherdovich@xxxxxxx> wrote:
> >
> > On Mon, 2021-01-25 at 11:04 +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 22, 2021 at 09:40:38PM +0100, Giovanni Gherdovich wrote:
> > > > This workload is constant in time, so instead of using the PELT sum we can
> > > > pretend that scale invariance is obtained with
> > > >
> > > >     util_inv = util_raw * freq_curr / freq_max1        [formula-1]
> > > >
> > > > where util_raw is the PELT util from v5.10 (which is to say, not invariant),
> > > > and util_inv is the PELT util from v5.11-rc4. freq_max1 comes from
> > > > commit 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for
> > > > frequency invariance on AMD EPYC") and is (P0+max_boost)/2 = (2.25+3.4)/2 =
> > > > 2.825 GHz. Then we have the schedutil formula
> > > >
> > > >     freq_next = 1.25 * freq_max2 * util_inv            [formula-2]
> > > >
> > > > Here v5.11-rc4 uses freq_max2 = P0 = 2.25 GHz (and this patch changes it to
> > > > 3.4 GHz).
> > > >
> > > > Since all cores are busy, there is no boost available. Let's be generous and say
> > > > the tasks initially get P0, i.e. freq_curr = 2.25 GHz. Combining the formulas
> > > > above and taking util_raw = 825/1024 = 0.8, freq_next is:
> > > >
> > > >     freq_next = 1.25 * 2.25 * 0.8 * 2.25 / 2.825 = 1.79 GHz
> > >
> > > Right, so here's a 'problem' between schedutil and cpufreq, they don't
> > > use the same f_max at all times.
> > >
> > > And this is also an inconsistency between acpi_cpufreq and intel_pstate
> > > (passive). IIRC the intel_pstate cpufreq drivers uses 4C/1C/P0 resp,
> > > while ACPI seems to stick to P0 f_max.
> >
> > That's correct. A different f_max is used depending on the occasion. Let me
> > rephrase with:
>
> OK, I confused the terminology, sorry about that.
>
> > cpufreq core asks the driver what's the f_max. What's the answer?
> >
> > intel_pstate says: 1C
>
> Yes, unless turbo is disabled, in which case it is P0.

BTW, and that actually is quite important, the max_freq reported by
intel_pstate doesn't matter for schedutil after the new ->adjust_perf
callback has been added, because that doesn't even use the frequency.

So, as a long-term remedy, it may just be better to implement
->adjust_perf in acpi_cpufreq().

Again, I'm terribly sorry for missing this thread and the patch.

> > acpi_cpufreq says: P0
>
> This is P0+1, isn't it?
>
> > scheduler asks the freq-invariance machinery what's f_max, because it needs to
> > compute f_curr/f_max. What's the answer?
> >
> > Intel CPUs: 4C in most cases, 1C on Atom, something else on Xeon Phi.
> > AMD CPUs: (P0 + 1C) / 2.
> >
> >
> > Legend:
> > 1C = 1-core boost
> > 4C = 4-cores boost
> > P0 = max non-boost P-States
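
For reference, the arithmetic quoted earlier in this thread can be checked with a short sketch. It only restates [formula-1] and [formula-2] with the numbers given in the thread; the function and variable names are made up for illustration and are not kernel identifiers, and util_raw = 0.8 is the rounded value used in the quoted mail.

```python
# Sketch of the arithmetic quoted in this thread.
# All names here are illustrative assumptions, not kernel code.

P0 = 2.25          # GHz, max non-boost P-state (value from the thread)
MAX_BOOST = 3.4    # GHz, 1-core boost (value from the thread)


def util_inv(util_raw, freq_curr, freq_max1):
    """[formula-1]: scale the raw utilization by freq_curr / freq_max1."""
    return util_raw * freq_curr / freq_max1


def freq_next(util, freq_max2):
    """[formula-2]: schedutil request, including its 1.25 factor."""
    return 1.25 * freq_max2 * util


# f_max used by the freq-invariance machinery on AMD EPYC: (P0 + max_boost) / 2
freq_max1 = (P0 + MAX_BOOST) / 2

# All cores busy, tasks assumed to run at P0, util_raw rounded to 0.8
u = util_inv(util_raw=0.8, freq_curr=P0, freq_max1=freq_max1)

before = freq_next(u, freq_max2=P0)         # freq_max2 = P0, as in v5.11-rc4
after = freq_next(u, freq_max2=MAX_BOOST)   # freq_max2 = 3.4 GHz, as in the patch

print(f"freq_max1                           = {freq_max1:.3f} GHz")
print(f"freq_next with freq_max2 = P0       = {before:.2f} GHz")
print(f"freq_next with freq_max2 = 3.4 GHz  = {after:.2f} GHz")
```

With these inputs the first case reproduces the 1.79 GHz figure from the quoted mail, and the second comes out at roughly 2.71 GHz. The only difference between the two cases is the f_max that schedutil multiplies by.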