On Wed, Mar 16, 2022 at 4:55 PM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>
> Readers: So that graphs and large attachments could be used, I have
> been on an off-list branch of this thread with Srinivas, and copied a
> couple of others. While now returning to this on-list thread, I'll
> only take up Rafael's proposed patch.
>
> Hi Rafael,
>
> So far all work has been done with: HWP disabled; intel_pstate; powersave.
> The reason was that it is, by far, the best way to obtain good trace data
> using the intel_pstate_tracer.py utility.
>
> I always intended to try/test: HWP disabled; intel_cpufreq; schedutil.
> There is an issue with the proposed patch and schedutil.
>
> If any CPU ever requests a pstate > the max non turbo pstate
> then it will stay at that request forever. Ultimately the idle
> power goes to about 5.7 watts (versus 1.4 watts expected).
> IRQs go very high, as the tick never turns off.
> Actually, one knows how many CPUs are stuck requesting a high
> pstate just by looking at IRQs.

That may be because INTEL_CPUFREQ_TRANSITION_DELAY is too small.

Please try to increase
/sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us to 10000 and
see what difference this makes.

> Trace is useless because it virtually never gets called.
> So I have been reading the IA32_PERF_CTL MSR
> directly.
>
> Example:
>
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> 6 cores, 12 CPUs
> min pstate 8
> max non-turbo pstate 41
> max turbo pstate 48
> The system is idle.
>
> doug@s19:~$ sudo
> /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat --Summary
> --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt --interval 10
> Busy%   Bzy_MHz IRQ     PkgWatt
> 0.11    800     844     1.33
> 0.01    800     231     1.33
> 0.11    800     723     1.33 <<< Powersave governor
> 0.03    889     440     1.33
> 0.17    4418    21511   4.31 <<< Schedutil governor
> 0.12    4101    30153   4.48 <<< 3 CPUs are > pstate 41
> 0.22    4347    34226   4.75
> 0.17    4101    43554   4.78
> 0.29    4300    50565   4.94
> 0.21    4098    50297   4.76 <<< 5 CPUs are > pstate 41
> 0.29    4298    50532   4.84
> 0.20    4101    50126   4.63
> 0.20    4101    50149   4.62
> 0.29    4297    50623   4.76
> 0.20    4101    50203   4.72
> 0.29    4295    50642   4.78
> 0.20    4101    50223   4.68
> 0.29    4292    50597   4.88
> 0.20    4101    50208   4.73
> 0.29    4296    50519   4.84
> 0.20    4101    50167   4.80
> 0.20    4101    50242   4.76
> 0.29    4302    50625   4.94
> 0.20    4101    50233   4.73
> 0.29    4296    50613   4.78
> 0.20    4101    50231   4.70
> 0.29    4292    50802   4.93
> 1.46    4669    65610   8.36
> 0.41    4225    80701   5.48
> 0.33    4101    80219   5.36 <<< 8 CPUs are > pstate 41
> 0.34    4098    80313   5.38
> 0.41    4228    80689   5.56
> 0.33    4101    80252   5.46
>
> And the related MSR reads:
>
> 3 CPUs are > pstate 41:
> root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> 9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 30 : 8 : 8 : 48 :
> 48 : 48 : 8 : 30 : 31 : 8 : 8 : 8 :
>
> 5 CPUs are > pstate 41:
> root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> 9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 44 : 30 : 31 : 48 :
> 48 : 48 : 8 : 8 : 8 : 8 : 48 : 8 :
>
> 8 CPUs are > pstate 41:
> root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> 9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 45 : 48 : 48 : 48 :
> 48 : 48 : 8 : 30 : 8 : 8 : 48 : 42 :
>
> This issue is independent of the original patch or the suggested modification:
>
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index f878a4545eee..94018ac0b59b 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1980,7 +1980,7 @@ static void intel_pstate_update_perf_ctl(struct
> > cpudata *cpu)
> >   * P-states to prevent them from getting back to the high frequency
> >   * right away after getting out of deep idle.
> >   */
> > - cpuidle_update_retain_tick(pstate > cpu->pstate.max_pstate);
> > + cpuidle_update_retain_tick(pstate > ((cpu->pstate.max_pstate +
> > cpu->pstate.min_pstate)/2));
> >   wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> >  }
>
> ... Doug
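
For anyone wanting to repeat the "how many CPUs are stuck" check without counting by hand, here is a minimal sketch that tallies the per-CPU requests above the max non-turbo pstate. It assumes the msr-decoder output line has been rejoined into a single string in the same " : "-separated layout quoted above; the function names and the hard-coded threshold of 41 are illustrative only and are not part of any existing tool.

```python
# Sketch only: count CPUs whose IA32_PERF_CTL request exceeds a threshold.
# The input format is assumed from the msr-decoder lines quoted in this thread.

MAX_NON_TURBO_PSTATE = 41  # value for the i5-10600K example system


def parse_perf_ctl(line):
    """Return the list of per-CPU pstate requests from one decoder line."""
    fields = [f.strip() for f in line.split(":")]
    # The per-CPU values follow the "CPU 0-11" field.
    start = next(i for i, f in enumerate(fields) if f.startswith("CPU")) + 1
    return [int(f) for f in fields[start:] if f]


def count_above(pstates, threshold=MAX_NON_TURBO_PSTATE):
    """Count requests strictly greater than the threshold."""
    return sum(1 for p in pstates if p > threshold)


# The three samples from the report, with the wrapped lines rejoined.
SAMPLES = [
    "9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 30 : 8 : 8 : 48 : 48 : 48 : 8 : 30 : 31 : 8 : 8 : 8 :",
    "9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 44 : 30 : 31 : 48 : 48 : 48 : 8 : 8 : 8 : 8 : 48 : 8 :",
    "9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 45 : 48 : 48 : 48 : 48 : 48 : 8 : 30 : 8 : 8 : 48 : 42 :",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(count_above(parse_perf_ctl(sample)))  # prints 3, then 5, then 8
```

The counts agree with the annotations on the turbostat rows (3, 5 and 8 CPUs above pstate 41), so the same check could be rerun after changing rate_limit_us to see whether any requests still stay above the threshold.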