On Thu, Mar 17, 2022 at 5:30 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> On Wed, Mar 16, 2022 at 4:55 PM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >
> > Readers: So that graphs and large attachments could be used, I have
> > been on an off-list branch of this thread with Srinivas, and copied a
> > couple of others. While now returning to this on-list thread, I'll
> > only take up Rafael's proposed patch.
> >
> > Hi Rafael,
> >
> > So far all work has been done with: HWP disabled; intel_pstate; powersave.
> > The reason was that it is, by far, the best way to obtain good trace data
> > using the intel_pstate_tracer.py utility.
> >
> > I always intended to try/test: HWP disabled; intel_cpufreq; schedutil.
> > There is an issue with the proposed patch and schedutil.
> >
> > If any CPU ever requests a pstate > the max non turbo pstate
> > then it will stay at that request forever. Ultimately the idle
> > power goes to about 5.7 watts (versus 1.4 watts expected).
> > IRQs go very high, as the tick never turns off.
> > Actually, one knows how many CPUs are stuck requesting a high
> > pstate just by looking at IRQs.
>
> That may be because INTEL_CPUFREQ_TRANSITION_DELAY is too small.
>
> Please try to increase
> /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us to 10000 and
> see what difference this makes.

Changing rate_limit_us to 10000, or even 20000, makes no difference.

See a slight clarification to yesterday's email in-line below.

> > Trace is useless because it virtually never gets called.
> > So I have been reading the IA32_PERF_CTL MSR
> > directly.
> >
> > Example:
> >
> > Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> > 6 cores, 12 CPUs
> > min pstate 8
> > max non-turbo pstate 41
> > max turbo pstate 48
> > The system is idle.
> >
> > doug@s19:~$ sudo
> > /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat --Summary
> > --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt --interval 10
> > Busy%   Bzy_MHz IRQ     PkgWatt
> > 0.11    800     844     1.33
> > 0.01    800     231     1.33
> > 0.11    800     723     1.33    <<< Powersave governor
> > 0.03    889     440     1.33
> > 0.17    4418    21511   4.31    <<< Schedutil governor
> > 0.12    4101    30153   4.48    <<< 3 CPUs are > pstate 41
> > 0.22    4347    34226   4.75
> > 0.17    4101    43554   4.78
> > 0.29    4300    50565   4.94
> > 0.21    4098    50297   4.76    <<< 5 CPUs are > pstate 41
> > 0.29    4298    50532   4.84
> > 0.20    4101    50126   4.63
> > 0.20    4101    50149   4.62
> > 0.29    4297    50623   4.76
> > 0.20    4101    50203   4.72
> > 0.29    4295    50642   4.78
> > 0.20    4101    50223   4.68
> > 0.29    4292    50597   4.88
> > 0.20    4101    50208   4.73
> > 0.29    4296    50519   4.84
> > 0.20    4101    50167   4.80
> > 0.20    4101    50242   4.76
> > 0.29    4302    50625   4.94
> > 0.20    4101    50233   4.73
> > 0.29    4296    50613   4.78
> > 0.20    4101    50231   4.70
> > 0.29    4292    50802   4.93
> > 1.46    4669    65610   8.36
> > 0.41    4225    80701   5.48
> > 0.33    4101    80219   5.36    <<< 8 CPUs are > pstate 41
> > 0.34    4098    80313   5.38
> > 0.41    4228    80689   5.56
> > 0.33    4101    80252   5.46
> >
> > And the related MSR reads:
> >
> > 3 CPUs are > pstate 41:
> > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > 9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 30 : 8 : 8 : 48 :
> > 48 : 48 : 8 : 30 : 31 : 8 : 8 : 8 :
> >
> > 5 CPUs are > pstate 41:
> > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > 9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 44 : 30 : 31 : 48 :
> > 48 : 48 : 8 : 8 : 8 : 8 : 48 : 8 :
> >
> > 8 CPUs are > pstate 41:
> > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > 9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 45 : 48 : 48 : 48 :
> > 48 : 48 : 8 : 30 : 8 : 8 : 48 : 42 :
> >
> > This issue is independent of the original patch or the suggested modification:

Actually, the issue threshold is as defined by the greater than condition below.
> >
> > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > > index f878a4545eee..94018ac0b59b 100644
> > > --- a/drivers/cpufreq/intel_pstate.c
> > > +++ b/drivers/cpufreq/intel_pstate.c
> > > @@ -1980,7 +1980,7 @@ static void intel_pstate_update_perf_ctl(struct
> > > cpudata *cpu)
> > >          * P-states to prevent them from getting back to the high frequency
> > >          * right away after getting out of deep idle.
> > >          */
> > > -       cpuidle_update_retain_tick(pstate > cpu->pstate.max_pstate);

For the above kernel the threshold is pstate 42.

> > > +       cpuidle_update_retain_tick(pstate > ((cpu->pstate.max_pstate +
> > > cpu->pstate.min_pstate)/2));

For the above kernel the threshold is pstate 25.

> > >         wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> > >  }
> >
> > ... Doug
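
For anyone wanting to cross-check the numbers in this thread, the two thresholds and the "N CPUs are > pstate 41" counts can be reproduced with a short script. This is only a sketch: the function names are made up for illustration, the parsing assumes the msr-decoder output looks exactly as pasted above (with the mail line wrap rejoined), and the integer division is assumed to mirror the C expression in the patch.

```python
# Illustrative sketch only. The numbers are copied from the i5-10600K
# example in this thread; all function names are hypothetical.

MIN_PSTATE = 8             # "min pstate 8"
MAX_NON_TURBO_PSTATE = 41  # "max non-turbo pstate 41"


def original_threshold(max_pstate):
    """Lowest pstate for which `pstate > max_pstate` is true."""
    return max_pstate + 1


def modified_threshold(max_pstate, min_pstate):
    """Lowest pstate for which `pstate > (max_pstate + min_pstate)/2` is true.

    The sum is halved with integer division, as C does for ints.
    """
    return (max_pstate + min_pstate) // 2 + 1


def parse_perf_ctl(line):
    """Pull the per-CPU pstate requests out of one msr-decoder line.

    Assumes colon-separated fields, keeping only the purely numeric ones.
    """
    fields = (field.strip() for field in line.split(":"))
    return [int(field) for field in fields if field.isdigit()]


def count_at_or_above(pstates, threshold):
    """Number of CPUs whose requested pstate is at or above the threshold."""
    return sum(1 for p in pstates if p >= threshold)


# The three IA32_PERF_CTL samples from the thread.
SAMPLES = [
    "9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 30 : 8 : 8 : 48 : 48 : 48 : 8 : 30 : 31 : 8 : 8 : 8 :",
    "9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 44 : 30 : 31 : 48 : 48 : 48 : 8 : 8 : 8 : 8 : 48 : 8 :",
    "9.) 0x199: IA32_PERF_CTL : CPU 0-11 : 45 : 48 : 48 : 48 : 48 : 48 : 8 : 30 : 8 : 8 : 48 : 42 :",
]

if __name__ == "__main__":
    t_orig = original_threshold(MAX_NON_TURBO_PSTATE)
    t_mod = modified_threshold(MAX_NON_TURBO_PSTATE, MIN_PSTATE)
    print("original patch threshold:", t_orig)
    print("modified patch threshold:", t_mod)
    for sample in SAMPLES:
        pstates = parse_perf_ctl(sample)
        print(len(pstates), "CPUs,",
              count_at_or_above(pstates, t_orig), "above max non-turbo")
```

With the values above, the script should report thresholds of 42 and 25, and 3, 5 and 8 CPUs above the max non-turbo pstate for the three samples, matching the annotations in the turbostat output.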