Re: CPU excessively long times between frequency scaling driver calls - bisected

On Wed, Mar 16, 2022 at 4:55 PM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>
> Readers: So that graphs and large attachments could be used, I have
> been on an off-list branch of this thread with Srinivas, and copied a
> couple of others. While now returning to this on-list thread, I'll
> only take up Rafael's proposed patch.
>
> Hi Rafael,
>
> So far all work has been done with: HWP disabled; intel_pstate; powersave.
> The reason was that it is, by far, the best way to obtain good trace data
> using the intel_pstate_tracer.py utility.
>
> I always intended to try/test: HWP disabled; intel_cpufreq; schedutil.
> There is an issue with the proposed patch and schedutil.
>
> If any CPU ever requests a pstate > the max non-turbo pstate,
> then it will stay at that request forever. Ultimately the idle
> power goes to about 5.7 watts (versus 1.4 watts expected).
> IRQs go very high, as the tick never turns off.
> Actually, one knows how many CPUs are stuck requesting a high
> pstate just by looking at IRQs.

That may be because INTEL_CPUFREQ_TRANSITION_DELAY is too small.

Please try to increase
/sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us to 10000 and
see what difference this makes.
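
A minimal sketch of that change, for anyone who wants to script it. The path is the one given above; the helper name `set_rate_limit_us` is hypothetical, and the sketch assumes schedutil is the active governor, the tunable exists at that path, and the script runs as root.

```python
from pathlib import Path

# Path quoted in the message above. Assumption: it exists on the test
# system (schedutil active, global tunables); otherwise pass another path.
RATE_LIMIT_PATH = Path("/sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us")


def set_rate_limit_us(value_us, path=RATE_LIMIT_PATH):
    """Write value_us to the tunable and return the previous value."""
    previous = int(path.read_text().strip())
    path.write_text(f"{value_us}\n")
    return previous
```

Calling `set_rate_limit_us(10000)` applies the suggested value and returns the old one, so it can be restored after the test.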

> Trace is useless because the driver virtually never gets called.
> So I have been reading the IA32_PERF_CTL MSR directly.
>
> Example:
>
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> 6 cores, 12 CPUs
> min pstate 8
> max non-turbo pstate 41
> max turbo pstate 48
> The system is idle.
>
> doug@s19:~$ sudo /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat \
>     --Summary --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt --interval 10
> Busy%   Bzy_MHz IRQ     PkgWatt
> 0.11    800     844     1.33
> 0.01    800     231     1.33
> 0.11    800     723     1.33 <<< Powersave governor
> 0.03    889     440     1.33
> 0.17    4418    21511   4.31 <<< Schedutil governor
> 0.12    4101    30153   4.48 <<< 3 CPUs are > pstate 41
> 0.22    4347    34226   4.75
> 0.17    4101    43554   4.78
> 0.29    4300    50565   4.94
> 0.21    4098    50297   4.76 <<< 5 CPUs are > pstate 41
> 0.29    4298    50532   4.84
> 0.20    4101    50126   4.63
> 0.20    4101    50149   4.62
> 0.29    4297    50623   4.76
> 0.20    4101    50203   4.72
> 0.29    4295    50642   4.78
> 0.20    4101    50223   4.68
> 0.29    4292    50597   4.88
> 0.20    4101    50208   4.73
> 0.29    4296    50519   4.84
> 0.20    4101    50167   4.80
> 0.20    4101    50242   4.76
> 0.29    4302    50625   4.94
> 0.20    4101    50233   4.73
> 0.29    4296    50613   4.78
> 0.20    4101    50231   4.70
> 0.29    4292    50802   4.93
> 1.46    4669    65610   8.36
> 0.41    4225    80701   5.48
> 0.33    4101    80219   5.36 <<< 8 CPUs are > pstate 41
> 0.34    4098    80313   5.38
> 0.41    4228    80689   5.56
> 0.33    4101    80252   5.46
>
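
The remark that the number of stuck CPUs can be read off the IRQ column can be sketched as below. It assumes a stuck CPU takes about 1000 tick interrupts per second. That rate is not stated in the thread; it is inferred from the table, where each additional stuck CPU adds roughly 10000 IRQs per 10-second interval. The function name and both defaults are illustrative.

```python
def estimate_stuck_cpus(irqs, interval_s=10, tick_hz=1000):
    """Estimate how many CPUs never stop their tick during one interval.

    tick_hz is an assumption (not given in the thread); interval_s matches
    the --interval 10 used in the turbostat command above.
    """
    return round(irqs / (interval_s * tick_hz))


# IRQ counts taken from the annotated turbostat rows above.
for irqs in (30153, 50297, 80219):
    print(irqs, estimate_stuck_cpus(irqs))
```

For the three annotated rows this gives 3, 5 and 8, which agrees with the counts obtained from the MSR reads.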
> And the related MSR reads:
>
> 3 CPUs are > pstate 41:
> root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  30 :   8 :   8 :  48 :  48 :  48 :   8 :  30 :  31 :   8 :   8 :   8 :
>
> 5 CPUs are > pstate 41:
> root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  44 :  30 :  31 :  48 :  48 :  48 :   8 :   8 :   8 :   8 :  48 :   8 :
>
> 8 CPUs are > pstate 41:
> root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  45 :  48 :  48 :  48 :  48 :  48 :   8 :  30 :   8 :   8 :  48 :  42 :
>
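
Counting the CPUs above the max non-turbo pstate by eye is error-prone, so here is a small sketch that does it from one msr-decoder line. msr-decoder is Doug's own tool; the colon-separated layout is assumed from the three lines shown above, with each line unwrapped onto a single line, and the function names are hypothetical.

```python
def pstates_from_decoder_line(line):
    """Return the per-CPU pstate values from one IA32_PERF_CTL line."""
    fields = [f.strip() for f in line.split(":")]
    # fields[0..2] are the register number, register name and "CPU 0-11";
    # the trailing colon leaves an empty field, which isdigit() filters out.
    return [int(f) for f in fields[3:] if f.isdigit()]


def count_above(pstates, threshold=41):
    """Count CPUs requesting more than threshold (41 = max non-turbo here)."""
    return sum(1 for p in pstates if p > threshold)


line = ("9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  30 :   8 :   8 :  48 :"
        "  48 :  48 :   8 :  30 :  31 :   8 :   8 :   8 :")
print(count_above(pstates_from_decoder_line(line)))
```

Applied to the three reads above, this reports 3, 5 and 8 CPUs respectively.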
> This issue is independent of the original patch or the suggested modification:
>
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index f878a4545eee..94018ac0b59b 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1980,7 +1980,7 @@ static void intel_pstate_update_perf_ctl(struct cpudata *cpu)
> >          * P-states to prevent them from getting back to the high frequency
> >          * right away after getting out of deep idle.
> >          */
> > -       cpuidle_update_retain_tick(pstate > cpu->pstate.max_pstate);
> > +       cpuidle_update_retain_tick(pstate > ((cpu->pstate.max_pstate + cpu->pstate.min_pstate)/2));
> >         wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> >  }
>
> ... Doug
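
To make the two conditions in the quoted diff concrete, the sketch below evaluates both for the example processor. It assumes `cpu->pstate.max_pstate` is the max non-turbo pstate (41) and `cpu->pstate.min_pstate` is 8, as listed in the example; the Python function names are illustrative and this is not the kernel code itself.

```python
MIN_PSTATE = 8   # min pstate from the example above
MAX_PSTATE = 41  # max non-turbo pstate from the example above


def retain_tick_original(pstate, max_pstate=MAX_PSTATE):
    """Condition on the '-' line of the diff."""
    return pstate > max_pstate


def retain_tick_modified(pstate, min_pstate=MIN_PSTATE, max_pstate=MAX_PSTATE):
    """Condition on the '+' line; // mirrors C integer division here."""
    return pstate > (max_pstate + min_pstate) // 2


# Per-CPU requests from the "8 CPUs are > pstate 41" MSR read.
requests = [45, 48, 48, 48, 48, 48, 8, 30, 8, 8, 48, 42]
print(sum(retain_tick_original(p) for p in requests))
print(sum(retain_tick_modified(p) for p in requests))
```

With these values the modified threshold is (41 + 8) / 2 = 24 in integer arithmetic, so a request of 30 also satisfies the condition: 9 of the 12 CPUs match instead of 8.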


