Re: CPU excessively long times between frequency scaling driver calls - bisected

On Thu, Mar 17, 2022 at 5:30 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> On Wed, Mar 16, 2022 at 4:55 PM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >
> > Readers: So that graphs and large attachments could be used, I have
> > been on an off-list branch of this thread with Srinivas, and copied a
> > couple of others. While now returning to this on-list thread, I'll
> > only take up Rafael's proposed patch.
> >
> > Hi Rafael,
> >
> > So far all work has been done with: HWP disabled; intel_pstate; powersave.
> > The reason was that it is, by far, the best way to obtain good trace data
> > using the intel_pstate_tracer.py utility.
> >
> > I always intended to try/test: HWP disabled; intel_cpufreq; schedutil.
> > There is an issue with the proposed patch and schedutil.
> >
> > If any CPU ever requests a pstate > the max non turbo pstate
> > then it will stay at that request forever. Ultimately the idle
> > power goes to about 5.7 watts (versus 1.4 watts expected).
> > IRQs go very high, as the tick never turns off.
> > Actually, one knows how many CPUs are stuck requesting a high
> > pstate just by looking at IRQs.
>
> That may be because INTEL_CPUFREQ_TRANSITION_DELAY is too small.
>
> Please try to increase
> /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us to 10000 and
> see what difference this makes.

Changing rate_limit_us to 10000, or even 20000, makes no difference.

See a slight clarification to yesterday's email in-line below.

> > Trace is useless because it virtually never gets called.
> > So I have been reading the IA32_PERF_CTL MSR
> > directly.
> >
> > Example:
> >
> > Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> > 6 cores, 12 CPUs
> > min pstate 8
> > max non-turbo pstate 41
> > max turbo pstate 48
> > The system is idle.
> >
> > doug@s19:~$ sudo /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat --Summary --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt --interval 10
> > Busy%   Bzy_MHz IRQ     PkgWatt
> > 0.11    800     844     1.33
> > 0.01    800     231     1.33
> > 0.11    800     723     1.33 <<< Powersave governor
> > 0.03    889     440     1.33
> > 0.17    4418    21511   4.31 <<< Schedutil governor
> > 0.12    4101    30153   4.48 <<< 3 CPUs are > pstate 41
> > 0.22    4347    34226   4.75
> > 0.17    4101    43554   4.78
> > 0.29    4300    50565   4.94
> > 0.21    4098    50297   4.76 <<< 5 CPUs are > pstate 41
> > 0.29    4298    50532   4.84
> > 0.20    4101    50126   4.63
> > 0.20    4101    50149   4.62
> > 0.29    4297    50623   4.76
> > 0.20    4101    50203   4.72
> > 0.29    4295    50642   4.78
> > 0.20    4101    50223   4.68
> > 0.29    4292    50597   4.88
> > 0.20    4101    50208   4.73
> > 0.29    4296    50519   4.84
> > 0.20    4101    50167   4.80
> > 0.20    4101    50242   4.76
> > 0.29    4302    50625   4.94
> > 0.20    4101    50233   4.73
> > 0.29    4296    50613   4.78
> > 0.20    4101    50231   4.70
> > 0.29    4292    50802   4.93
> > 1.46    4669    65610   8.36
> > 0.41    4225    80701   5.48
> > 0.33    4101    80219   5.36 <<< 8 CPUs are > pstate 41
> > 0.34    4098    80313   5.38
> > 0.41    4228    80689   5.56
> > 0.33    4101    80252   5.46
> >
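The IRQ column above can be turned into a rough count of CPUs whose tick never stops. This is a minimal sketch, not something used in this thread: it assumes a 1000 Hz tick (CONFIG_HZ is not stated here, but roughly 10000 extra IRQs per stuck CPU per 10 second interval is consistent with it), and the function name and its parameters are hypothetical.

```python
def estimate_stuck_cpus(irqs_per_interval, interval_s=10.0, tick_hz=1000,
                        baseline_irqs=0):
    """Estimate how many CPUs kept their tick running for a whole interval.

    tick_hz=1000 is an assumption (CONFIG_HZ is not given in the thread).
    baseline_irqs is the IRQ count seen when no CPU is stuck.
    """
    irqs_per_stuck_cpu = tick_hz * interval_s
    return round((irqs_per_interval - baseline_irqs) / irqs_per_stuck_cpu)


# IRQ values copied from the turbostat rows annotated in the table above.
for irqs in (844, 30153, 50297, 80219):
    print(irqs, "->", estimate_stuck_cpus(irqs))
```

With the annotated rows this gives 3, 5 and 8 CPUs, matching the annotations, and 0 for the powersave rows.
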
> > And the related MSR reads:
> >
> > 3 CPUs are > pstate 41:
> > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  30 :   8 :   8 :  48 :  48 :  48 :   8 :  30 :  31 :   8 :   8 :   8 :
> >
> > 5 CPUs are > pstate 41:
> > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  44 :  30 :  31 :  48 :  48 :  48 :   8 :   8 :   8 :   8 :  48 :   8 :
> >
> > 8 CPUs are > pstate 41:
> > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  45 :  48 :  48 :  48 :  48 :  48 :   8 :  30 :   8 :   8 :  48 :  42 :
> >
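For readers without the msr-decoder tool (it is a local program and its source is not shown here), the same per-CPU check could be sketched roughly as follows. Assumptions: the `msr` kernel module is loaded and the script runs as root, the requested pstate sits in bits 15:8 of IA32_PERF_CTL (true for Core-family parts as far as I know), and all function names are hypothetical.

```python
import os
import struct

IA32_PERF_CTL = 0x199


def read_msr(cpu, reg=IA32_PERF_CTL):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (root, msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def requested_pstate(perf_ctl):
    """Extract the requested pstate, assumed to be in bits 15:8."""
    return (perf_ctl >> 8) & 0xFF


def count_above(pstates, max_non_turbo=41):
    """Count CPUs requesting more than the max non-turbo pstate."""
    return sum(1 for p in pstates if p > max_non_turbo)


# Per-CPU pstates copied from the three msr-decoder outputs above.
samples = [
    [30, 8, 8, 48, 48, 48, 8, 30, 31, 8, 8, 8],
    [44, 30, 31, 48, 48, 48, 8, 8, 8, 8, 48, 8],
    [45, 48, 48, 48, 48, 48, 8, 30, 8, 8, 48, 42],
]
for pstates in samples:
    print(count_above(pstates), "CPUs are > pstate 41")
```

On a live system the list would come from something like `[requested_pstate(read_msr(c)) for c in range(12)]`; the samples are hard-coded so the sketch runs anywhere.
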
> > This issue is independent of the original patch or the suggested modification:

Actually, the threshold for the issue is defined by the greater-than condition in whichever version of the line below is in use.

> >
> > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > > index f878a4545eee..94018ac0b59b 100644
> > > --- a/drivers/cpufreq/intel_pstate.c
> > > +++ b/drivers/cpufreq/intel_pstate.c
> > > @@ -1980,7 +1980,7 @@ static void intel_pstate_update_perf_ctl(struct cpudata *cpu)
> > >          * P-states to prevent them from getting back to the high frequency
> > >          * right away after getting out of deep idle.
> > >          */
> > > -       cpuidle_update_retain_tick(pstate > cpu->pstate.max_pstate);

For a kernel with the above line, the threshold is pstate 42 (the first pstate above the max non-turbo pstate of 41).

> > > +       cpuidle_update_retain_tick(pstate > ((cpu->pstate.max_pstate + cpu->pstate.min_pstate)/2));

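For a kernel with the above line, the threshold is pstate 25 ((41 + 8) / 2 is 24 in integer arithmetic, and the condition is strictly greater than).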

> > >         wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> > >  }
> >
> > ... Doug
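
The two thresholds can be checked with a few lines of arithmetic. This is only a sketch of the two conditions in the diff, assuming `cpu->pstate.max_pstate` is the max non-turbo pstate (41) and `cpu->pstate.min_pstate` is 8 on this processor; the Python function names are hypothetical.

```python
def threshold_original(max_pstate):
    """Lowest pstate satisfying `pstate > max_pstate`."""
    return max_pstate + 1


def threshold_midpoint(max_pstate, min_pstate):
    """Lowest pstate satisfying `pstate > (max_pstate + min_pstate) / 2`.

    `//` mirrors C integer division for these non-negative values.
    """
    return (max_pstate + min_pstate) // 2 + 1


# i5-10600K values from this thread: min pstate 8, max non-turbo pstate 41.
print(threshold_original(41))     # original line
print(threshold_midpoint(41, 8))  # modified line
```
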


