Re: CPU excessively long times between frequency scaling driver calls - bisected

On Thu, Mar 24, 2022 at 11:17 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> Hi Doug,
>
> On Thu, Mar 24, 2022 at 3:04 PM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >
> > Hi Rafael,
> >
> > Do you have any suggestions for the proposed patch?
>
> Not really.
>
> It looks like not stopping the scheduler tick is sufficient to bump
> up the PELT signal for this workload so that it never falls below a
> certain level, which in turn causes schedutil to ask for higher
> frequencies.
>
> An alternative approach appears to be necessary, but I need some more
> time for that.

Hi Rafael,

O.K. thanks for the reply.
This can be sidelined for now if you prefer.
As mentioned in one of the off-list emails:

I was always aware that we might be heading towards a solution
tailored to my specific test workflow. It is one thing, and a relatively
simple one, to create an example workflow that exploits the issue, but
quite another to claim that the proposed solution works for any workflow
and hardware.

>
> > I have tried to figure out what is wrong but haven't been able to.
> >
> > ... Doug
> >
> > On Thu, Mar 17, 2022 at 6:58 AM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 17, 2022 at 5:30 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > On Wed, Mar 16, 2022 at 4:55 PM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> > > > >
> > > > > Readers: So that graphs and large attachments could be used, I have
> > > > > been on an off-list branch of this thread with Srinivas, and copied a
> > > > > couple of others. While now returning to this on-list thread, I'll
> > > > > only take up Rafael's proposed patch.
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > So far all work has been done with: HWP disabled; intel_pstate; powersave.
> > > > > The reason was that it is, by far, the best way to obtain good trace data
> > > > > using the intel_pstate_tracer.py utility.
> > > > >
> > > > > I always intended to try/test: HWP disabled; intel_cpufreq; schedutil.
> > > > > There is an issue with the proposed patch and schedutil.
> > > > >
> > > > > > If any CPU ever requests a pstate greater than the max non-turbo
> > > > > > pstate then it will stay at that request forever. Ultimately the idle
> > > > > > power goes to about 5.7 watts (versus 1.4 watts expected).
> > > > > IRQs go very high, as the tick never turns off.
> > > > > Actually, one knows how many CPUs are stuck requesting a high
> > > > > pstate just by looking at IRQs.
> > > >
> > > > That may be because INTEL_CPUFREQ_TRANSITION_DELAY is too small.
> > > >
> > > > Please try to increase
> > > > /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us to 10000 and
> > > > see what difference this makes.
> > >
> > > Changing rate_limit_us to 10000, or even 20000, makes no difference.
> > >
> > > See a slight clarification to yesterday's email in-line below.
> > >
> > > > > > Trace is useless because the driver virtually never gets called,
> > > > > > so I have been reading the IA32_PERF_CTL MSR directly.
> > > > >
> > > > > Example:
> > > > >
> > > > > Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> > > > > 6 cores, 12 CPUs
> > > > > min pstate 8
> > > > > max non-turbo pstate 41
> > > > > max turbo pstate 48
> > > > > The system is idle.
> > > > >
> > > > > > doug@s19:~$ sudo /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat \
> > > > > >     --Summary --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt --interval 10
> > > > > Busy%   Bzy_MHz IRQ     PkgWatt
> > > > > 0.11    800     844     1.33
> > > > > 0.01    800     231     1.33
> > > > > 0.11    800     723     1.33 <<< Powersave governor
> > > > > 0.03    889     440     1.33
> > > > > 0.17    4418    21511   4.31 <<< Schedutil governor
> > > > > 0.12    4101    30153   4.48 <<< 3 CPUs are > pstate 41
> > > > > 0.22    4347    34226   4.75
> > > > > 0.17    4101    43554   4.78
> > > > > 0.29    4300    50565   4.94
> > > > > 0.21    4098    50297   4.76 <<< 5 CPUs are > pstate 41
> > > > > 0.29    4298    50532   4.84
> > > > > 0.20    4101    50126   4.63
> > > > > 0.20    4101    50149   4.62
> > > > > 0.29    4297    50623   4.76
> > > > > 0.20    4101    50203   4.72
> > > > > 0.29    4295    50642   4.78
> > > > > 0.20    4101    50223   4.68
> > > > > 0.29    4292    50597   4.88
> > > > > 0.20    4101    50208   4.73
> > > > > 0.29    4296    50519   4.84
> > > > > 0.20    4101    50167   4.80
> > > > > 0.20    4101    50242   4.76
> > > > > 0.29    4302    50625   4.94
> > > > > 0.20    4101    50233   4.73
> > > > > 0.29    4296    50613   4.78
> > > > > 0.20    4101    50231   4.70
> > > > > 0.29    4292    50802   4.93
> > > > > 1.46    4669    65610   8.36
> > > > > 0.41    4225    80701   5.48
> > > > > 0.33    4101    80219   5.36 <<< 8 CPUs are > ptstate 41
> > > > > 0.34    4098    80313   5.38
> > > > > 0.41    4228    80689   5.56
> > > > > 0.33    4101    80252   5.46
> > > > >
> > > > > And the related MSR reads:
> > > > >
> > > > > 3 CPUs are > pstate 41:
> > > > > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > > > > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  30 :   8 :   8 :  48 :  48 :  48 :   8 :  30 :  31 :   8 :   8 :   8 :
> > > > >
> > > > > > 5 CPUs are > pstate 41:
> > > > > > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > > > > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  44 :  30 :  31 :  48 :  48 :  48 :   8 :   8 :   8 :   8 :  48 :   8 :
> > > > >
> > > > > 8 CPUs are > pstate 41:
> > > > > root@s19:/home/doug# c/msr-decoder | grep IA32_PERF_CTL
> > > > > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-11 :  45 :  48 :  48 :  48 :  48 :  48 :   8 :  30 :   8 :   8 :  48 :  42 :
> > > > >
> > > > > This issue is independent of the original patch or the suggested modification:
> > >
> > > Actually, the threshold for the issue is defined by the greater-than condition below.
> > >
> > > > >
> > > > > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > > > > > index f878a4545eee..94018ac0b59b 100644
> > > > > > --- a/drivers/cpufreq/intel_pstate.c
> > > > > > +++ b/drivers/cpufreq/intel_pstate.c
> > > > > > @@ -1980,7 +1980,7 @@ static void intel_pstate_update_perf_ctl(struct
> > > > > > cpudata *cpu)
> > > > > >          * P-states to prevent them from getting back to the high frequency
> > > > > >          * right away after getting out of deep idle.
> > > > > >          */
> > > > > > -       cpuidle_update_retain_tick(pstate > cpu->pstate.max_pstate);
> > >
> > > For the above kernel the threshold is pstate 42.
> > >
> > > > > > > +       cpuidle_update_retain_tick(pstate > ((cpu->pstate.max_pstate + cpu->pstate.min_pstate)/2));
> > >
> > > For the above kernel the threshold is pstate 25.
> > >
> > > > > >         wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> > > > > >  }
> > > > >
> > > > > ... Doug
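The two thresholds quoted above (pstate 42 and pstate 25) follow from the greater-than conditions in the diff and the min/max pstates of the test processor. The sketch below mirrors only those two comparisons in Python; it is not the kernel code. It assumes `cpu->pstate.max_pstate` is the max non-turbo pstate (41), which is consistent with the stated threshold of 42, and the function names are hypothetical.

```python
MIN_PSTATE = 8         # min pstate of the i5-10600K used in the thread
MAX_PSTATE = 41        # max non-turbo pstate (assumed to be cpu->pstate.max_pstate)
MAX_TURBO_PSTATE = 48  # max turbo pstate


def retains_tick_original(pstate, max_pstate=MAX_PSTATE):
    """Condition from the original patch: pstate > max_pstate."""
    return pstate > max_pstate


def retains_tick_modified(pstate, min_pstate=MIN_PSTATE, max_pstate=MAX_PSTATE):
    """Condition from the suggested modification.

    C integer division of non-negative ints truncates, which matches
    Python's floor division here: (41 + 8) / 2 == 24.
    """
    return pstate > (max_pstate + min_pstate) // 2


def lowest_retaining_pstate(condition):
    """Return the lowest pstate in [MIN_PSTATE, MAX_TURBO_PSTATE] that satisfies condition."""
    return next(p for p in range(MIN_PSTATE, MAX_TURBO_PSTATE + 1) if condition(p))


print(lowest_retaining_pstate(retains_tick_original))  # 42
print(lowest_retaining_pstate(retains_tick_modified))  # 25
```

The modified condition therefore covers every pstate that the original one does, plus pstates 25 through 41.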


