On Tue, Mar 01, 2022 at 08:06:24PM -0800, Doug Smythies wrote:
> On Tue, Mar 1, 2022 at 9:34 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > I guess the numbers above could be reduced still by using a P-state
> > below the max non-turbo one as a limit.
>
> Yes, and for a test I did "rjw-3".
>
> > > overruns: 1042.
> > > max overrun time: 9,769 uSec.
> >
> > This would probably get worse then, though.
>
> Yes, that was my expectation, but not what happened.
>
> rjw-3:
> ave: 3.09 watts
> min: 3.01 watts
> max: 31.7 watts
> ave freq: 2.42 GHz.
> overruns: 12. (I did not expect this.)
> Max overrun time: 621 uSec.
>
> Note 1: IRQs increased by 74%, i.e. the CPU was going in
> and out of idle a lot more.
>
> Note 2: We know that processor package power
> is highly temperature dependent. I forgot to let my
> coolant cool adequately after the kernel compile,
> and so had to throw out the first 4 power samples
> (20 minutes).
>
> I retested both rjw-2 and rjw-3, but with shorter tests,
> and got 0 overruns in both cases.

One thought: could we consider trying the previous debug patch, which
calls the util update when entering idle (time limited)?

In the current code, the RT/CFS/Deadline classes all have places that
call cpufreq_update_util(); this patch makes sure it is called in all
four classes (including idle), and it follows schedutil's principle of
not introducing more system cost. Of course, I could be missing some
details here.

Following is a cleaner version of the patch. The code could also be
moved down into the inner

	while (!need_resched()) {
	}

loop, which would make it get called more frequently.

---
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index d17b0a5ce6ac..e12688036725 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -258,15 +258,23 @@ static void cpuidle_idle_call(void)
  *
  * Called with polling cleared.
  */
+DEFINE_PER_CPU(u64, last_util_update_time); /* in jiffies */
 static void do_idle(void)
 {
 	int cpu = smp_processor_id();
+	u64 expire;
 
 	/*
 	 * Check if we need to update blocked load
 	 */
 	nohz_run_idle_balance(cpu);
 
+	expire = __this_cpu_read(last_util_update_time) + HZ * 3;
+	if (unlikely(time_is_before_jiffies((unsigned long)expire))) {
+		cpufreq_update_util(this_rq(), 0);
+		__this_cpu_write(last_util_update_time, get_jiffies_64());
+	}
+
 	/*
 	 * If the arch has a polling bit, we maintain an invariant:
 	 *

Thanks,
Feng

> > ATM I'm not quite sure why this happens, but you seem to have some
> > insight into it, so it would help if you shared it.
>
> My insight seems questionable.
>
> My thinking was that one cannot decide whether the pstate needs to go
> down or not based on such a localized look, the risk being that the
> higher periodic load might suffer overruns. Since my first test did
> exactly that, I violated my own "repeat all tests 3 times before
> reporting" rule. Now, I am not sure what is going on.
> I will need more time to acquire traces and dig into it.
>
> I also did a 1 hour intel_pstate_tracer test, with rjw-2, on an idle
> system and saw several long durations. This was expected, as this
> patch set wouldn't change durations by more than a few jiffies.
> 755 long durations (>6.1 seconds), and 327.7 seconds longest.
>
> ... Doug