On Thu, Feb 24, 2022 at 04:08:30PM +0800, Feng Tang wrote:
> On Wed, Feb 23, 2022 at 03:23:20PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Feb 23, 2022 at 1:40 AM Feng Tang <feng.tang@xxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 22, 2022 at 04:32:29PM -0800, srinivas pandruvada wrote:
> > > > Hi Doug,
> > > >
> > > > On Tue, 2022-02-22 at 16:07 -0800, Doug Smythies wrote:
> > > > > Hi All,
> > > > >
> > > > > I am about 1/2 way through testing Feng's "hacky debug patch",
> > > > > let me know if I am wasting my time, and I'll abort. So far, it
> > > > > works fine.
> > > > This just proves that if you add some callback during long idle, you
> > > > will reach a less aggressive p-state. I think you already proved that
> > > > with your results below showing 1W less average power ("Kernel 5.17-rc3
> > > > + Feng patch (6 samples at 300 sec per").
> > > >
> > > > Rafael replied with one possible option. Alternatively when planning to
> > > > enter deep idle, set P-state to min with a callback like we do in
> > > > offline callback.
> > >
> > > Yes, if the system is going to idle, it makes sense to go to a lower
> > > cpufreq first (also what my debug patch will essentially lead to).
> > >
> > > Given cpufreq-util's normal running frequency is every 10ms, doing
> > > this before entering idle is not a big extra burden.
> >
> > But this is not related to idle as such, but to the fact that idle
> > sometimes stops the scheduler tick which otherwise would run the
> > cpufreq governor callback on a regular basis.
> >
> > It is stopping the tick that gets us into trouble, so I would avoid
> > doing it if the current performance state is too aggressive.
>
> I've tried to simulate Doug's environment by using his kconfig, and
> offline my 36 CPUs Desktop to leave 12 CPUs online, and on it I can
> still see Local timer interrupts when there is no active load, with
> the longest interval between 2 timer interrupts is 4 seconds, while
> idle class's task_tick_idle() will do nothing, and CFS'
> task_tick_fair() will in turn call cfs_rq_util_change()

Every four seconds? Could you please post your .config?

Thanx, Paul

> I searched the cfs/deadline/rt code, these three classes all have
> places to call cpufreq_update_util(), either in enqueue/dequeue or
> changing running bandwidth. So I think entering idle also means the
> system load is under a big change, and worth calling the cpufreq
> callback.
>
> > In principle, PM QoS can be used for that from intel_pstate, but there
> > is a problem with that approach, because it is not obvious what value
> > to give to it and it is not always guaranteed to work (say when all of
> > the C-states except for C1 are disabled).
> >
> > So it looks like a new mechanism is needed for that.
>
> If you think idle class is not the right place to solve it, I can
> also help testing new patches.
>
> Thanks,
> Feng
>