On Mon, Feb 14, 2022 at 07:17:24AM -0800, srinivas pandruvada wrote:
> Hi Doug,
>
> I think you use CONFIG_NO_HZ_FULL.
> Here we are getting callback from scheduler. Can we check that if
> scheduler woke up on those CPUs?
> We can run "trace-cmd -e sched" and check in kernel shark if there is
> similar gaps in activity.

Srinivas analyzed the scheduler trace data from trace-cmd, and thinks the
problem is that the cpufreq callback is not being called in a timely
manner from scheduling events:

"
I mean we ignore the callback when the target CPU is not a local CPU as
we have to do IPI to adjust MSRs.
This will happen many times when sched_wake will wake up a new CPU for
the thread (we will get a callback for the target) but once the remote
thread start executing "sched_switch", we will get a callback on local
CPU, so we will adjust frequencies (provided 10ms interval from the last
call).

From the trace file I see the scenario where it took 72sec between two
updates:
CPU 2
34412.597161    busy=78    freq=3232653
34484.450725    busy=63    freq=2606793

There is periodic activity in between, related to active load balancing
in scheduler (since last frequency was higher these small work will
also run at higher frequency). But those threads are not CFS class, so
scheduler callback will not be called for them.

So removing the patch removed a trigger which would have caused a
sched_switch to a CFS task and call a cpufreq/intel_pstate callback.
But calling for every class, will be too many callbacks and not sure we
can even call for "stop" class, which these migration threads are
using.
"

Following this direction, I made a hacky debug patch which should help
to restore the previous behavior.

Doug, could you help try it? Thanks.

It basically tries to make sure cpufreq-update-util is called in a
timely manner even on a quiet system with very few interrupts (even
from the tick).

Thanks,
Feng
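
A rough sketch (not part of the debug patch) for spotting similar gaps between frequency updates in a trace. The input format is an assumption: one "timestamp busy=N freq=N" sample per line for a single CPU, as in the CPU 2 excerpt quoted above, rather than raw trace-cmd output. The find_gaps() helper and the 1-second threshold are hypothetical and chosen only for illustration.

```python
import re

# Assumed sample format, taken from the CPU 2 excerpt in this mail:
#   "<timestamp> busy=<n> freq=<n>"
# Real trace-cmd output has more fields and would need a different pattern.
SAMPLE_RE = re.compile(r"^\s*(\d+\.\d+)\s+busy=(\d+)\s+freq=(\d+)\s*$")


def find_gaps(lines, threshold_s=1.0):
    """Return (prev_ts, cur_ts, gap) for consecutive samples that are
    more than threshold_s seconds apart. Non-matching lines are skipped."""
    gaps = []
    prev = None
    for line in lines:
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold_s:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# The two CPU 2 samples quoted in this mail.
cpu2_samples = [
    "34412.597161 busy=78 freq=3232653",
    "34484.450725 busy=63 freq=2606793",
]

gaps = find_gaps(cpu2_samples)
for prev, cur, gap in gaps:
    print(f"{prev:.6f} -> {cur:.6f}: {gap:.1f}s without an update")
```

Run on the two samples above, this flags the single gap of roughly 72 seconds that Srinivas pointed out.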