On Wed, May 9, 2018 at 10:41 AM, Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote: > On 08-05-18, 22:54, Rafael J. Wysocki wrote: >> On Tue, May 8, 2018 at 8:42 AM, Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote: >> > The schedutil driver sets sg_policy->next_freq to UINT_MAX on certain >> > occasions: >> > - In sugov_start(), when the schedutil governor is started for a group >> > of CPUs. >> > - And whenever we need to force a freq update before rate-limit >> > duration, which happens when: >> > - there is an update in cpufreq policy limits. >> > - Or when the utilization of DL scheduling class increases. >> > >> > In return, get_next_freq() doesn't return a cached next_freq value but >> > instead recalculates the next frequency. This has some side effects >> > though and may significantly delay a required increase in frequency. >> > >> > In sugov_update_single() we try to avoid decreasing frequency if the CPU >> > has not been idle recently. Consider this scenario, the available range >> > of frequencies for a CPU are from 800 MHz to 2.5 GHz and current >> > frequency is 800 MHz. From one of the call paths >> > sg_policy->need_freq_update is set to true and hence >> > sg_policy->next_freq is set to UINT_MAX. Now if the CPU had been busy, >> > next_f will always be less than UINT_MAX, whatever the value of next_f >> > is. And so even when we wanted to increase the frequency, we will >> > overwrite next_f with UINT_MAX and will not change the frequency >> > eventually. This will continue until the time CPU stays busy. This isn't >> > cross checked with any specific test cases, but rather based on general >> > code review. >> > >> > Fix that by not resetting the sg_policy->need_freq_update flag from >> > sugov_should_update_freq() but get_next_freq() and we wouldn't need to >> > overwrite sg_policy->next_freq anymore. >> > >> > Cc: 4.12+ <stable@xxxxxxxxxxxxxxx> # 4.12+ >> > Fixes: b7eaf1aab9f8 ("cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely") >> >> The rest of the chantelog is totally disconnected from this commit. > > I added the "Fixes" tag because this is exactly the commit after which > this problem started, isn't it? > >> So the problem is that sugov_update_single() doesn't check the special >> UNIT_MAX value before assigning sg_policy->next_freq to next_f. Fair >> enough. >> >> I don't see why the patch is the right fix for that, however. > > I thought not overwriting next_freq makes things much simpler and easy > to review. I'm kind of concerned about updating the limits via sysfs in which case the cached next frequency may be out of range, so it's better to invalidate it right away then. > What else do you have in mind to solve this problem ? Something like the below? --- kernel/sched/cpufreq_schedutil.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Index: linux-pm/kernel/sched/cpufreq_schedutil.c =================================================================== --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c +++ linux-pm/kernel/sched/cpufreq_schedutil.c @@ -305,7 +305,8 @@ static void sugov_update_single(struct u * Do not reduce the frequency if the CPU has not been idle * recently, as the reduction is likely to be premature then. */ - if (busy && next_f < sg_policy->next_freq) { + if (busy && next_f < sg_policy->next_freq && + sg_policy->next_freq != UINT_MAX) { next_f = sg_policy->next_freq; /* Reset cached freq as next_freq has changed */