On 23-07-19, 12:27, Rafael J. Wysocki wrote:
> On Tue, Jul 23, 2019 at 11:15 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> > Though there is one difference between intel_cpufreq and acpi_cpufreq,
> > intel_cpufreq has fast_switch_possible=true and so it uses slightly
> > different path in schedutil. I tried to look from that perspective as
> > well but couldn't find anything wrong.
>
> acpi-cpufreq should use fast switching on the Doug's system too.

Ah okay.

> > If you still find intel_cpufreq to be broken, even with this patch,
> > please set fast_switch_possible=false instead of true in
> > __intel_pstate_cpu_init() and try tests again. That shall make it very
> > much similar to acpi-cpufreq driver.
>
> I wonder if this helps. Even so, we want fast switching to be used by
> intel_cpufreq.

With both using fast switching it shouldn't make any difference.

> Anyway, it looks like the change reverted by the Doug's patch
> introduced a race condition that had not been present before. Namely,
> need_freq_update is cleared in get_next_freq() when it is set _or_
> when the new freq is different from the cached one, so in the latter
> case if it happens to be set by sugov_limits() after evaluating
> sugov_should_update_freq() (which returned 'true' for timing reasons),
> that update will be lost now. [Previously the update would not be
> lost, because the clearing of need_freq_update depended only on its
> current value.] Where it matters is that in the "need_freq_update set"
> case, the "premature frequency reduction avoidance" should not be
> applied (as you noticed and hence the $subject patch).
>
> However, even with the $subject patch, need_freq_update may still be
> set by sugov_limits() after the check added by it and then cleared by
> get_next_freq(), so it doesn't really eliminate the problem.
>
> IMO eliminating would require invalidating next_freq this way or
> another when need_freq_update is set in sugov_should_update_freq(),
> which was done before commit ecd2884291261e3fddbc7651ee11a20d596bb514.

Hmm, so to avoid locking in the fast path we need a group of two
variables to protect against this kind of issue. I still don't want to
overload next_freq with a special meaning, as that can cause hidden
bugs; we have seen that happen before.

What about something like this then?
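To make the flag handoff concrete, here is a small compilable userspace
toy (not kernel code; the helper bodies are simplified stand-ins, the
rate limiting and cpufreq_driver_resolve_freq() steps are elided, and
the 800/1000 values are made up) showing how limits_changed and
need_freq_update are intended to interact. The actual patch follows
after the scissors line.

/* Illustrative userspace sketch only, not the kernel implementation. */
#include <stdbool.h>
#include <stdio.h>

struct sugov_policy {
        unsigned int next_freq;         /* last resolved frequency */
        unsigned int cached_raw_freq;   /* last raw (unresolved) frequency */
        bool limits_changed;            /* only ever set by sugov_limits() */
        bool need_freq_update;          /* owned by the scheduler update path */
};

/* sugov_limits(): only sets the "outer" flag, never need_freq_update. */
static void sugov_limits(struct sugov_policy *sg)
{
        sg->limits_changed = true;
}

/* sugov_should_update_freq(): latches the request into the fast path. */
static bool sugov_should_update_freq(struct sugov_policy *sg)
{
        if (sg->limits_changed) {
                sg->limits_changed = false;
                sg->need_freq_update = true;
                return true;
        }
        return false;                   /* rate limiting elided */
}

/* get_next_freq(): the only place that clears need_freq_update. */
static unsigned int get_next_freq(struct sugov_policy *sg, unsigned int raw)
{
        if (raw == sg->cached_raw_freq && !sg->need_freq_update)
                return sg->next_freq;   /* shortcut: reuse the old value */

        sg->need_freq_update = false;
        sg->cached_raw_freq = raw;
        sg->next_freq = raw;            /* resolve/clamp step elided */
        return sg->next_freq;
}

int main(void)
{
        /* next_freq was clamped to 800 by the old limits. */
        struct sugov_policy sg = { .next_freq = 800, .cached_raw_freq = 1000 };

        sugov_limits(&sg);              /* limits change, e.g. max raised */
        if (sugov_should_update_freq(&sg))
                /* raw == cached, but the shortcut is skipped: prints 1000 */
                printf("next freq: %u\n", get_next_freq(&sg, 1000));
        return 0;
}

The point is that sugov_limits() never touches need_freq_update
directly anymore, so the clearing done in get_next_freq() can no longer
wipe out a limits update that races with sugov_should_update_freq().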
--
viresh

-------------------------8<-------------------------

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 636ca6f88c8e..2f382b0959e5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -40,6 +40,7 @@ struct sugov_policy {
         struct task_struct *thread;
         bool work_in_progress;
 
+        bool limits_changed;
         bool need_freq_update;
 };
 
@@ -89,8 +90,11 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
             !cpufreq_this_cpu_can_update(sg_policy->policy))
                 return false;
 
-        if (unlikely(sg_policy->need_freq_update))
+        if (unlikely(sg_policy->limits_changed)) {
+                sg_policy->limits_changed = false;
+                sg_policy->need_freq_update = true;
                 return true;
+        }
 
         delta_ns = time - sg_policy->last_freq_update_time;
 
@@ -437,7 +441,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
 {
         if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
-                sg_policy->need_freq_update = true;
+                sg_policy->limits_changed = true;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -447,7 +451,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
         unsigned long util, max;
         unsigned int next_f;
-        bool busy;
+        bool busy = false;
 
         sugov_iowait_boost(sg_cpu, time, flags);
         sg_cpu->last_update = time;
@@ -457,7 +461,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
         if (!sugov_should_update_freq(sg_policy, time))
                 return;
 
-        busy = sugov_cpu_is_busy(sg_cpu);
+        /* Limits may have changed, don't skip frequency update */
+        if (!sg_policy->need_freq_update)
+                busy = sugov_cpu_is_busy(sg_cpu);
 
         util = sugov_get_util(sg_cpu);
         max = sg_cpu->max;
@@ -831,6 +837,7 @@ static int sugov_start(struct cpufreq_policy *policy)
         sg_policy->last_freq_update_time = 0;
         sg_policy->next_freq = 0;
         sg_policy->work_in_progress = false;
+        sg_policy->limits_changed = false;
         sg_policy->need_freq_update = false;
         sg_policy->cached_raw_freq = 0;
 
@@ -879,7 +886,7 @@ static void sugov_limits(struct cpufreq_policy *policy)
                 mutex_unlock(&sg_policy->work_lock);
         }
 
-        sg_policy->need_freq_update = true;
+        sg_policy->limits_changed = true;
 }
 
 struct cpufreq_governor schedutil_gov = {