On 23-07-19, 12:27, Rafael J. Wysocki wrote:
> On Tue, Jul 23, 2019 at 11:15 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> > Though there is one difference between intel_cpufreq and acpi_cpufreq,
> > intel_cpufreq has fast_switch_possible=true and so it uses slightly
> > different path in schedutil. I tried to look from that perspective as
> > well but couldn't find anything wrong.
>
> acpi-cpufreq should use fast switching on the Doug's system too.

Ah okay.

> > If you still find intel_cpufreq to be broken, even with this patch,
> > please set fast_switch_possible=false instead of true in
> > __intel_pstate_cpu_init() and try tests again. That shall make it very
> > much similar to acpi-cpufreq driver.
>
> I wonder if this helps. Even so, we want fast switching to be used by
> intel_cpufreq.

With both using fast switching it shouldn't make any difference.

> Anyway, it looks like the change reverted by the Doug's patch
> introduced a race condition that had not been present before. Namely,
> need_freq_update is cleared in get_next_freq() when it is set _or_
> when the new freq is different from the cached one, so in the latter
> case if it happens to be set by sugov_limits() after evaluating
> sugov_should_update_freq() (which returned 'true' for timing reasons),
> that update will be lost now. [Previously the update would not be
> lost, because the clearing of need_freq_update depended only on its
> current value.] Where it matters is that in the "need_freq_update set"
> case, the "premature frequency reduction avoidance" should not be
> applied (as you noticed and hence the $subject patch).
>
> However, even with the $subject patch, need_freq_update may still be
> set by sugov_limits() after the check added by it and then cleared by
> get_next_freq(), so it doesn't really eliminate the problem.
>
> IMO eliminating would require invalidating next_freq this way or
> another when need_freq_update is set in sugov_should_update_freq(),
> which was done before commit ecd2884291261e3fddbc7651ee11a20d596bb514.

Hmm, so to avoid locking in the fast path we need a group of two
variables to protect against this kind of issue. I still don't want to
overload next_freq with a special meaning, as that can cause hidden
bugs; we have seen that happen before.

What about something like this then?
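To make the flag handoff concrete, here is a small compilable userspace
toy (not kernel code; the helper bodies are simplified stand-ins, the
rate limiting and cpufreq_driver_resolve_freq() steps are elided, and
the 800/1000 values are made up) showing how limits_changed and
need_freq_update are intended to interact. The actual patch follows
after the scissors line.

/* Illustrative userspace sketch only, not the kernel implementation. */
#include <stdbool.h>
#include <stdio.h>

struct sugov_policy {
        unsigned int next_freq;         /* last resolved frequency */
        unsigned int cached_raw_freq;   /* last raw (unresolved) frequency */
        bool limits_changed;            /* only ever set by sugov_limits() */
        bool need_freq_update;          /* owned by the scheduler update path */
};

/* sugov_limits(): only sets the "outer" flag, never need_freq_update. */
static void sugov_limits(struct sugov_policy *sg)
{
        sg->limits_changed = true;
}

/* sugov_should_update_freq(): latches the request into the fast path. */
static bool sugov_should_update_freq(struct sugov_policy *sg)
{
        if (sg->limits_changed) {
                sg->limits_changed = false;
                sg->need_freq_update = true;
                return true;
        }
        return false;                   /* rate limiting elided */
}

/* get_next_freq(): the only place that clears need_freq_update. */
static unsigned int get_next_freq(struct sugov_policy *sg, unsigned int raw)
{
        if (raw == sg->cached_raw_freq && !sg->need_freq_update)
                return sg->next_freq;   /* shortcut: reuse the old value */

        sg->need_freq_update = false;
        sg->cached_raw_freq = raw;
        sg->next_freq = raw;            /* resolve/clamp step elided */
        return sg->next_freq;
}

int main(void)
{
        /* next_freq was clamped to 800 by the old limits. */
        struct sugov_policy sg = { .next_freq = 800, .cached_raw_freq = 1000 };

        sugov_limits(&sg);              /* limits change, e.g. max raised */
        if (sugov_should_update_freq(&sg))
                /* raw == cached, but the shortcut is skipped: prints 1000 */
                printf("next freq: %u\n", get_next_freq(&sg, 1000));
        return 0;
}

The point is that sugov_limits() never touches need_freq_update
directly anymore, so the clearing done in get_next_freq() can no longer
wipe out a limits update that races with sugov_should_update_freq().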
--
viresh

-------------------------8<-------------------------

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 636ca6f88c8e..2f382b0959e5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -40,6 +40,7 @@ struct sugov_policy {
         struct task_struct *thread;
         bool work_in_progress;
 
+        bool limits_changed;
         bool need_freq_update;
 };
 
@@ -89,8 +90,11 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
             !cpufreq_this_cpu_can_update(sg_policy->policy))
                 return false;
 
-        if (unlikely(sg_policy->need_freq_update))
+        if (unlikely(sg_policy->limits_changed)) {
+                sg_policy->limits_changed = false;
+                sg_policy->need_freq_update = true;
                 return true;
+        }
 
         delta_ns = time - sg_policy->last_freq_update_time;
 
@@ -437,7 +441,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
 {
         if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
-                sg_policy->need_freq_update = true;
+                sg_policy->limits_changed = true;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -447,7 +451,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
         unsigned long util, max;
         unsigned int next_f;
-        bool busy;
+        bool busy = false;
 
         sugov_iowait_boost(sg_cpu, time, flags);
         sg_cpu->last_update = time;
@@ -457,7 +461,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
         if (!sugov_should_update_freq(sg_policy, time))
                 return;
 
-        busy = sugov_cpu_is_busy(sg_cpu);
+        /* Limits may have changed, don't skip frequency update */
+        if (!sg_policy->need_freq_update)
+                busy = sugov_cpu_is_busy(sg_cpu);
 
         util = sugov_get_util(sg_cpu);
         max = sg_cpu->max;
@@ -831,6 +837,7 @@ static int sugov_start(struct cpufreq_policy *policy)
         sg_policy->last_freq_update_time = 0;
         sg_policy->next_freq = 0;
         sg_policy->work_in_progress = false;
+        sg_policy->limits_changed = false;
         sg_policy->need_freq_update = false;
         sg_policy->cached_raw_freq = 0;
 
@@ -879,7 +886,7 @@ static void sugov_limits(struct cpufreq_policy *policy)
                 mutex_unlock(&sg_policy->work_lock);
         }
 
-        sg_policy->need_freq_update = true;
+        sg_policy->limits_changed = true;
 }
 
 struct cpufreq_governor schedutil_gov = {