On 10/3/23 2:50 PM, Pierre Gondois wrote: > Hello Shrikanth, > Some NITs about the commit message: > Hi Pierre. > On 9/29/23 17:52, Shrikanth Hegde wrote: >> sysctl sched_energy_aware is available for the admin to disable/enable >> energy aware scheduling(EAS). EAS is enabled only if few conditions are >> met by the platform. They are, asymmetric CPU capacity, no SMT, >> schedutil CPUfreq governor, frequency invariant load tracking etc. >> A platform may boot without EAS capability, but could gain such >> capability at runtime For example, changing/registering the CPUfreq > > Missing dot I think: 'runtime. For example,' ok. > >> governor to schedutil. >> >> At present, though platform doesn't support EAS, this sysctl returns 1 >> and it ends up calling build_perf_domains on write to 1 and >> NOP when writing to 0. That is confusing and un-necessary. > This is current problematic behavior that patch 2/2 tries to address. > I'm not sure I fully understand the sentence: > - it sounds that the user is writing a value to either 1/0 > (I think the user is writing 1/0 to the sysctl) Yes, any user with root privileges can edit this file and perform read and write. > - aren't the sched domain rebuilt even when writing 0 to the sysctl ? > I'm not sure I understand to what the NOP is referring to exactly. > Complete sched domains aren't built as this case goes to match1 and match2 statements. > What about: > Platforms without EAS capability currently advertise this sysctl. > Its effects (i.e. rebuilding sched-domains) is unnecessary on > such platforms and its presence can be confusing. > look ok. the changelog had described in detail IMHO >> >> Desired behavior would be to, have this sysctl to enable/disable the EAS > > Unnecessary comma I think > >> on supported platform. On Non supported platform write to the sysctl > > Non supported -> non-supported ok for the above two nits. > >> would return not supported error and read of the sysctl would return >> empty. So> sched_energy_aware returns empty - EAS is not possible at >> this moment >> This will include EAS capable platforms which have at least one EAS >> condition false during startup, e.g. using a Performance CPUfreq governor > > Just a remark, using the performance governor is not exactly a condition > disabling EAS, it is more 'not using the schedutil CPUfreq governor' > ok. >> sched_energy_aware returns 0 - EAS is supported but disabled by admin. >> sched_energy_aware returns 1 - EAS is supported and enabled. >> >> User can find out the reason why EAS is not possible by checking >> info messages. sched_is_eas_possible returns true if the platform >> can do EAS at this moment. >> >> Depends on [PATCH v5 1/2] sched/topology: Remove EM_MAX_COMPLEXITY limit >> to be applied first. > > I think it's implied as the 2 patches are sent together. > yes. Did mention it explicitly since b4 mbox can try apply 2/2 first. had run into similar issues recently. > Otherwise: > Tested-by: Pierre Gondois <pierre.gondois@xxxxxxx> > >> Thank you very much for the testing it and providing the tag. >> Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx> >> --- >> Documentation/admin-guide/sysctl/kernel.rst | 3 +- >> kernel/sched/topology.c | 112 +++++++++++++------- >> 2 files changed, 76 insertions(+), 39 deletions(-) >> >> diff --git a/Documentation/admin-guide/sysctl/kernel.rst >> b/Documentation/admin-guide/sysctl/kernel.rst >> index cf33de56da27..d89ac2bd8dc4 100644 >> --- a/Documentation/admin-guide/sysctl/kernel.rst >> +++ b/Documentation/admin-guide/sysctl/kernel.rst >> @@ -1182,7 +1182,8 @@ automatically on platforms where it can run >> (that is, >> platforms with asymmetric CPU topologies and having an Energy >> Model available). If your platform happens to meet the >> requirements for EAS but you do not want to use it, change >> -this value to 0. >> +this value to 0. On Non-EAS platforms, write operation fails and >> +read doesn't return anything. >> >> task_delayacct >> =============== >> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c >> index e0b9920e7e3e..a654d0186ac0 100644 >> --- a/kernel/sched/topology.c >> +++ b/kernel/sched/topology.c >> @@ -212,6 +212,70 @@ static unsigned int sysctl_sched_energy_aware = 1; >> static DEFINE_MUTEX(sched_energy_mutex); >> static bool sched_energy_update; >> >> +extern struct cpufreq_governor schedutil_gov; >> +static bool sched_is_eas_possible(const struct cpumask *cpu_mask) >> +{ >> + bool any_asym_capacity = false; >> + struct cpufreq_policy *policy; >> + struct cpufreq_governor *gov; >> + int i; >> + >> + /* EAS is enabled for asymmetric CPU capacity topologies. */ >> + for_each_cpu(i, cpu_mask) { >> + if (per_cpu(sd_asym_cpucapacity, i)) { >> + any_asym_capacity = true; >> + break; >> + } >> + } >> + if (!any_asym_capacity) { >> + if (sched_debug()) { >> + pr_info("rd %*pbl: Checking EAS, CPUs do not have >> asymmetric capacities\n", >> + cpumask_pr_args(cpu_mask)); >> + } >> + return false; >> + } >> + >> + /* EAS definitely does *not* handle SMT */ >> + if (sched_smt_active()) { >> + if (sched_debug()) { >> + pr_info("rd %*pbl: Checking EAS, SMT is not supported\n", >> + cpumask_pr_args(cpu_mask)); >> + } >> + return false; >> + } >> + >> + if (!arch_scale_freq_invariant()) { >> + if (sched_debug()) { >> + pr_info("rd %*pbl: Checking EAS: frequency-invariant load >> tracking not yet supported", >> + cpumask_pr_args(cpu_mask)); >> + } >> + return false; >> + } >> + >> + /* Do not attempt EAS if schedutil is not being used. */ >> + for_each_cpu(i, cpu_mask) { >> + policy = cpufreq_cpu_get(i); >> + if (!policy) { >> + if (sched_debug()) { >> + pr_info("rd %*pbl: Checking EAS, cpufreq policy not >> set for CPU: %d", >> + cpumask_pr_args(cpu_mask), i); >> + } >> + return false; >> + } >> + gov = policy->governor; >> + cpufreq_cpu_put(policy); >> + if (gov != &schedutil_gov) { >> + if (sched_debug()) { >> + pr_info("rd %*pbl: Checking EAS, schedutil is >> mandatory\n", >> + cpumask_pr_args(cpu_mask)); >> + } >> + return false; >> + } >> + } >> + >> + return true; >> +} >> + >> void rebuild_sched_domains_energy(void) >> { >> mutex_lock(&sched_energy_mutex); >> @@ -231,6 +295,15 @@ static int sched_energy_aware_handler(struct >> ctl_table *table, int write, >> return -EPERM; >> >> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); >> + if (!sched_is_eas_possible(cpu_active_mask)) { >> + if (write) { >> + return -EOPNOTSUPP; >> + } else { >> + *lenp = 0; >> + return 0; >> + } >> + } >> + >> if (!ret && write) { >> state = static_branch_unlikely(&sched_energy_present); >> if (state != sysctl_sched_energy_aware) >> @@ -351,61 +424,24 @@ static void sched_energy_set(bool has_eas) >> * 4. schedutil is driving the frequency of all CPUs of the rd; >> * 5. frequency invariance support is present; >> */ >> -extern struct cpufreq_governor schedutil_gov; >> static bool build_perf_domains(const struct cpumask *cpu_map) >> { >> int i; >> struct perf_domain *pd = NULL, *tmp; >> int cpu = cpumask_first(cpu_map); >> struct root_domain *rd = cpu_rq(cpu)->rd; >> - struct cpufreq_policy *policy; >> - struct cpufreq_governor *gov; >> >> if (!sysctl_sched_energy_aware) >> goto free; >> >> - /* EAS is enabled for asymmetric CPU capacity topologies. */ >> - if (!per_cpu(sd_asym_cpucapacity, cpu)) { >> - if (sched_debug()) { >> - pr_info("rd %*pbl: CPUs do not have asymmetric >> capacities\n", >> - cpumask_pr_args(cpu_map)); >> - } >> - goto free; >> - } >> - >> - /* EAS definitely does *not* handle SMT */ >> - if (sched_smt_active()) { >> - pr_warn("rd %*pbl: Disabling EAS, SMT is not supported\n", >> - cpumask_pr_args(cpu_map)); >> - goto free; >> - } >> - >> - if (!arch_scale_freq_invariant()) { >> - if (sched_debug()) { >> - pr_warn("rd %*pbl: Disabling EAS: frequency-invariant >> load tracking not yet supported", >> - cpumask_pr_args(cpu_map)); >> - } >> + if (!sched_is_eas_possible(cpu_map)) >> goto free; >> - } >> >> for_each_cpu(i, cpu_map) { >> /* Skip already covered CPUs. */ >> if (find_pd(pd, i)) >> continue; >> >> - /* Do not attempt EAS if schedutil is not being used. */ >> - policy = cpufreq_cpu_get(i); >> - if (!policy) >> - goto free; >> - gov = policy->governor; >> - cpufreq_cpu_put(policy); >> - if (gov != &schedutil_gov) { >> - if (rd->pd) >> - pr_warn("rd %*pbl: Disabling EAS, schedutil is >> mandatory\n", >> - cpumask_pr_args(cpu_map)); >> - goto free; >> - } >> - >> /* Create the new pd and add it to the local list. */ >> tmp = pd_init(i); >> if (!tmp) >> -- >> 2.39.3 >> will send out v6 with these changes to changelog and Tested-by tag. will wait for a while to see if there are any concerns or comments.