Re: [PATCH v5 2/3] sched/topology: Rework CPU capacity asymmetry detection

Valentin Schneider <valentin.schneider@xxxxxxx> · Mon, 24 May 2021 19:01:04 +0100

Hi Beata,

On 24/05/21 11:16, Beata Michalska wrote:
> Currently the CPU capacity asymmetry detection, performed through
> asym_cpu_capacity_level, tries to identify the lowest topology level
> at which the highest CPU capacity is being observed, not necessarily
> finding the level at which all possible capacity values are visible
> to all CPUs, which might be a bit problematic for some possible/valid
> asymmetric topologies, e.g.:
>
> DIE      [                                ]
> MC       [                       ][       ]
>
> CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
> Capacity  |.....| |.....| |.....|  |.....|
>              L       M       B        B
>
> Where:
>  arch_scale_cpu_capacity(L) = 512
>  arch_scale_cpu_capacity(M) = 871
>  arch_scale_cpu_capacity(B) = 1024
>
> In this particular case, the asymmetric topology level will point
> at MC, as all possible CPU masks for that level do cover the CPU
> with the highest capacity. It will work just fine for the first
> cluster, not so much for the second one though (consider the
> find_energy_efficient_cpu which might end up attempting the energy
> aware wake-up for a domain that does not see any asymmetry at all)
>
> Rework the way the capacity asymmetry levels are being detected,
> allowing to point to the lowest topology level (for a given CPU), where
> the full set of available CPU capacities is visible to all CPUs within a given
> domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
> the domains. This will have an impact on EAS wake-up placement in a way
> that it might see a different range of CPUs to be considered, depending on
> the given current and target CPUs.
>
> Additionally, those levels, where any range of asymmetry (not
> necessarily full) is being detected will get identified as well.
> The selected asymmetric topology level will be denoted by
> SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
> would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
> maintaining the current behaviour for asymmetric topologies, with
> misfit migration operating correctly on lower levels, if applicable,
> as any asymmetry is enough to trigger the misfit migration.
> The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
> relate to the full asymmetry level denoted by the sd_asym_cpucapacity
> pointer.
>
> Detecting the CPU capacity asymmetry is being based on a set of
> available CPU capacities for all possible CPUs. This data is being
> generated upon init and updated once CPU topology changes are being
> detected (through arch_update_cpu_topology). As such, any changes
> to identified CPU capacities (like initializing cpufreq) need to be
> explicitly advertised by corresponding archs to trigger rebuilding
> the data.
>
> This patch also removes the additional -dflags- parameter used when
  ^^^^^^^^^^^^^^^^^^^^^^^
s/^/Also remove/

> building sched domains as the asymmetry flags are now being set
> directly in sd_init.
>

A few nits below, but beyond that:

Tested-by: Valentin Schneider <valentin.schneider@xxxxxxx>
Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>

> +static inline int
> +asym_cpu_capacity_classify(struct sched_domain *sd,
> +			   const struct cpumask *cpu_map)
> +{
> +	int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> +	struct asym_cap_data *entry;
> +	int asym_cap_count = 0;
> +
> +	if (list_is_singular(&asym_cap_list))
> +		goto leave;
> +
> +	list_for_each_entry(entry, &asym_cap_list, link) {
> +		if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> +			++asym_cap_count;
> +		} else {
> +			/*
> +			 * CPUs with given capacity might be offline
> +			 * so make sure this is not the case
> +			 */
> +			if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> +				sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> +				if (asym_cap_count > 1)
> +					break;
> +			}

Readability nit: That could be made into an else if ().
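
For illustration, the else-if form could look roughly like the userspace
sketch below. It is not the kernel code: cpumasks are modelled as plain
unsigned long bitmasks, asym_cap_list as an array, the flag values are
arbitrary, and classify() and struct asym_cap_entry are hypothetical names.
The WARN_ON_ONCE() is dropped, and an empty list returns 0 here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the sched domain flags; the values are arbitrary. */
#define SD_ASYM_CPUCAPACITY		0x1
#define SD_ASYM_CPUCAPACITY_FULL	0x2

/* One entry per distinct capacity; bit N of cpu_mask stands for CPU N. */
struct asym_cap_entry {
	unsigned long capacity;
	unsigned long cpu_mask;
};

/*
 * Same decision logic as the quoted asym_cpu_capacity_classify(), with
 * the nested if folded into an else if.
 */
int classify(unsigned long sd_span, unsigned long cpu_map,
	     const struct asym_cap_entry *caps, size_t nr_caps)
{
	int flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
	int count = 0;
	size_t i;

	/* Zero or one capacity value: nothing can be asymmetric. */
	if (nr_caps <= 1)
		return 0;

	for (i = 0; i < nr_caps; i++) {
		if (sd_span & caps[i].cpu_mask) {
			/* This capacity is visible within the domain. */
			++count;
		} else if (cpu_map & caps[i].cpu_mask) {
			/*
			 * The capacity exists on a CPU being mapped but is
			 * not visible here, so the asymmetry is not full.
			 */
			flags &= ~SD_ASYM_CPUCAPACITY_FULL;
			if (count > 1)
				break;
		}
	}

	return count > 1 ? flags : 0;
}
```

With the topology from the changelog (L on CPUs 0-1, M on 2-3, B on 4-7),
the first MC span (CPUs 0-5) gets both flags, while the second one
(CPUs 6-7) gets none.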

> +		}
> +	}
> +	WARN_ON_ONCE(!asym_cap_count);
> +leave:
> +	return asym_cap_count > 1 ? sd_asym_flags : 0;
> +}
> +

> +static void asym_cpu_capacity_scan(void)
> +{
> +	struct asym_cap_data *entry, *next;
> +	int cpu;
> +
> +	list_for_each_entry(entry, &asym_cap_list, link)
> +		cpumask_clear(entry->cpu_mask);
> +
> +	entry = list_first_entry_or_null(&asym_cap_list,
> +					 struct asym_cap_data, link);
> +
> +	for_each_cpu_and(cpu, cpu_possible_mask,
> +			 housekeeping_cpumask(HK_FLAG_DOMAIN)) {
> +		unsigned long capacity = arch_scale_cpu_capacity(cpu);
> +
> +		if (!entry || capacity != entry->capacity)
> +			entry = asym_cpu_capacity_get_data(capacity);
> +		if (entry)
> +			__cpumask_set_cpu(cpu, entry->cpu_mask);

That 'if' is only there in case the alloc within the helper failed, which
is a bit of a shame.

You could pass the CPU to that helper function and have it set the right
bit, or you could even forgo the capacity != entry->capacity check here and
let the helper function do it all.

Yes, that means more asym_cap_list iterations, but that's
O(nr_cpus * nr_caps); a topology rebuild is along the lines of
O(nr_cpus² * nr_topology_levels), so not such a big deal comparatively.
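
A rough userspace sketch of the second option, where the helper does the
lookup, the allocation and the bit setting. Everything here is an
assumption made for illustration: asym_cap_update(), asym_cap_scan() and
asym_cap_mask() are hypothetical names, the list is a simple singly linked
one rather than a list_head, and cpumasks are unsigned long bitmasks.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's asym_cap_data. */
struct asym_cap_data {
	struct asym_cap_data *next;
	unsigned long capacity;
	unsigned long cpu_mask;
};

/*
 * Find the entry for @capacity, allocating it if needed, and mark @cpu
 * in it. Returns 0 on success, -1 if the allocation failed.
 */
int asym_cap_update(struct asym_cap_data **head, unsigned long capacity,
		    unsigned int cpu)
{
	struct asym_cap_data *entry;

	assert(cpu < 8 * sizeof(unsigned long));

	for (entry = *head; entry; entry = entry->next)
		if (entry->capacity == capacity)
			goto done;

	entry = calloc(1, sizeof(*entry));
	if (!entry)
		return -1;

	entry->capacity = capacity;
	entry->next = *head;
	*head = entry;
done:
	entry->cpu_mask |= 1UL << cpu;
	return 0;
}

/*
 * The scan loop no longer caches an entry or tests for NULL. Each CPU
 * walks the list once, hence O(nr_cpus * nr_caps). A failed allocation
 * leaves that CPU unrecorded, as in the quoted code.
 */
void asym_cap_scan(struct asym_cap_data **head, const unsigned long *caps,
		   unsigned int nr_cpus)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		asym_cap_update(head, caps[cpu], cpu);
}

/* Lookup used only to inspect the result; returns 0 if not found. */
unsigned long asym_cap_mask(const struct asym_cap_data *head,
			    unsigned long capacity)
{
	for (; head; head = head->next)
		if (head->capacity == capacity)
			return head->cpu_mask;
	return 0;
}
```

The scan side then has no allocation-failure branch left; that concern
lives entirely in the helper.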

> +	}
> +
> +	list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
> +		if (cpumask_empty(entry->cpu_mask)) {
> +			list_del(&entry->link);
> +			kfree(entry);
> +		}
> +	}
> +}
> +