On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> Affining kthreads follows one of three existing patterns:
>
> 1) Per-CPU kthreads must stay affine to a single CPU and never execute
>    relevant code on any other CPU. This is currently handled by smpboot
>    code which takes care of CPU-hotplug operations.
>
> 2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
>    run anywhere else. The affinity is set through kthread_bind_mask()
>    and the subsystem takes care by itself to handle CPU-hotplug
>    operations.
>
> 3) Kthreads that have a _preferred_ affinity but that can run anywhere
>    without breaking correctness. Userspace can overwrite the affinity.
>    It is set manually like any other task and CPU-hotplug is supposed
>    to be handled by the relevant subsystem so that the task is properly
>    reaffined whenever a given CPU from the preferred affinity comes up
>    or down. Also care must be taken so that the preferred affinity
>    doesn't cross housekeeping cpumask boundaries.
>
> Currently the preferred affinity pattern has at least 4 identified
> users, with more or less success when it comes to handling CPU-hotplug
> operations and the housekeeping cpumask.
>
> Provide an infrastructure to handle this use case pattern. A new
> kthread_affine_preferred() API is introduced, to be used just like
> kthread_bind_mask(), right after kthread creation and before the first
> wake up. The kthread is then affined right away to the cpumask passed
> through the API if it has online housekeeping CPUs. Otherwise it will
> be affined to all online housekeeping CPUs as a last resort.
>
> It is aware of CPU hotplug events such that:
>
> * When a housekeeping CPU goes up and is part of the preferred affinity
>   of a given kthread, it is added to its applied affinity set (and
>   possibly the default last resort online housekeeping set is removed
>   from the set).
>
> * When a housekeeping CPU goes down while it was part of the preferred
>   affinity of a kthread, it is removed from the kthread's applied
>   affinity. The last resort is to affine the kthread to all online
>   housekeeping CPUs.
>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

Nit:

> +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
> +{
> +	struct kthread *kthread = to_kthread(p);
> +	cpumask_var_t affinity;
> +	unsigned long flags;
> +	int ret;
> +
> +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
> +		WARN_ON(1);
> +		return -EINVAL;
> +	}
> +

Should we also fail if kthread->preferred_affinity already exists, in
case somebody calls this twice? (see the snippet below for what I mean)

Also for some of the use cases (kswapd, kcompactd) it would make sense
to be able to add the CPUs of a node as they are onlined. Which it
seems we don't do, except for some corner-case handling in kcompactd,
but maybe we should? I wonder if the current implementation of onlining
a completely new node with CPUs does the right thing as a result of the
individual onlining operations, or whether we end up being affined to a
single CPU (or none). But that would need some kind of
kthread_affine_preferred_update() implementation? (rough sketch at the
end of this mail)
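For the double-call case, something right after the existing
started/inactive check is what I have in mind. Just a sketch, not
tested:

	/*
	 * Sketch only: refuse a second call instead of leaking the
	 * previously allocated preferred_affinity and re-adding the
	 * kthread to the hotplug list.
	 */
	if (WARN_ON_ONCE(kthread->preferred_affinity))
		return -EINVAL;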
> +	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
> +	if (!kthread->preferred_affinity) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_copy(kthread->preferred_affinity, mask);
> +	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
> +	kthread_fetch_affinity(kthread, affinity);
> +
> +	/* It's safe because the task is inactive. */
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	do_set_cpus_allowed(p, mask);
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +
> +	mutex_unlock(&kthreads_hotplug_lock);
> +out:
> +	free_cpumask_var(affinity);
> +
> +	return 0;
> +}
> +
> +static int kthreads_hotplug_update(void)
> +{
> +	cpumask_var_t affinity;
> +	struct kthread *k;
> +	int err = 0;
> +
> +	if (list_empty(&kthreads_hotplug))
> +		return 0;
> +
> +	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
> +		if (WARN_ON_ONCE(!k->preferred_affinity)) {
> +			err = -EINVAL;
> +			break;
> +		}
> +		kthread_fetch_affinity(k, affinity);
> +		set_cpus_allowed_ptr(k->task, affinity);
> +	}
> +
> +	free_cpumask_var(affinity);
> +
> +	return err;
> +}
> +
> +static int kthreads_offline_cpu(unsigned int cpu)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_clear_cpu(cpu, &kthread_online_mask);
> +	ret = kthreads_hotplug_update();
> +	mutex_unlock(&kthreads_hotplug_lock);
> +
> +	return ret;
> +}
> +
> +static int kthreads_online_cpu(unsigned int cpu)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_set_cpu(cpu, &kthread_online_mask);
> +	ret = kthreads_hotplug_update();
> +	mutex_unlock(&kthreads_hotplug_lock);
> +
> +	return ret;
> +}
> +
> +static int kthreads_init(void)
> +{
> +	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
> +				 kthreads_online_cpu, kthreads_offline_cpu);
> +}
> +early_initcall(kthreads_init);
> +
>  void __kthread_init_worker(struct kthread_worker *worker,
>  			   const char *name,
>  			   struct lock_class_key *key)
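And to illustrate the update idea mentioned above, something along the
lines of the function below, reusing the helpers from this patch. The
name kthread_affine_preferred_update() and the details are made up and
not even compile-tested, just to show what I mean:

/*
 * Hypothetical sketch: let a subsystem grow or shrink the preferred
 * mask after new CPUs (or a whole node) come online, and recompute the
 * applied affinity the same way the hotplug callbacks do.
 */
int kthread_affine_preferred_update(struct task_struct *p,
				    const struct cpumask *mask)
{
	struct kthread *kthread = to_kthread(p);
	cpumask_var_t affinity;

	/* Only makes sense after kthread_affine_preferred() was called. */
	if (!kthread->preferred_affinity)
		return -EINVAL;

	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
		return -ENOMEM;

	mutex_lock(&kthreads_hotplug_lock);
	/* Replace the preferred mask and re-apply the effective affinity. */
	cpumask_copy(kthread->preferred_affinity, mask);
	kthread_fetch_affinity(kthread, affinity);
	set_cpus_allowed_ptr(p, affinity);
	mutex_unlock(&kthreads_hotplug_lock);

	free_cpumask_var(affinity);
	return 0;
}

Then e.g. kcompactd's cpu online handling could call it with
cpumask_of_node() instead of its current corner-case handling. But
maybe that's follow-up material.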