On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> Affining kthreads follows one of three existing patterns:
>
> 1) Per-CPU kthreads must stay affine to a single CPU and never execute
>    relevant code on any other CPU. This is currently handled by smpboot
>    code which takes care of CPU-hotplug operations.
>
> 2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
>    run anywhere else. The affinity is set through kthread_bind_mask()
>    and the subsystem takes care by itself to handle CPU-hotplug
>    operations.
>
> 3) Kthreads that have a _preferred_ affinity but that can run anywhere
>    without breaking correctness. Userspace can overwrite the affinity.
>    It is set manually like any other task and CPU-hotplug is supposed
>    to be handled by the relevant subsystem so that the task is properly
>    reaffined whenever a given CPU from the preferred affinity comes up
>    or down. Also care must be taken so that the preferred affinity
>    doesn't cross housekeeping cpumask boundaries.
>
> Currently the preferred affinity pattern has at least 4 identified
> users, with more or less success when it comes to handling CPU-hotplug
> operations and the housekeeping cpumask.
>
> Provide an infrastructure to handle this use case pattern. A new
> kthread_affine_preferred() API is introduced, to be used just like
> kthread_bind_mask(), right after kthread creation and before the first
> wake up. The kthread is then affined right away to the cpumask passed
> through the API if it has online housekeeping CPUs. Otherwise it will
> be affined to all online housekeeping CPUs as a last resort.
>
> It is aware of CPU hotplug events such that:
>
> * When a housekeeping CPU goes up and is part of the preferred affinity
>   of a given kthread, it is added to its applied affinity set (and
>   possibly the default last resort online housekeeping set is removed
>   from the set).
>
> * When a housekeeping CPU goes down while it was part of the preferred
>   affinity of a kthread, it is removed from the kthread's applied
>   affinity. The last resort is to affine the kthread to all online
>   housekeeping CPUs.
>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

Nit:

> +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
> +{
> +	struct kthread *kthread = to_kthread(p);
> +	cpumask_var_t affinity;
> +	unsigned long flags;
> +	int ret;
> +
> +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
> +		WARN_ON(1);
> +		return -EINVAL;
> +	}
> +

Should we also fail if kthread->preferred_affinity already exists, in
case somebody calls this twice? (see the snippet below for what I mean)

Also for some of the use cases (kswapd, kcompactd) it would make sense
to be able to add the CPUs of a node as they are onlined. Which it
seems we don't do, except for some corner-case handling in kcompactd,
but maybe we should? I wonder if the current implementation of onlining
a completely new node with CPUs does the right thing as a result of the
individual onlining operations, or whether we end up being affined to a
single CPU (or none). But that would need some kind of
kthread_affine_preferred_update() implementation? (rough sketch at the
end of this mail)
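For the double-call case, something right after the existing
started/inactive check is what I have in mind. Just a sketch, not
tested:

	/*
	 * Sketch only: refuse a second call instead of leaking the
	 * previously allocated preferred_affinity and re-adding the
	 * kthread to the hotplug list.
	 */
	if (WARN_ON_ONCE(kthread->preferred_affinity))
		return -EINVAL;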
> +	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
> +	if (!kthread->preferred_affinity) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_copy(kthread->preferred_affinity, mask);
> +	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
> +	kthread_fetch_affinity(kthread, affinity);
> +
> +	/* It's safe because the task is inactive. */
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	do_set_cpus_allowed(p, mask);
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +
> +	mutex_unlock(&kthreads_hotplug_lock);
> +out:
> +	free_cpumask_var(affinity);
> +
> +	return 0;
> +}
> +
> +static int kthreads_hotplug_update(void)
> +{
> +	cpumask_var_t affinity;
> +	struct kthread *k;
> +	int err = 0;
> +
> +	if (list_empty(&kthreads_hotplug))
> +		return 0;
> +
> +	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
> +		if (WARN_ON_ONCE(!k->preferred_affinity)) {
> +			err = -EINVAL;
> +			break;
> +		}
> +		kthread_fetch_affinity(k, affinity);
> +		set_cpus_allowed_ptr(k->task, affinity);
> +	}
> +
> +	free_cpumask_var(affinity);
> +
> +	return err;
> +}
> +
> +static int kthreads_offline_cpu(unsigned int cpu)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_clear_cpu(cpu, &kthread_online_mask);
> +	ret = kthreads_hotplug_update();
> +	mutex_unlock(&kthreads_hotplug_lock);
> +
> +	return ret;
> +}
> +
> +static int kthreads_online_cpu(unsigned int cpu)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_set_cpu(cpu, &kthread_online_mask);
> +	ret = kthreads_hotplug_update();
> +	mutex_unlock(&kthreads_hotplug_lock);
> +
> +	return ret;
> +}
> +
> +static int kthreads_init(void)
> +{
> +	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
> +				 kthreads_online_cpu, kthreads_offline_cpu);
> +}
> +early_initcall(kthreads_init);
> +
>  void __kthread_init_worker(struct kthread_worker *worker,
>  			   const char *name,
>  			   struct lock_class_key *key)
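And to illustrate the update idea mentioned above, something along the
lines of the function below, reusing the helpers from this patch. The
name kthread_affine_preferred_update() and the details are made up and
not even compile-tested, just to show what I mean:

/*
 * Hypothetical sketch: let a subsystem grow or shrink the preferred
 * mask after new CPUs (or a whole node) come online, and recompute the
 * applied affinity the same way the hotplug callbacks do.
 */
int kthread_affine_preferred_update(struct task_struct *p,
				    const struct cpumask *mask)
{
	struct kthread *kthread = to_kthread(p);
	cpumask_var_t affinity;

	/* Only makes sense after kthread_affine_preferred() was called. */
	if (!kthread->preferred_affinity)
		return -EINVAL;

	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
		return -ENOMEM;

	mutex_lock(&kthreads_hotplug_lock);
	/* Replace the preferred mask and re-apply the effective affinity. */
	cpumask_copy(kthread->preferred_affinity, mask);
	kthread_fetch_affinity(kthread, affinity);
	set_cpus_allowed_ptr(p, affinity);
	mutex_unlock(&kthreads_hotplug_lock);

	free_cpumask_var(affinity);
	return 0;
}

Then e.g. kcompactd's cpu online handling could call it with
cpumask_of_node() instead of its current corner-case handling. But
maybe that's follow-up material.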