On 8/5/24 16:28, Frederic Weisbecker wrote:
> Le Tue, Jul 30, 2024 at 05:49:51PM +0200, Vlastimil Babka a écrit :
>> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
>>
>> Nit:
>>
>> > +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
>> > +{
>> > +        struct kthread *kthread = to_kthread(p);
>> > +        cpumask_var_t affinity;
>> > +        unsigned long flags;
>> > +        int ret;
>> > +
>> > +        if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
>> > +                WARN_ON(1);
>> > +                return -EINVAL;
>> > +        }
>> > +
>>
>> Should we also fail if kthread->preferred_affinity already exist? In
>> case somebody calls this twice.
>
> Good point!
>
>>
>> Also for some of the use cases (kswapd, kcompactd) it would make sense
>> to be able to add cpus of a node as they are onlined. Which seems we
>> didn't do, except some corner case handling in kcompactd, but maybe we
>> should? I wonder if the current implementation of onlining a completely
>> new node with cpus does the right thing as a result of the individual
>> onlining operations, or we end up with being affined to a single cpu (or
>> none).
>>
>> But that would need some kind of kthread_affine_preferred_update()
>> implementation?
>
> So you mean that the "for_each_node_state()" loop in kcompactd doesn't
> handle all possible nodes but only those online when it's called? Or
> am I confused?

If you mean the loop in kcompactd_init() then indeed, but we also have a
hook in online_pages() to start new threads on newly onlined nodes, so
that's not a problem.

The problem I think I see is that cpumask_of_node(pgdat->node_id) is a
snapshot of the cpus running on the NUMA node at the time, and is never
updated later as new cpus might be brought up. kcompactd_cpu_online()
does try to update that when cpus are onlined (in a clumsy way), but
there was nothing like that for kswapd, and after your series this
update is also removed for kcompactd.

> If all users of preferred affinity were to use NUMA nodes, it could be
> a good idea to do a flavour of kernel/smpboot.c which would handle
> per-node kthreads instead of per-cpu kthreads. I initially thought
> about that. It would have handled all the lifecycle of those kthreads,
> including creation, against hotplug. Unfortunately RCU doesn't rely on
> per-NUMA nodes but rather use its own tree.
>
> If there be more users of real per NUMA nodes kthreads than kswapd and
> kcompactd, of course that would be much worth considering.

Yeah, it's not that compelling, but a way to update the preferred
affinity mask in response to cpu hotplug events, which kswapd and
kcompactd could use, would be sufficient. And maybe more widely useful.

I guess there could be a callback defined for a kthread to provide a new
preferred_affinity, that you'd call from kthreads_hotplug_update()?
And kcompactd and kswapd could both use the same callback that
interprets kthread_data() as the pgdat and fetches a new cpumask from
it? (A rough, untested sketch of what I mean is at the end of this
mail.)

> Thanks.
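Something like the following, completely untested and only to illustrate
the idea; the names kthread_affinity_fn, kthread_affine_preferred_cb and
kthread_node_affinity are made up here, they don't exist today:

/*
 * Hypothetical callback type: recompute the kthread's preferred
 * affinity into @mask, e.g. from kthreads_hotplug_update().
 */
typedef void (*kthread_affinity_fn)(struct task_struct *p,
                                    struct cpumask *mask);

/*
 * Like kthread_affine_preferred(), but also remember a callback that
 * the hotplug path can invoke later to refresh the preferred mask.
 */
int kthread_affine_preferred_cb(struct task_struct *p,
                                const struct cpumask *mask,
                                kthread_affinity_fn fn);

/*
 * A single callback shared by kswapd and kcompactd: both pass their
 * pgdat as kthread data, so just re-read the node's current cpumask.
 */
static void kthread_node_affinity(struct task_struct *p,
                                  struct cpumask *mask)
{
        pg_data_t *pgdat = kthread_data(p);

        cpumask_copy(mask, cpumask_of_node(pgdat->node_id));
}

With something along those lines, the cpumask_of_node() snapshot problem
above would hopefully take care of itself whenever the callback is run
from the hotplug path.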