On 8/5/24 16:28, Frederic Weisbecker wrote:
> Le Tue, Jul 30, 2024 at 05:49:51PM +0200, Vlastimil Babka a écrit :
>> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
>>
>> Nit:
>>
>> > +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
>> > +{
>> > +        struct kthread *kthread = to_kthread(p);
>> > +        cpumask_var_t affinity;
>> > +        unsigned long flags;
>> > +        int ret;
>> > +
>> > +        if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
>> > +                WARN_ON(1);
>> > +                return -EINVAL;
>> > +        }
>> > +
>>
>> Should we also fail if kthread->preferred_affinity already exist? In
>> case somebody calls this twice.
>
> Good point!
>
>>
>> Also for some of the use cases (kswapd, kcompactd) it would make sense
>> to be able to add cpus of a node as they are onlined. Which seems we
>> didn't do, except some corner case handling in kcompactd, but maybe we
>> should? I wonder if the current implementation of onlining a completely
>> new node with cpus does the right thing as a result of the individual
>> onlining operations, or we end up with being affined to a single cpu (or
>> none).
>>
>> But that would need some kind of kthread_affine_preferred_update()
>> implementation?
>
> So you mean that the "for_each_node_state()" loop in kcompactd doesn't
> handle all possible nodes but only those online when it's called? Or
> am I confused?

If you mean the loop in kcompactd_init() then indeed, but we also have a
hook in online_pages() to start new threads on newly onlined nodes, so
that's not a problem.

The problem I think I see is that cpumask_of_node(pgdat->node_id) is a
snapshot of the cpus running on the NUMA node at the time, and is never
updated later as new cpus might be brought up. kcompactd_cpu_online()
does try to update that when cpus are onlined (in a clumsy way), but
there was nothing like that for kswapd, and after your series this
update is also removed for kcompactd.

> If all users of preferred affinity were to use NUMA nodes, it could be
> a good idea to do a flavour of kernel/smpboot.c which would handle
> per-node kthreads instead of per-cpu kthreads. I initially thought
> about that. It would have handled all the lifecycle of those kthreads,
> including creation, against hotplug. Unfortunately RCU doesn't rely on
> per-NUMA nodes but rather use its own tree.
>
> If there be more users of real per NUMA nodes kthreads than kswapd and
> kcompactd, of course that would be much worth considering.

Yeah, it's not that compelling, but a way to update the preferred
affinity mask in response to cpu hotplug events, which kswapd and
kcompactd could use, would be sufficient. And maybe more widely useful.

I guess there could be a callback defined for a kthread to provide a new
preferred_affinity, that you'd call from kthreads_hotplug_update()?
And kcompactd and kswapd could both use the same callback that
interprets kthread_data() as the pgdat and fetches a new cpumask from
it? (A rough, untested sketch of what I mean is at the end of this
mail.)

> Thanks.
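Something like the following, completely untested and only to illustrate
the idea; the names kthread_affinity_fn, kthread_affine_preferred_cb and
kthread_node_affinity are made up here, they don't exist today:

/*
 * Hypothetical callback type: recompute the kthread's preferred
 * affinity into @mask, e.g. from kthreads_hotplug_update().
 */
typedef void (*kthread_affinity_fn)(struct task_struct *p,
                                    struct cpumask *mask);

/*
 * Like kthread_affine_preferred(), but also remember a callback that
 * the hotplug path can invoke later to refresh the preferred mask.
 */
int kthread_affine_preferred_cb(struct task_struct *p,
                                const struct cpumask *mask,
                                kthread_affinity_fn fn);

/*
 * A single callback shared by kswapd and kcompactd: both pass their
 * pgdat as kthread data, so just re-read the node's current cpumask.
 */
static void kthread_node_affinity(struct task_struct *p,
                                  struct cpumask *mask)
{
        pg_data_t *pgdat = kthread_data(p);

        cpumask_copy(mask, cpumask_of_node(pgdat->node_id));
}

With something along those lines, the cpumask_of_node() snapshot problem
above would hopefully take care of itself whenever the callback is run
from the hotplug path.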