Hello, Petr.

On Mon, Oct 26, 2020 at 05:45:55PM +0100, Petr Mladek wrote:
> > I don't think this works. The kthread may have changed its binding while
> > running using set_cpus_allowed_ptr() as you're doing above. Besides, when a
> > cpu goes offline, the bound kthread can fall back to other cpus but its cpu
> > mask isn't cleared, is it?
>
> If I get it correctly, select_fallback_rq() calls
> do_set_cpus_allowed() explicitly or in cpuset_cpus_allowed_fallback().
> It seems that the original mask gets lost.

Oh, I see.

> It would make sense to assume that the kthread_worker API will take care of
> the affinity when it was set by kthread_create_worker_on_cpu().

I was for some reason thinking this was for all kthreads. Yeah, for
kthread_workers it does make sense.

> But is it safe to assume that the work can safely proceed also
> on another CPU? We should probably add a warning into
> kthread_worker_fn() when it detects a wrong CPU.

Per-cpu workqueues behave like that too. When a CPU goes down, per-cpu
workers on that CPU are unbound and may run anywhere. They get rebound
when the CPU comes back up.

> BTW: kthread_create_worker_on_cpu() is currently used only by
> start_power_clamp_worker(). And it has its own CPU hotplug
> handling. The kthreads are stopped and started again
> in powerclamp_cpu_predown() and powerclamp_cpu_online().

And users which have a hard dependency on CPU binding are expected to
implement hotplug events so that e.g. per-cpu work items are flushed when
a CPU goes down and scheduled back when it comes back online. There are
pros and cons to the current workqueue behavior, but it'd be a good idea
to keep kthread_worker's behavior in sync.

> I haven't checked all details yet. But in principle, the patch looks
> sane to me.

Yeah, agreed.

Thanks.

-- 
tejun
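
A minimal sketch of the wrong-CPU warning Petr suggests for
kthread_worker_fn(), assuming the worker remembers the CPU it was created
for in a hypothetical ->bind_cpu field set by kthread_create_worker_on_cpu()
(mainline does not store such a field on struct kthread_worker; the field
name and placement are assumptions, only the idea is from the discussion):

	/*
	 * Sketch only: inside kthread_worker_fn()'s main loop, before a
	 * work item is executed.  worker->bind_cpu is a hypothetical field
	 * that kthread_create_worker_on_cpu() would set to the requested
	 * CPU and that would stay -1 for unbound workers.
	 */
	if (worker->bind_cpu >= 0)
		WARN_ON_ONCE(raw_smp_processor_id() != worker->bind_cpu);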