On 9/17/24 9:05 AM, Michal Hocko wrote:
> On Tue 17-09-24 09:01:08, Vlastimil Babka wrote:
>> On 9/17/24 8:26 AM, Michal Hocko wrote:
>>> On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
>>>> Kthreads attached to a preferred NUMA node for their task structure
>>>> allocation can also be assumed to run preferably within that same node.
>>>>
>>>> A more precise affinity is usually notified by calling
>>>> kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.
>>>>
>>>> For the others, a default affinity to the node is desired and sometimes
>>>> implemented with more or less success when it comes to deal with hotplug
>>>> events and nohz_full / CPU Isolation interactions:
>>>>
>>>> - kcompactd is affine to its node and handles hotplug but not CPU Isolation
>>>> - kswapd is affine to its node and ignores hotplug and CPU Isolation
>>>> - A bunch of drivers create their kthreads on a specific node and
>>>>   don't take care about affining further.
>>>>
>>>> Handle that default node affinity preference at the generic level
>>>> instead, provided a kthread is created on an actual node and doesn't
>>>> apply any specific affinity such as a given CPU or a custom cpumask to
>>>> bind to before its first wake-up.
>>>
>>> Makes sense.
>>>
>>>> This generic handling is aware of CPU hotplug events and CPU isolation
>>>> such that:
>>>>
>>>> * When a housekeeping CPU goes up and is part of the node of a given
>>>>   kthread, it is added to its applied affinity set (and
>>>>   possibly the default last resort online housekeeping set is removed
>>>>   from the set).
>>>>
>>>> * When a housekeeping CPU goes down while it was part of the node of a
>>>>   kthread, it is removed from the kthread's applied
>>>>   affinity. The last resort is to affine the kthread to all online
>>>>   housekeeping CPUs.
>>>
>>> But I am not really sure about this part. Sure it makes sense to set the
>>> affinity to exclude isolated CPUs but why do we care about hotplug
>>> events at all. Let's say we offline all cpus from a given node (or
>>> that all but isolated cpus are offline - is this even
>>> realistic/reasonable usecase?). Wouldn't scheduler ignore the kthread's
>>> affinity in such a case? In other words how is that different from
>>> tasksetting a userspace task to a cpu that goes offline? We still do
>>> allow such a task to run, right? We just do not care about affinity
>>> anymore.
>>
>> AFAIU it handles better the situation where all housekeeping cpus from
>> the preferred node go down, then it affines to housekeeping cpus from any
>> node vs any cpu including isolated ones.
>
> Doesn't that happen automagically? Or can it end up on a random
> isolated cpu?

Good question, perhaps it can and there's no automagic, as I see code like:

+	/* Make sure the kthread never gets re-affined globally */
+	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
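
To make the fallback order from the changelog concrete, here is a rough userspace sketch of how I understand the policy. It is not code from the patch: plain 64-bit masks stand in for struct cpumask, and toy_mask_t, node_cpus[] and pick_kthread_affinity() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per CPU, at most 64 CPUs. Not the kernel's struct cpumask. */
typedef uint64_t toy_mask_t;

#define NR_NODES 2

/* Hypothetical topology: CPUs 0-3 sit on node 0, CPUs 4-7 on node 1. */
static const toy_mask_t node_cpus[NR_NODES] = { 0x0f, 0xf0 };

/*
 * Pick the affinity for a kthread that prefers @node (hypothetical helper):
 *  1. the online housekeeping CPUs of the preferred node, if there are any;
 *  2. otherwise all online housekeeping CPUs, so isolated CPUs are never used.
 */
static toy_mask_t pick_kthread_affinity(int node, toy_mask_t online,
					toy_mask_t housekeeping)
{
	toy_mask_t hk_online = online & housekeeping;
	toy_mask_t preferred;

	assert(node >= 0 && node < NR_NODES);

	preferred = node_cpus[node] & hk_online;
	return preferred ? preferred : hk_online;
}
```

With CPUs 0, 1, 4 and 5 as housekeeping (mask 0x33) and everything online, a node 0 kthread gets 0x03. Once CPUs 0 and 1 go offline the sketch returns 0x30, i.e. the housekeeping CPUs of the other node rather than the isolated CPUs 2 and 3. When CPU 1 comes back, it returns 0x02 and the last resort set is dropped again.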