On Tue 17-09-24 09:01:08, Vlastimil Babka wrote:
> On 9/17/24 8:26 AM, Michal Hocko wrote:
> > On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
> >> Kthreads attached to a preferred NUMA node for their task structure
> >> allocation can also be assumed to run preferably within that same node.
> >>
> >> A more precise affinity is usually notified by calling
> >> kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.
> >>
> >> For the others, a default affinity to the node is desired and sometimes
> >> implemented with more or less success when it comes to deal with hotplug
> >> events and nohz_full / CPU Isolation interactions:
> >>
> >> - kcompactd is affine to its node and handles hotplug but not CPU Isolation
> >> - kswapd is affine to its node and ignores hotplug and CPU Isolation
> >> - A bunch of drivers create their kthreads on a specific node and
> >> don't take care about affining further.
> >>
> >> Handle that default node affinity preference at the generic level
> >> instead, provided a kthread is created on an actual node and doesn't
> >> apply any specific affinity such as a given CPU or a custom cpumask to
> >> bind to before its first wake-up.
> >
> > Makes sense.
> >
> >> This generic handling is aware of CPU hotplug events and CPU isolation
> >> such that:
> >>
> >> * When a housekeeping CPU goes up and is part of the node of a given
> >> kthread, it is added to its applied affinity set (and
> >> possibly the default last resort online housekeeping set is removed
> >> from the set).
> >>
> >> * When a housekeeping CPU goes down while it was part of the node of a
> >> kthread, it is removed from the kthread's applied
> >> affinity. The last resort is to affine the kthread to all online
> >> housekeeping CPUs.
> >
> > But I am not really sure about this part. Sure it makes sense to set the
> > affinity to exclude isolated CPUs but why do we care about hotplug
> > events at all. Let's say we offline all cpus from a given node (or
> > that all but isolated cpus are offline - is this even
> > realistic/reasonable usecase?). Wouldn't scheduler ignore the kthread's
> > affinity in such a case? In other words how is that different from
> > tasksetting a userspace task to a cpu that goes offline? We still do
> > allow such a task to run, right? We just do not care about affinity
> > anymore.
>
> AFAIU it handles better the situation where all housekeeping cpus from
> the preferred node go down, then it affines to housekeeping cpus from any
> node vs any cpu including isolated ones.

Doesn't that happen automagically? Or can it end up on a random isolated
cpu?

-- 
Michal Hocko
SUSE Labs
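
For reference, the fallback order described in the quoted changelog can be
modelled roughly as below. This is only a userspace sketch with plain
bitmasks, not the code from the patch: cpu_mask_t and
pick_kthread_affinity() are made-up names, and the real series operates on
struct cpumask from hotplug callbacks.

```c
#include <stdint.h>

/* One bit per CPU; a toy stand-in for struct cpumask (assumption). */
typedef uint64_t cpu_mask_t;

/*
 * Hypothetical helper modelling the behaviour from the changelog:
 * prefer the online housekeeping CPUs of the kthread's node and, if
 * none are left, fall back to all online housekeeping CPUs.
 */
static cpu_mask_t pick_kthread_affinity(cpu_mask_t node_cpus,
                                        cpu_mask_t housekeeping,
                                        cpu_mask_t online)
{
    cpu_mask_t preferred = node_cpus & housekeeping & online;

    if (preferred)
        return preferred;

    /* Last resort: still restricted to housekeeping CPUs. */
    return housekeeping & online;
}
```

In this model the last-resort mask never contains an isolated CPU, which is
the case Vlastimil points at. Whether the scheduler's own fallback gives the
same guarantee without this handling is exactly the open question above.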