On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote: > After taking a close look at the patch, my understanding of what it is doing > is as follows: > > v2: cpus_allowed will not be affected by hotplug. So the new > cpuset_cpus_allowed() will return effective_cpus + offline cpus that should > have been part of effective_cpus if online before masking it with allowable > cpus and then go up the cpuset hierarchy if necessary. > > v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the > current cpuset and move up the hierarchy if necessary to find a cpuset that > have at least one allowable cpu. > > First of all, it does not take into account of the v2 partition feature that > may cause it to produce incorrect result if partition is enabled somewhere. How so? For a partition the cpus_allowed mask should be the parition CPUs. The only magical bit about partitions is that any one CPU cannot belong to two partitions and load-balancing is split. > Secondly, I don't see any benefit other than having some additional offline > cpu available in a task's cpumask which the scheduler will ignore anyway. Those CPUs can come online again -- you're *again* dismissing the true bug :/ If you filter out the offline CPUs at sched_setaffinity() time, you forever lose those CPUs, the task will never again move to those CPUs, even if they do come online after. It is really simple to reproduce this: - boot machine - offline all CPUs except one - taskset -p ffffffff $$ - online all CPUs and observe your shell (and all its decendants) being stuck to the one CPU. Do the same thing on a CPUSET=n build and note the difference (you retain the full mask). > v2 is able to recover a previously offlined cpu. So we don't gain any > net benefit other than the going up the cpuset hierarchy part. Only for !root tasks. Not even v2 will re-set the affinity of root tasks afaict. > For v1, I agree we should go up the cpuset hierarchy to find a usable > cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(), > my current preference is to do the hierarchy climbing part in an enhanced > cpuset_cpus_allowed_fallback() after an initial failure of > cpuset_cpus_allowed(). That will be easier to understand than having such > complexity and overhead in cpuset_cpus_allowed() alone. > > I will work on a patchset to do that as a counter offer. We will need a small and simple patch for /urgent, or I will need to revert all your patches -- your call. I also don't tihnk you fully appreciate the ramifications of task_cpu_possible_mask(), cpuset currently gets that quite wrong.