On 3/14/23 12:25, Michal Koutný wrote:
Hello.
On Tue, Mar 14, 2023 at 10:07:40AM +0000, Daniel Dao <dqminh@xxxxxxxxxxxxxx> wrote:
IMO this violated the principle of cpuset and can be confusing for end users.
I think I prefer Waiman's suggestion of allowing an implicit move to cpuset
when enabling cpuset with subtree_control but not explicit moves such as when
setting cpuset.cpus or writing the pids into cgroup.procs. It's easier to reason
about and makes the failure mode more explicit.
What do you think?
I think cpuset should top IO worker's affinity (like sched_setaffinity(2)).
Thus:
- modifying cpuset.cpus updates the task's affinity, for sure
- implicit migration (enabling cpuset) updates the task's affinity, effectively a nop
Note that since commit 7fd4da9c158 ("cgroup/cpuset: Optimize
cpuset_attach() on v2") in v6.2, implicit migration (enabling cpuset)
will not affect the cpu affinity of the process.
- explicit migration (meh) updates the task's affinity, ¯\_(ツ)_/¯
My understanding of PF_NO_SETAFFINITY is that it's for kernel threads that
do work that's functionally needed on a given CPU and thus they cannot
be migrated [1]. As said previously for io_uring workers, affinity is
for performance only.
Hence, I'd also suggest on top of 01e68ce08a30 ("io_uring/io-wq: stop
setting PF_NO_SETAFFINITY on io-wq workers"):
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -233,7 +233,6 @@ static int io_sq_thread(void *data)
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
 	else
 		set_cpus_allowed_ptr(current, cpu_online_mask);
-	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
 	while (1) {
After all, io_uring_setup(2) already mentions:
When cgroup setting cpuset.cpus changes (typically in container
environment), the bounded cpu set may be changed as well.
Using sched_setaffinity(2) can be another alternative. Starting from
v6.2, cpu affinity set by sched_setaffinity(2) will be more or less
maintained and constrained by the current cpuset even if the cpu list is
being changed as long as there is overlap between the two. The
intersection between cpu affinity set by sched_setaffinity(2) and the
effective_cpus in cpuset will be the effective cpu affinity of the task.
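
To illustrate the intersection rule, here is a rough userspace sketch, not
the kernel's code. The helper name effective_affinity() is hypothetical; it
only uses the glibc CPU_* macros to model "requested affinity AND cpuset
effective_cpus" and reports whether the two overlap at all. What the kernel
does in the no-overlap case is not modeled here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel or libc API): models the rule that
 * the effective affinity is the intersection of the mask requested via
 * sched_setaffinity(2) and the cpuset's effective_cpus.
 *
 * Stores the intersection in *out and returns true if it is non-empty,
 * i.e. if the two masks overlap.
 */
static bool effective_affinity(const cpu_set_t *requested,
                               const cpu_set_t *cpuset_effective,
                               cpu_set_t *out)
{
	assert(requested && cpuset_effective && out);

	/* out = requested & cpuset_effective */
	CPU_AND(out, requested, cpuset_effective);

	return CPU_COUNT(out) > 0;
}
```

For example, with a requested mask of CPUs 0-3 and effective_cpus of 2-5,
the helper yields CPUs 2-3 and returns true.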
Cheers,
Longman