On 11/15/24 11:30 AM, Juri Lelli wrote:
Hello,
While working on the recent cpuset/deadline fixes [1], I encountered
what looks like an issue to me. What I'm doing is (based on one of the
tests of test_cpuset_prs.sh):
# echo Y >/sys/kernel/debug/sched/verbose
# echo +cpuset >cgroup/cgroup.subtree_control
# mkdir cgroup/A1
# echo 0-3 >cgroup/A1/cpuset.cpus
# echo +cpuset >cgroup/A1/cgroup.subtree_control
# mkdir cgroup/A1/A2
# echo 1-3 >cgroup/A1/A2/cpuset.cpus
# echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
# mkdir cgroup/A1/A2/A3
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
# echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
# echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
and with this, on my 8 CPUs system, I correctly get a root domain for
0-1,4-7 and 2,3 are left isolated (attached to default root domain).
I now put the shell into the A1/A2/A3 cpuset
# echo $$ >cgroup/A1/A2/A3/cgroup.procs
and hotplug CPU 2,3
# echo 0 >/sys/devices/system/cpu/cpu2/online
# echo 0 >/sys/devices/system/cpu/cpu3/online
I guess the shell is moved to the non-isolated domain. So far so good
then, only that if I turn CPUs 2,3 back on they are attached to the root
domain containing the non-isolated cpus.
A valid partition must have CPUs associated with it. If no CPU is
available, it becomes invalid and falls back to using the CPUs from the
parent cgroup.
# echo 1 >/sys/devices/system/cpu/cpu2/online
...
[ 990.133593] root domain span: 0-2,4-7
[ 990.134480] rd 0-2,4-7
# echo 1 >/sys/devices/system/cpu/cpu3/online
...
[ 1082.858992] root domain span: 0-7
[ 1082.859530] rd 0-7
And now the A1/A2/A3 partition is not valid anymore
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
isolated invalid (Invalid cpu list in cpuset.cpus.exclusive)
Is this expected? It looks like one needs to put at least one process in
the partition before hotplugging its cpus for the above to reproduce
(hotplugging w/o processes involved leaves CPUs 2,3 in the default domain
and isolated).
Once a partition becomes invalid, it does not recover by itself when its
CPUs come back online. Users have to explicitly re-enable it. This is a
very rare case, so we have not spent the effort to support automatic
recovery.
If only one of the 2 CPUs is taken offline and then brought back online,
the full 2-CPU isolated partition can be recovered.
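As a rough illustration of the explicit re-enable step, here is a minimal
sketch. The helper name reenable_partition is made up for this example,
and it assumes that writing the requested partition type back into
cpuset.cpus.partition is enough to make the kernel re-validate the
partition. Depending on the reported reason, cpuset.cpus.exclusive may
need to be rewritten first.

```shell
#!/bin/sh
# Hypothetical helper, not part of the kernel tree or test_cpuset_prs.sh.
# $1: path to a cpuset.cpus.partition file.
reenable_partition() {
    file=$1
    # Read the first line, e.g. "isolated invalid (Invalid cpu list ...)".
    IFS= read -r state < "$file" || [ -n "$state" ] || return 1
    case $state in
    *" invalid"*)
        # The first word is the partition type that was originally requested.
        requested=${state%% *}
        # Assumption: writing the type again triggers a new validity check.
        printf '%s\n' "$requested" > "$file"
        ;;
    esac
}
```

With CPUs 2,3 back online, this would be run as root as
reenable_partition cgroup/A1/A2/A3/cpuset.cpus.partition, followed by
another cat of that file to see whether the partition is valid again.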
Please let me know if you have further questions.
Cheers,
Longman