On 6/6/23 15:58, Tejun Heo wrote:
> Hello, Waiman.
> On Mon, Jun 05, 2023 at 10:47:08PM -0400, Waiman Long wrote:
> ...
>> I had a different idea on the semantics of the cpuset.cpus.exclusive at the
>> beginning. My original thinking is that it was the actual exclusive CPUs
>> that are allocated to the cgroup. Now if we treat this as a hint of what
>> exclusive CPUs should be used and it becomes valid only if the cgroup can
> I wouldn't call it a hint. It's still a hard allocation of the CPUs to the
> cgroups that own them. Setting up a partition requires exclusive CPUs and
> thus would depend on exclusive allocations set up accordingly.
>> become a valid partition. I can see it as a value that can be hierarchically
>> set throughout the whole cpuset hierarchy.
>> So a transition to a valid partition is possible iff
>> 1) cpuset.cpus.exclusive is a subset of cpuset.cpus and is a subset of
>> cpuset.cpus.exclusive of all its ancestors.
> Yes.
>> 2) If its parent is not a partition root, none of the CPUs in
>> cpuset.cpus.exclusive are currently allocated to other partitions. This is the
> Not just that, the CPUs aren't available to cgroups which don't have them
> set in the .exclusive file. IOW, if a CPU is in cpus.exclusive of some
> cgroups, it shouldn't appear in cpus.effective of cgroups which don't have
> the CPU in their cpus.exclusive.
> So, .exclusive explicitly establishes exclusive ownership of CPUs and
> partitions depend on that with an implicit "turn CPUs exclusive" behavior in
> case the parent is a partition root for backward compatibility.
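To check my reading of the cpus.effective rule above, here is a rough Python model of it. This is only a sketch and not kernel code: the flat dict of cgroups and the effective_cpus() helper are made up for illustration, and the hierarchy (ancestors also having to list the CPU) is deliberately ignored.

```python
# Hypothetical flat model: each cgroup has a "cpus" set and an "exclusive" set.
def effective_cpus(name, cgroups):
    """Return the CPUs of `name` that are not exclusively owned by other cgroups."""
    me = cgroups[name]
    # Collect every CPU that appears in any cgroup's cpus.exclusive.
    claimed = set()
    for cg in cgroups.values():
        claimed |= cg["exclusive"]
    # A claimed CPU stays visible only to cgroups that list it themselves.
    return {c for c in me["cpus"] if c not in claimed or c in me["exclusive"]}


cgroups = {
    "A": {"cpus": {0, 1, 2, 3}, "exclusive": {2, 3}},
    "B": {"cpus": {0, 1, 2, 3}, "exclusive": set()},
}
print(sorted(effective_cpus("A", cgroups)))
print(sorted(effective_cpus("B", cgroups)))
```

In this model B loses CPUs 2 and 3 as soon as A lists them in its exclusive set, whether or not A is a partition.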
The current CPU exclusive behavior is limited to sibling cgroups only.
Because of the hierarchical nature of CPU distribution, the set of
exclusive CPUs has to appear in all of its ancestors. When a partition is
enabled, we do a sibling exclusivity test at that point to verify that
it is exclusive. It looks like you want to do an exclusivity test even
when the partition isn't active. I can certainly do that when the file
is being updated. However, the write will then fail if the exclusivity
test fails, just like with the v1 cpuset.cpu_exclusive flag, if you are OK
with that.
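A write-time check along these lines could look roughly like the Python model below. It is a sketch under assumptions, not the patch itself: the Cpuset class and write_exclusive() are invented names, the root cgroup is assumed to own every CPU and is not checked, and only condition 1 plus the sibling test are modeled.

```python
class Cpuset:
    """Toy cpuset node; all names here are hypothetical."""

    def __init__(self, name, cpus, parent=None):
        self.name = name
        self.cpus = set(cpus)
        self.exclusive = set()
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def write_exclusive(cs, new_cpus):
    """Model a write to cpus.exclusive of a non-root cpuset; raise on failure."""
    new_cpus = set(new_cpus)
    # Condition 1a: the value must be a subset of the cgroup's own cpus.
    if not new_cpus <= cs.cpus:
        raise ValueError(f"{cs.name}: not a subset of cpus")
    # Condition 1b: every non-root ancestor must list the CPUs as exclusive.
    anc = cs.parent
    while anc is not None and anc.parent is not None:
        if not new_cpus <= anc.exclusive:
            raise ValueError(f"{cs.name}: not exclusive in ancestor {anc.name}")
        anc = anc.parent
    # Sibling exclusivity test, done at write time instead of partition time.
    if cs.parent is not None:
        for sib in cs.parent.children:
            if sib is not cs and new_cpus & sib.exclusive:
                raise ValueError(f"{cs.name}: overlaps sibling {sib.name}")
    cs.exclusive = new_cpus


root = Cpuset("root", range(8))
a = Cpuset("a", range(4, 8), root)
b = Cpuset("b", range(8), root)
a1 = Cpuset("a1", range(4, 8), a)

write_exclusive(a, {6, 7})   # accepted
write_exclusive(a1, {7})     # accepted: 7 is exclusive in parent "a"
for cs, cpus in ((b, {7}), (a1, {5})):
    try:
        write_exclusive(cs, cpus)
    except ValueError as err:
        print("write rejected:", err)
```

Here the write to b fails because of the sibling overlap with a, and the second write to a1 fails because CPU 5 is not in its parent's exclusive set; in both cases the old value is kept.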
>> same remote partition concept in my v2 patch. If its parent is a partition
>> root, part of its exclusive CPUs will be distributed to this child partition
>> like the current behavior of cpuset partition.
> Yes, similar in a sense. Please do away with the "once .reserve is used, the
> behavior is switched" part.
That behavior is already gone in my v2 patch.
> Instead, it can be something like "if the parent is a
> partition root, cpuset implicitly tries to set all CPUs in its cpus file in
> its cpus.exclusive file" so that user-visible behavior stays unchanged
> depending on past history.
If the parent is a partition root, auto reservation will be done and
cpus.exclusive will be set automatically, just like before. So existing
applications using partitions will not be affected.
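As a small illustration of that rule (a Python sketch only; exclusive_for_partition() is a made-up helper, and leaving an explicitly written value untouched is my assumption, not something settled in this thread):

```python
def exclusive_for_partition(cpus, exclusive, parent_is_partition_root):
    """Return the cpus.exclusive value in force when a partition is enabled.

    An empty result means no exclusive CPUs, so no valid partition.
    """
    if exclusive:
        # Assumption: a value written by the user is kept as is.
        return set(exclusive)
    if parent_is_partition_root:
        # Backward-compatible path: implicitly reserve everything in cpus.
        return set(cpus)
    # Parent is not a partition root and nothing was reserved explicitly.
    return set()


print(sorted(exclusive_for_partition({2, 3}, set(), True)))
print(sorted(exclusive_for_partition({2, 3}, set(), False)))
print(sorted(exclusive_for_partition({2, 3}, {3}, False)))
```

The first call models an existing application that only writes cpus and the partition file under a partition root, and it still ends up with CPUs 2 and 3 reserved.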
Cheers,
Longman