On 5/5/23 12:03, Tejun Heo wrote:
On Wed, May 03, 2023 at 11:01:36PM -0400, Waiman Long wrote:
On 5/2/23 18:27, Michal Koutný wrote:
On Tue, May 02, 2023 at 05:26:17PM -0400, Waiman Long <longman@xxxxxxxxxx> wrote:
In the new scheme, the available CPUs are still passed down directly to a
descendant cgroup. However, isolated CPUs (or, more generally, CPUs dedicated
to a partition) have to be exclusive. So what cpuset.cpus.reserve does is
identify the exclusive CPUs that can be excluded from the effective_cpus of
the parent cgroups before they are claimed by a child partition. Currently,
this is done automatically when a child partition is created off a parent
partition root. The new scheme will break it into 2 separate steps, without
requiring the parent of a partition to be a partition root itself.
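For comparison, the current one-step (automatic) scheme can be sketched with the existing cgroup v2 files cgroup.subtree_control, cpuset.cpus, cpuset.cpus.partition and cpuset.cpus.effective. This is a minimal sketch, assuming cgroup v2 is mounted at the directory given as the first argument, root privileges, at least 8 CPUs, and hypothetical cgroup names p and p/c; the helper names run and auto_partition are made up for illustration. With DRY_RUN=1 the commands are only printed.

```shell
# run CMD: print CMD when DRY_RUN=1, otherwise execute it with sh -c and
# abort the script on the first failure.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1" || { printf 'failed: %s\n' "$1" >&2; exit 1; }
    fi
}

# auto_partition CGROOT: today's one-step scheme. Assumes cgroup v2 at
# CGROOT, root privileges, at least 8 CPUs, and hypothetical cgroups p, p/c.
auto_partition() {
    cg=$1
    run "echo +cpuset > $cg/cgroup.subtree_control"
    run "mkdir -p $cg/p"
    run "echo 2-7 > $cg/p/cpuset.cpus"
    # p must itself be a partition root for the automatic scheme to apply.
    run "echo root > $cg/p/cpuset.cpus.partition"
    run "echo +cpuset > $cg/p/cgroup.subtree_control"
    run "mkdir -p $cg/p/c"
    run "echo 4-5 > $cg/p/c/cpuset.cpus"
    # Making p/c a valid partition removes CPUs 4-5 from p's effective CPUs.
    run "echo root > $cg/p/c/cpuset.cpus.partition"
    # p's effective CPUs should no longer include 4-5.
    run "cat $cg/p/cpuset.cpus.effective"
}
```

For example, DRY_RUN=1 auto_partition /sys/fs/cgroup only prints the nine commands; the point is that no separate reservation step exists, the claim by p/c is what carves the CPUs out of p.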
new scheme
1st step:
echo C >p/cpuset.cpus.reserve
# p/cpuset.cpus.effective == A-C (1)
2nd step (claim):
echo C' >p/c/cpuset.cpus # C'⊆C
echo root >p/c/cpuset.cpus.partition
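The "A-C" in step (1) is a set difference over CPU lists. The helper below is hypothetical (not part of the kernel or any cgroup tool); it computes that difference in the same range-list syntax that cpuset.cpus.effective prints, which could be used to check what p/cpuset.cpus.effective should read after the reserve step. It is a minimal POSIX sh + awk sketch, assuming well-formed lists without spaces.

```shell
# cpulist_sub A C: print the CPUs in list A that are not in list C, using
# cpuset list syntax such as "0-1,4-7".
cpulist_sub() {
    awk -v a="$1" -v c="$2" '
    # expand(list, set): mark every CPU named in a list such as "0-3,6"
    # in the array set, and track the highest CPU number seen in max.
    function expand(list, set,    n, parts, i, m, b, lo, hi, cpu) {
        n = split(list, parts, ",")
        for (i = 1; i <= n; i++) {
            if (parts[i] == "") continue
            m = split(parts[i], b, "-")
            lo = b[1] + 0
            hi = (m == 2) ? b[2] + 0 : lo
            for (cpu = lo; cpu <= hi; cpu++) set[cpu] = 1
            if (hi > max) max = hi
        }
    }
    BEGIN {
        max = -1
        split("", A); split("", C)   # make A and C empty arrays
        expand(a, A)
        expand(c, C)
        # Walk CPUs in order, emitting runs that are in A but not in C.
        out = ""; start = -1
        for (cpu = 0; cpu <= max + 1; cpu++) {
            keep = (cpu in A) && !(cpu in C)
            if (keep && start < 0) start = cpu
            if (!keep && start >= 0) {
                item = (start == cpu - 1) ? start "" : (start "-" (cpu - 1))
                out = (out == "") ? item : (out "," item)
                start = -1
            }
        }
        print out
    }'
}
```

For example, cpulist_sub 0-7 2-3 prints 0-1,4-7.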
It is something like that. However, the current scheme of automatic
reservation is also supported, i.e. cpuset.cpus.reserve will be set
automatically when the child cgroup becomes a valid partition, as long as the
cpuset.cpus.reserve file has not been written to. This is for backward
compatibility.
Once it is written to, automatic mode ends and users have to set it manually
from then on.
I really don't like the implicit switching behavior. This is interface
behavior modifying internal state that userspace can't view or control
directly. Regardless of how the rest of the discussion develops, this part
should be improved (e.g. would it work to always try to auto-reserve if the
cpu isn't already reserved?).
After some more thought yesterday, I have made a slight change to my design:
auto-reserve, as it works now, will stay for partitions that have a
partition root parent. Creating a remote partition, i.e. one without a
partition root parent, will require pre-allocating additional CPUs into
top_cpuset's cpuset.cpus.reserve first. So there will be no change in
behavior for existing use cases, whether or not a remote partition is
created.
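The ordering for a remote partition can be sketched as below. This is a hedged illustration, not code from the patch series: it assumes the proposed cpuset.cpus.reserve file exists in the root cgroup (which corresponds to top_cpuset), cgroup v2 mounted at the given directory, and hypothetical cgroup names p and p/c; run and remote_partition are made-up helper names. DRY_RUN=1 prints the commands instead of writing anything.

```shell
# run CMD: print CMD when DRY_RUN=1, otherwise execute it with sh -c and
# abort the script on the first failure.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1" || { printf 'failed: %s\n' "$1" >&2; exit 1; }
    fi
}

# remote_partition CGROOT CPUS: hypothetical ordering for a remote partition
# p/c whose parent p stays a plain member cgroup (not a partition root).
# cpuset.cpus.reserve is the file proposed in this thread; it is an
# assumption here, not an interface every kernel provides.
remote_partition() {
    cg=$1
    cpus=$2
    # 1. Pre-allocate the CPUs in the top cpuset (the cgroup v2 root).
    run "echo $cpus > $cg/cpuset.cpus.reserve"
    # 2. Build the hierarchy; p is never made a partition root.
    run "echo +cpuset > $cg/cgroup.subtree_control"
    run "mkdir -p $cg/p"
    run "echo +cpuset > $cg/p/cgroup.subtree_control"
    run "mkdir -p $cg/p/c"
    # 3. Claim the reserved CPUs with the grandchild partition.
    run "echo $cpus > $cg/p/c/cpuset.cpus"
    run "echo root > $cg/p/c/cpuset.cpus.partition"
}
```

On a kernel without cpuset.cpus.reserve the first write fails and the script stops, so the sketch mainly shows the order: reserve at the top first, then claim below a non-partition parent.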
Cheers,
Longman