Re: [PATCH v8 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst

On 11/15/21 14:31, Michal Koutný wrote:
Hello.

On Mon, Oct 18, 2021 at 10:36:18AM -0400, Waiman Long <longman@xxxxxxxxxx> wrote:
+	When set to "isolated", the CPUs in that partition root will
+	be in an isolated state without any load balancing from the
+	scheduler.  Tasks in such a partition must be explicitly bound
+	to each individual CPU.
This sounds reasonable but it seems to have some usability issues as was
raised in another thread [1]. (I could only think of the workaround of
single-cpu cgroup leaves + CLONE_INTO_CGROUP.)

It can be a problem when one tries to move a task laterally from one cgroup to another with non-overlapping CPUs. However, if a task starts in a parent cgroup whose affinity mask includes CPUs in the isolated child cgroup, I believe it should be able to move into the isolated child cgroup without a problem. Otherwise, it is a bug that needs to be fixed.
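As an illustration of what "explicitly bound to each individual CPU" means for tasks in an isolated partition, here is a minimal Python sketch. The helper names plan_pinning and apply_pinning are hypothetical and not part of the patch; the round-robin policy is only an assumption for the example. The only real API used is os.sched_setaffinity(), which is available on Linux.

```python
import os
from typing import Dict, Iterable


def plan_pinning(pids: Iterable[int], cpus: Iterable[int]) -> Dict[int, int]:
    """Assign each PID to exactly one CPU, round-robin (illustrative policy)."""
    cpu_list = sorted(set(cpus))
    if not cpu_list:
        raise ValueError("an isolated partition needs at least one CPU")
    return {pid: cpu_list[i % len(cpu_list)] for i, pid in enumerate(pids)}


def apply_pinning(plan: Dict[int, int]) -> None:
    """Bind every task to its single CPU (Linux only)."""
    for pid, cpu in plan.items():
        # Without load balancing the scheduler will not move the task,
        # so each task gets a one-CPU affinity mask.
        os.sched_setaffinity(pid, {cpu})
```

Whether such a call succeeds for a task that is being moved between cgroups is exactly the usability question discussed above; the sketch does not address it.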



TL;DR Do whatever you find suitable but (re)consider sticking to the
delegation principle (making hotplug and ancestor changes equal).

Now to the constraints and partition setups. I think it's useful to have
a model against which the implementation can be compared.
I tried to condense some "simple rules" from the descriptions you posted
in v8 plus your response to my remarks in v7 [2]. These should only be
the "validity conditions", not "transition conditions".

## Validity conditions

To simplify, I introduce a condition called 'degraded', which says that
a cpuset cannot host its tasks with the given config. The two validity
predicates are built on it:

	degraded := cpus.internal_effective == ø && has_tasks
	valid_root := !degraded && cpus_exclusive && parent.valid_root
	(valid_member := !degraded)

with a helping predicate
	cpus_exclusive := cpus not shared by a sibling

The effective CPUs basically combine configured+available CPUs

	cpus.internal_effective := (cpus ∩ parent.cpus ∩ online_cpus) - passed

where
	passed := union of children cpus whose partition is not member

Finally, to handle the degraded cpusets gracefully, we define

	if (!degraded)
		cpus.effective := cpus.internal_effective
	else
		cpus.effective := parent.cpus.effective

(In cases when there's no parent, we replace its cpus with online_cpus.)
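The rules above can be sketched as a small executable model in Python. This is only an illustration of the validity conditions as written here, not the kernel implementation. The class and function names are made up for the sketch. Two points are assumptions not stated in the rules: a "member" cpuset is never reported as a valid root, and the top-level cpuset terminates the parent.valid_root recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(eq=False)
class Cpuset:
    name: str
    cpus: FrozenSet[int]              # configured cpuset.cpus
    partition: str = "member"         # "member", "root" or "isolated"
    has_tasks: bool = False
    parent: Optional[Cpuset] = None
    children: List[Cpuset] = field(default_factory=list)

    def add_child(self, child: Cpuset) -> Cpuset:
        child.parent = self
        self.children.append(child)
        return child


def passed(cs: Cpuset) -> FrozenSet[int]:
    """Union of cpus of children whose partition is not member."""
    out: FrozenSet[int] = frozenset()
    for child in cs.children:
        if child.partition != "member":
            out |= child.cpus
    return out


def internal_effective(cs: Cpuset, online: FrozenSet[int]) -> FrozenSet[int]:
    # With no parent, the parent's cpus are replaced by online_cpus.
    parent_cpus = cs.parent.cpus if cs.parent else online
    return (cs.cpus & parent_cpus & online) - passed(cs)


def degraded(cs: Cpuset, online: FrozenSet[int]) -> bool:
    return not internal_effective(cs, online) and cs.has_tasks


def cpus_exclusive(cs: Cpuset) -> bool:
    if cs.parent is None:
        return True
    return all(not (cs.cpus & s.cpus)
               for s in cs.parent.children if s is not cs)


def valid_root(cs: Cpuset, online: FrozenSet[int]) -> bool:
    if cs.partition == "member":      # assumption: members are never roots
        return False
    if degraded(cs, online) or not cpus_exclusive(cs):
        return False
    # Assumption: the top-level cpuset ends the recursion.
    return cs.parent is None or valid_root(cs.parent, online)


def effective(cs: Cpuset, online: FrozenSet[int]) -> FrozenSet[int]:
    if not degraded(cs, online):
        return internal_effective(cs, online)
    return effective(cs.parent, online) if cs.parent else online
```

Because every function takes online_cpus as an argument, a hotplug event and an ancestor change are evaluated by the same rules, which is the point of keeping these as validity conditions rather than transition conditions.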

---

I'll try applying these conditions to your description.

+
+	"cpuset.cpus" must always be set up first before enabling
+	partition.
This is just a transition condition.

       Unlike "member" whose "cpuset.cpus.effective" can
+	contain CPUs not in "cpuset.cpus", this can never happen with a
+	valid partition root. In other words, "cpuset.cpus.effective"
+	is always a subset of "cpuset.cpus" for a valid partition root.
IIUC this refers to the cgroup that is 'degraded'. (The consequences for
a valid partition root follow from valid_root definition above.)

+
+	When a parent partition root cannot exclusively grant any of
+	the CPUs specified in "cpuset.cpus", "cpuset.cpus.effective"
+	becomes empty.
This sounds too strict to me, perhaps you meant 'cannot grant _all_ of
the CPUs'?

I think the wording may be confusing. What I meant is that none of the requested CPUs can be granted. So if at least one is granted, the effective CPUs won't be empty.

       If there are tasks in the partition root, the
+	partition root becomes invalid and "cpuset.cpus.effective"
+	is reset to that of the nearest non-empty ancestor.
This is captured in the definition of 'degraded'.

+
+        Note that a task cannot be moved to a cgroup with empty
+        "cpuset.cpus.effective".
A transition condition. (Makes sense.)

[With the validity conditions above, it's possible to have 'valid_root'
with empty cpus (hence also empty cpus.internal_effective) if there are
no tasks in there. The transition conditions so far prevented this
corner case.]

+	There are additional constraints on where a partition root can
+	be enabled ("root" or "isolated").  It can only be enabled in
+	a cgroup if all the following conditions are met.
I think the enablement (aka rewriting cpuset.cpus.partition) could be
always possible but it'd result in "root invalid (...)" if the resulting
config doesn't meet the validity condition.

+
+	1) The "cpuset.cpus" is non-empty and exclusive, i.e. they are
+	   not shared by any of its siblings.
The emptiness here is a judgement call (in my formulation of the
conditions it seemed simpler to allow empty cpus.internal_effective with
no tasks).

There are more constraints on enabling a partition. Once it is enabled, there are fewer constraints to maintain its validity.

+	2) The parent cgroup is a valid partition root.
Captured in the valid_root definition.

+	3) The "cpuset.cpus" is a subset of parent's "cpuset.cpus".
This is unnecessary strictness. Allow such config,
cpus.internal_effective still can't be more than parent's cpuset.cpus.
(Or do you have a reason to discard such configs?)
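A small worked example of the argument, as a Python sketch. The function below simply restates the cpus.internal_effective formula from the validity conditions, and the CPU numbers are made up for illustration. A child whose cpus is not a subset of the parent's still ends up with effective CPUs inside the parent's set.

```python
def internal_effective(cpus, parent_cpus, online_cpus, passed=frozenset()):
    """(cpus ∩ parent.cpus ∩ online_cpus) - passed, on plain Python sets."""
    return (set(cpus) & set(parent_cpus) & set(online_cpus)) - set(passed)


parent_cpus = {0, 1, 2, 3}
online_cpus = set(range(16))
child_cpus = {2, 3, 8}          # CPU 8 is NOT in the parent's cpus

result = internal_effective(child_cpus, parent_cpus, online_cpus)
print(sorted(result))           # [2, 3]
```

The intersection with parent.cpus already drops CPU 8, so rejecting the configuration up front adds no protection in this model.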

+	4) There is no child cgroups with cpuset enabled.  This avoids
+	   cpu migrations of multiple cgroups simultaneously which can
+	   be problematic.
A transition condition (i.e. not relevant to validity conditions).

+	Once becoming a partition root, changes to "cpuset.cpus"
+	is generally allowed as long as the cpu list is exclusive,
+	non-empty and is a superset of children's cpu lists.
Any changes should be allowed, otherwise it denies the delegation
principle of v2 (IOW a parent should be able to preempt CPUs given to
children previously and not be denied because of them).

(If the change results in a failed validity condition, the cgroup of
course cannot be a valid_root anymore.)

+        The constraints of a valid partition root are as follows:
+
+        1) The parent cgroup is a valid partition root.
+        2) "cpuset.cpus.effective" is a subset of "cpuset.cpus"
+        3) "cpuset.cpus.effective" is non-empty when there are tasks
+           in the partition.
(This seems to miss the sibling exclusivity condition.)
Here I'd simply paste the "Validity conditions" specified above instead.

You currently cannot make a change to cpuset.cpus that violates the CPU exclusivity rule. The constraints above do not prevent you from making a change; they only affect the validity of the partition root.

+        Changing a partition root to "member" is always allowed.
+        If there are child partition roots underneath it, however,
+        they will be forced to be switched back to "member" too and
+        lose their partitions. So care must be taken to double check
+        for this condition before disabling a partition root.
(Or is this how delegation is intended?) However, AFAICS, parent still
can't remove cpuset.cpus even when the child is a "member". Otherwise,
I agree with the back-switch.

There are only two possibilities here: either we force the child partitions to become members, or we turn them into invalid partition roots. An invalid partition root is meant to be a transient state that can be recovered in some way to form a valid partition again. However, changing a parent partition root to member removes the possibility of recovering later. That is why I think it is more sensible to force those child partitions to become members.


+	Setting a cgroup to a valid partition root will take the CPUs
+	away from the effective CPUs of the parent partition.
Captured in the definition of cpus.internal_effective.

+	A valid parent partition may distribute out all its CPUs to
+	its child partitions as long as it is not the root cgroup as
+	we need some house-keeping CPUs in the root cgroup.
This actually applies to any root partition that's supposed to host
tasks. (IOW, 'valid_root' cannot be 'degraded'.)

+	An invalid partition is not a real partition even though some
+	internal states may still be kept.
Tautology? (Or new definition of "real".)

+
+	An invalid partition root can be reverted back to a real
+	partition root if none of the constraints of a valid partition
+        root are violated.
Yes. (Also tautological.)

Anyway, as I said above, I just tried to formulate the model for clearer
understanding and the implementation may introduce transition
constraints but it'd be good to always have the simple rules to tell
what's a valid root in the tree and what's not.

Thanks for analyzing each statement for its validity. I will try to improve the text to make it easier to understand.

Cheers,
Longman



