Re: [PATCH v2 2/6] cgroup/cpuset: Clarify the use of invalid partition root

Waiman Long <llong@xxxxxxxxxx> · Fri, 16 Jul 2021 16:08:15 -0400

On 7/16/21 2:59 PM, Waiman Long wrote:
On 7/16/21 2:44 PM, Waiman Long wrote:
On 7/5/21 1:51 PM, Tejun Heo wrote:
Hello, Waiman.

On Mon, Jun 28, 2021 at 09:06:50AM -0400, Waiman Long wrote:
The main reason for doing this is because normal cpuset control file
actions are under the direct control of the cpuset code. So it is up to
us to decide whether to grant it or deny it. Hotplug, on the other hand,
is not under the control of cpuset code. It can't deny a hotplug
operation. This is the main reason why the partition root error state
was added in the first place.
I have a difficult time convincing myself that this difference justifies
the behavior difference and it keeps bothering me that there is a state
which can be reached through one path but rejected by the other. I'll
continue below.

Normally, users can set cpuset.cpus to whatever value they want even
though they are not actually granted. However, turning on partition root
is under more strict control. You can't turn on partition root if the
CPUs requested cannot actually be granted. The problem with setting the
state to just partition error is that users may not be aware that the
partition creation operation fails. We can't assume all users will do
the proper error checking. I would rather let them know the operation
fails rather than relying on them doing the proper check afterward.

Yes, I agree that it is a different philosophy than the original cpuset
code, but I thought one reason for doing cgroup v2 is to simplify the
interface and make it a bit more error-proof. Since partition root
creation is a relatively rare operation, we can afford to make it more
strict than the other operations.
So, IMO, one of the reasons why cgroup1 interface was such a mess was
because each piece of interaction was designed ad-hoc without regard to
the overall consistency. One person feels a particular way of
interacting with the interface is "correct" and does it that way and
another person does another part in a different way. In the end, we
ended up with a messy patchwork.

One problematic aspect of cpuset in cgroup1 was the handling of failure
modes, which was caused by the same exact approach - we wanted the
interface to reject invalid configurations outright even though we
didn't have the ability to prevent those configurations from occurring
through other paths, which makes the failure mode more subtle by further
obscuring them.

I think a better approach would be having a clear signal and mechanism
to watch the state and explicitly requiring users to verify and monitor
the state transitions.

Sorry for the late reply as I was busy with other work.

I agree with you in principle. However, the reason why there are more 
restrictions on enabling partition is because I want to avoid forcing 
the users to always read back cpuset.partition.type to see if the 
operation succeeds instead of just getting an error from the 
operation. The former approach is more error prone. If you don't want 
changes in existing behavior, I can relax the checking and allow them 
to become an invalid partition if an illegal operation happens.

Also there is now another cpuset patch to extend cpu isolation to
cgroup v1 [1]. I think it is better suited to the cgroup v2 partition
scheme, but cgroup v1 is still quite heavily used out there.

Please let me know what you want me to do and I will send out a v3 
version. 

Note that the current cpuset partition implementation already has some
restrictions on when a partition can be enabled. However, I missed some
corner cases in the original implementation that allow certain cpuset
operations to make a partition invalid. I tried to plug those holes in
this patchset. However, if maintaining backward compatibility is more
important, I can leave those holes and update the documentation to make
sure that people check cpuset.partition.type to confirm whether their
operation succeeded.

I just realized that a partition root sets the CPU_EXCLUSIVE bit. So
changes to cpuset.cpus that break the exclusivity rule are not allowed
anyway. This patchset is just adding additional checks so that
cpuset.cpus changes that break the partition root rules will not be
allowed. I can remove those additional checks for this patchset and
allow cpuset.cpus changes that break the partition root rules to make
the partition invalid instead. However, I still want invalid changes to
cpuset.partition.type to be disallowed.
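
To make the two failure paths concrete, here is a rough userspace sketch
(not part of the patchset) of what a caller would have to handle: a write
that is rejected outright, and a write that is accepted but leaves the
partition invalid on read-back. The helper names and returned labels are
made up for illustration. The file name and the state strings ("root",
"isolated", "member", anything containing "invalid") are assumptions based
on the mainline cgroup v2 file cpuset.cpus.partition, not something this
thread defines.

```python
import os

# Assumption: mainline cgroup v2 names this file "cpuset.cpus.partition";
# this thread refers to it as cpuset.partition.type.
PARTITION_FILE = "cpuset.cpus.partition"


def classify(state_text):
    """Map the text read back from the partition file to a coarse status.

    The recognized strings are assumptions about the kernel's output.
    """
    state = state_text.strip()
    if "invalid" in state:
        return "invalid"
    if state in ("root", "isolated"):
        return "valid"
    if state == "member":
        return "member"
    return "unknown"


def enable_partition(cgroup_dir, filename=PARTITION_FILE):
    """Try to turn a cgroup into a partition root.

    Returns ("rejected", errno) if the write itself fails, otherwise
    (status, None) where status comes from reading the file back.
    """
    path = os.path.join(cgroup_dir, filename)
    try:
        # Path 1: the kernel may refuse the write with an error.
        with open(path, "w") as f:
            f.write("root")
    except OSError as err:
        return ("rejected", err.errno)
    # Path 2: the write was accepted, so the state must be read back
    # to find out whether the partition is actually valid.
    with open(path) as f:
        return (classify(f.read()), None)
```

A caller that only checks for "rejected" will miss the second path, which
is the error-prone pattern I would like to avoid for
cpuset.partition.type writes.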

Cheers,
Longman