Hello, Waiman.
On Mon, Jun 28, 2021 at 09:06:50AM -0400, Waiman Long wrote:
> The main reason for doing this is because normal cpuset control file
> actions are under the direct control of the cpuset code. So it is up to
> us to decide whether to grant it or deny it. Hotplug, on the other hand,
> is not under the control of cpuset code. It can't deny a hotplug
> operation. This is the main reason why the partition root error state
> was added in the first place.

I have a difficult time convincing myself that this difference justifies
the behavior difference, and it keeps bothering me that there is a state
which can be reached through one path but rejected by the other. I'll
continue below.

> Normally, users can set cpuset.cpus to whatever value they want even
> though they are not actually granted. However, turning on partition root
> is under more strict control. You can't turn on partition root if the
> CPUs requested cannot actually be granted. The problem with setting the
> state to just partition error is that users may not be aware that the
> partition creation operation fails. We can't assume all users will do
> the proper error checking. I would rather let them know the operation
> fails rather than relying on them doing the proper check afterward.
>
> Yes, I agree that it is a different philosophy than the original cpuset
> code, but I thought one reason of doing cgroup v2 is to simplify the
> interface and make it a bit more error-proof. Since partition root
> creation is a relatively rare operation, we can afford to make it more
> strict than the other operations.

So, IMO, one of the reasons why the cgroup1 interface was such a mess was
because each piece of interaction was designed ad-hoc without regard to
the overall consistency. One person feels a particular way of interacting
with the interface is "correct" and does it that way, and another person
does another part in a different way. In the end, we ended up with a
messy patchwork.

One problematic aspect of cpuset in cgroup1 was the handling of failure
modes, which was caused by the same exact approach - we wanted the
interface to reject invalid configurations outright even though we didn't
have the ability to prevent those configurations from occurring through
other paths, which makes the failure modes more subtle by further
obscuring them.

I think a better approach would be having a clear signal and mechanism to
watch the state and explicitly requiring users to verify and monitor the
state transitions.
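
To make "verify" concrete, here is a rough user-space sketch in Python of
a write-then-read-back check. It assumes that cpuset.cpus.partition reads
back "root", "member" or "root invalid". The helper names and the path
argument are made up for illustration, and nothing here covers how the
watching mechanism itself would work.

```python
def parse_partition_state(text):
    """Split the contents of a cpuset.cpus.partition read into (kind, valid).

    Assumption: the file reads back "root", "member" or "root invalid".
    Anything whose second word is "invalid" is treated as the error state.
    """
    words = text.split()
    if not words:
        raise ValueError("empty partition state")
    kind = words[0]
    valid = not (len(words) > 1 and words[1] == "invalid")
    return kind, valid


def request_partition_root(partition_file):
    """Ask for a partition root, then read the state back to verify it.

    partition_file is a hypothetical path to some cgroup's
    cpuset.cpus.partition file. A write that does not raise is not taken
    as proof that the partition is valid; the read-back is what counts.
    """
    with open(partition_file, "w") as f:
        f.write("root")
    with open(partition_file) as f:
        return parse_partition_state(f.read())
```

The caller acts on the returned (kind, valid) pair rather than on whether
the write succeeded, and would need to repeat the read whenever the state
may have changed, e.g. after hotplug.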