Hello,

On Mon, Jun 05, 2023 at 04:00:39PM -0400, Waiman Long wrote:
...
> > file seems hacky to me. e.g. How would it interact with namespacing? Are
> > there reasons why this can't be properly hierarchical other than the amount
> > of work needed? For example:
> >
> >   cpuset.cpus.exclusive is a per-cgroup file and represents the mask of CPUs
> >   that the cgroup holds exclusively. The mask is always a subset of
> >   cpuset.cpus. The parent loses access to a CPU when the CPU is given to a
> >   child by setting the CPU in the child's cpus.exclusive and the CPU can't
> >   be given to more than one child. IOW, exclusive CPUs are available only to
> >   the leaf cgroups that have them set in their .exclusive file.
> >
> >   When a cgroup is turned into a partition, its cpuset.cpus and
> >   cpuset.cpus.exclusive should be the same. For backward compatibility, if
> >   the cgroup's parent is already a partition, cpuset will automatically
> >   attempt to add all cpus in cpuset.cpus into cpuset.cpus.exclusive.
> >
> > I could well be missing something important but I'd really like to see
> > something like the above where the reservation feature blends in with the
> > rest of cpuset.
>
> It can certainly be made hierarchical as you suggest. It does increase
> complexity from both user and kernel point of view.
>
> From the user point of view, there is one more knob to manage hierarchically
> which is not used that often.

From user pov, this only affects them when they want to create partitions
down the tree, right?

> From the kernel point of view, we may need to have one more cpumask per
> cpuset as the current subparts_cpus is used to track automatic reservation.
> We need another cpumask to contain extra exclusive CPUs not allocated
> through automatic reservation. The fact that you mention this new control
> file as a list of exclusively owned CPUs for this cgroup. Creating a
> partition is in fact allocating exclusive CPUs to a cgroup. So it kind of
> overlaps with the cpuset.cpus.partition file. Can we fail a write to

Yes, it substitutes and expands on cpuset.cpus.partition behavior.

> cpuset.cpus.exclusive if those exclusive CPUs cannot be granted or will this
> exclusive list is only valid if a valid partition can be formed. So we need
> to properly manage the dependency between these 2 control files.

So, I think cpus.exclusive can become the sole mechanism to arbitrate
exclusive ownership of CPUs and .partition can depend on .exclusive.

> Alternatively, I have no problem exposing cpuset.cpus.exclusive as a
> read-only file. It is a bit problematic if we need to make it writable.

I don't follow. How would remote partitions work then?

> As for namespacing, you do raise a good point. I was thinking mostly from a
> whole system point of view as the use case that I am aware of does not needs
> that. To allow delegation of exclusive CPUs to a child cgroup, that cgroup
> has to be a partition root itself. One compromise that I can think of is to
> only allow automatic reservation only in such a scenario. In that case, I
> need to support a remote load balanced partition as well and hierarchical
> sub-partitions underneath it. That can be done with some extra code to the
> existing v2 patchset without introducing too much complexity.
>
> IOW, the use of remote partition is only allowed on the whole system level
> where one has access to the cgroup root. Exclusive CPUs distribution within
> a container can only be done via the use of adjacent partitions with
> automatic reservation. Will that be a good enough compromise from your point
> of view?

It seems too twisted to me. I'd much prefer it to be better integrated with
the rest of cpuset.

Thanks.

-- 
tejun
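
For reference, the rules in the quoted cpuset.cpus.exclusive proposal can be modeled in a few lines of userspace Python. This is only a rough sketch of the semantics as described above: the `Cgroup` class, `set_exclusive()` and `available()` are made-up names for illustration and do not correspond to any kernel code or interface. It also assumes that a conflicting write is rejected, which the thread leaves as an open question.

```python
class Cgroup:
    """Toy model of a cpuset cgroup (hypothetical, not kernel code)."""

    def __init__(self, name, cpus, parent=None):
        self.name = name
        self.cpus = set(cpus)    # models cpuset.cpus
        self.exclusive = set()   # models the proposed cpuset.cpus.exclusive
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def set_exclusive(self, cpus):
        cpus = set(cpus)
        # Rule: the exclusive mask is always a subset of cpuset.cpus.
        if not cpus <= self.cpus:
            raise ValueError(f"{self.name}: exclusive CPUs must be in cpuset.cpus")
        # Rule: a CPU can't be given to more than one child.
        # Assumption: the write fails on conflict (undecided in the thread).
        if self.parent is not None:
            for sibling in self.parent.children:
                if sibling is not self and sibling.exclusive & cpus:
                    raise ValueError(f"{self.name}: CPU already held by {sibling.name}")
        self.exclusive = cpus

    def available(self):
        # Rule: the parent loses access to CPUs given to its children.
        claimed = set()
        for child in self.children:
            claimed |= child.exclusive
        return self.cpus - claimed


# Example: two children competing for CPUs under one parent.
root = Cgroup("root", range(8))
a = Cgroup("a", range(0, 4), parent=root)
b = Cgroup("b", range(2, 6), parent=root)

a.set_exclusive({2, 3})
b.set_exclusive({4, 5})
print(sorted(root.available()))
```

In this model the parent is left with whatever its children have not claimed, and a second child asking for an already-claimed CPU gets an error instead of silently sharing it.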