Hi Austin,

>>>> Does pids limit make sense in the root cgroup?
>>>
>>> I would say it kind of does, although I would just expect it to track
>>> /proc/sys/kernel/pid_max (either as a read-only value, or as an
>>> alternative way to set it).
>>
>> Personally, that seems unintuitive. /proc/sys/kernel/pid_max and the pids
>> cgroup controller are orthogonal features, why should they be able to
>> affect each other (or even be aware of each other)?
>
> I wouldn't consider them entirely orthogonal, the sysctl value is the
> limiting factor for the maximal value that can be set in a given pids
> cgroup. Setting an unlimited value in the cgroup is functionally identical
> to setting it to be equal to /proc/sys/kernel/pid_max, and the root cgroup
> is functionally equivalent to /proc/sys/kernel/pid_max, because all tasks
> that aren't in another cgroup get put in the root.

While it is true that setting pids.max to "max" would be functionally
equivalent to setting it to the value of /proc/sys/kernel/pid_max (and
thus the pids root cgroup is functionally equivalent to the parent), it is
untrue that the sysctl value is the limiting factor on what "max" is
defined as. "max" is defined as the maximum possible pid_t value. That is
really the only sane maximum, because using /proc/sys/kernel/pid_max
would be problematic: the maximum limit would keep changing, and the line
between "max" and some arbitrary value would be blurred. In addition, the
sysctl value limits the number of pids in the system in a separate part of
the kernel. It has nothing to do with cgroups, and cgroups have nothing to
do with it.

> My only thought is that having the file that would set the limit there might
> make things much simpler for software that expects the entire cgroup
> structure to be hierarchical.

The only valid value for pids.max in the root cgroup would be "max".
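
To make the distinction concrete, here is a minimal userspace sketch (not
the kernel's code) of treating "max" as a fixed sentinel that never
consults /proc/sys/kernel/pid_max. The helper names parse_pids_max() and
pids_would_exceed() are hypothetical, and the PID_MAX_LIMIT value is an
assumption matching typical 64-bit builds; the kernel derives it from the
build configuration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: typical 64-bit value; the real one is configuration-dependent. */
#define PID_MAX_LIMIT (4LL * 1024 * 1024)
/* "max" is one past the largest possible pid, so no real count can reach it. */
#define PIDS_MAX (PID_MAX_LIMIT + 1)

/*
 * Hypothetical helper: parse a pids.max-style string into a limit.
 * Returns 0 on success and -1 on invalid input. Note that the sysctl
 * pid_max is never read here: "max" is a constant, not a moving target.
 */
static int parse_pids_max(const char *buf, long long *limit)
{
    char *end;
    long long val;

    assert(buf != NULL && limit != NULL);

    if (strcmp(buf, "max") == 0) {
        *limit = PIDS_MAX;
        return 0;
    }

    errno = 0;
    val = strtoll(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -1;

    /* Numeric limits must stay below the sentinel so "max" is unambiguous. */
    if (val < 0 || val >= PIDS_MAX)
        return -1;

    *limit = val;
    return 0;
}

/* Hypothetical check: would adding one more task go over the limit? */
static int pids_would_exceed(long long current, long long limit)
{
    return current + 1 > limit;
}
```

With the sentinel above any possible pid count, a cgroup set to "max" can
never be the one that refuses a fork; any global refusal would have to come
from the separate pid_max check elsewhere.
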
"max" is defined as (PID_MAX_LIMIT + 1), not as the current setting of
/proc/sys/kernel/pid_max, because the only *real* maximum value of pid_t
is PID_MAX_LIMIT, so the only reasonable way to represent "max" is a number
greater than that.

There is an issue with both of the behaviours you describe. The root-level
pids.max could either:

a) be read-only (which breaks the idea of it being "simpler" because now
   you have a special case where you can't write to the limit); or (even
   worse)

b) modify some other aspect of the kernel in a way that is unique compared
   to children of the root hierarchy (which IMO sounds like trouble).

In either of those two cases, the idea of it being "simpler" for software
that makes the (wrong) assumption that you can limit the global maximum
number of pids through the root cgroup is broken, because it either has
weird side effects (b) or is just an odd feature (a).

-- 
Aleksa Sarai (cyphar)
www.cyphar.com