Hello, Peter.

On Thu, Jun 01, 2017 at 05:10:45PM +0200, Peter Zijlstra wrote:
> I've not had time to look at any of this. But the question I'm most
> curious about is how cgroup-v2 preserves the container invariant.
>
> That is, each container (namespace) should look like a 'real' machine.
> So just like userns allows to have a uid-0 (aka root) for each container
> and pidns allows a pid-1 for each container, cgroupns should provide a
> root group for each container.
>
> And cgroup-v2 has this 'exception' (aka wart) for the root group which
> needs to be replicated for each namespace.

The goal has never been that a container must be indistinguishable
from a real machine. Certain things simply don't have exact
equivalents due to sharing (memory stats or journal writes, for
example), and that sharing is exactly why people prefer containers
over VMs for certain use cases. If one wants full replication, a VM
would be the way to go.

The goal is to allow enough of the container invariant that
appropriate workloads can be contained and co-exist in useful ways.
This also means that the contained workload is usually either a bit
illiterate w.r.t. the system details (doesn't care) or makes some
adjustments for running inside a container (most quasi-full-system
ones already do).

The system root is inherently different from all other nested roots.
Making some exceptions for the root isn't about taking away from the
other roots but rather about reflecting the inherent differences -
there are things which are inherently system / bare-metal.

Thanks.

-- 
tejun