The reason I think this is necessary is that the kernel has no idea how to direct upcalls to what userspace considers to be a container - current Linux practice appears to make a "container" just an arbitrarily chosen junction of namespaces, control groups and files, which may be changed individually within the "container".
Just want to point out that if the kernel APIs for containers massively change, then the OCI will have to completely rework how we describe containers (and so will all existing runtimes).
Not to mention that while I don't like how hard it is (from a runtime perspective) to actually set up a container securely, there are undoubtedly benefits to having namespaces split out. Because the network namespace is separate, a runtime can skip creating a new one in the contexts where you actually don't want a new network namespace for a container.
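To make the network namespace point concrete, here is a rough sketch of how a runtime might pick its namespace flags today. The struct ns_config type and the container_ns_flags() helper are made up for illustration (they are not from the OCI spec or any existing runtime); only the CLONE_NEW* constants are real.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical runtime setting, invented for this sketch. */
struct ns_config {
    bool share_host_net; /* true: stay in the caller's network namespace */
};

/*
 * Build the flag mask a runtime could hand to unshare(2) or clone(2).
 * Every namespace is requested individually, so leaving one out is
 * just a matter of not setting its bit.
 */
int container_ns_flags(const struct ns_config *cfg)
{
    int flags = CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID;

    if (!cfg->share_host_net)
        flags |= CLONE_NEWNET; /* only isolate networking when asked to */

    return flags;
}
```

With share_host_net set, the resulting mask has no CLONE_NEWNET bit, so the container keeps using the network namespace of whoever created it. That choice only exists because the namespaces are separate objects.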
I had some ideas about how you could implement bridging in userspace (as an unprivileged user, for rootless containers) but if you can't join namespaces individually then such a setup is not practically possible.
-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/