Re: [PATCHv2 7/7] cgroup: mount cgroupns-root when inside non-init cgroupns

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Tue, 4 Nov 2014 07:00:15 -0800

On Tue, Nov 4, 2014 at 5:46 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Aditya.
>
> On Mon, Nov 03, 2014 at 02:43:47PM -0800, Aditya Kali wrote:
>> I agree that this is effectively bind-mounting, but doing this in kernel
>> makes it really convenient for the userspace. The process that sets up the
>> container doesn't need to care whether it should bind-mount cgroupfs inside
>> the container or not. The tasks inside the container can mount cgroupfs on
>> as-needed basis. The root container manager can simply unshare cgroupns and
>> forget about the internal setup. I think this is useful just for the reason
>> that it makes life much simpler for userspace.
>
> If it's okay to require userland to just do bind mounting, I'd be far
> happier with that.  cgroup mount code is already overcomplicated
> because of the dynamic matching of supers to mounts when it could just
> have told userland to use bind mounting.  Doesn't the host side have
> to set up some of the filesystem layouts anyway?  Does it really
> matter that we require the host to set up cgroup hierarchy too?
>

Sort of, but only sort of.

You can create a container by unsharing namespaces, mounting
everything, and then calling pivot_root.  But this is unpleasant
because of the way pid namespaces work: unshare(CLONE_NEWPID) only
puts children created afterwards into the new namespace, so you
generally have to fork first, and this gets tedious.  It also doesn't
integrate well with things like fstab or other container-side
configuration mechanisms.

It's nicer if you can unshare namespaces, mount the bare minimum,
pivot_root, and let the contained software do as much setup as
possible.
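
For illustration, the "bare minimum" flow could look roughly like the
sketch below.  It is not taken from the patch set: the helper names
(minimal_unshare_flags, do_pivot_root, spawn_minimal_container), the
choice of namespaces, and the new_root/init_path parameters are all
assumptions, and it needs CAP_SYS_ADMIN to actually run.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Namespaces to unshare; this particular set is an assumption. */
int minimal_unshare_flags(void)
{
    return CLONE_NEWNS | CLONE_NEWPID;
}

/* glibc provides no pivot_root() wrapper, so use syscall(2). */
int do_pivot_root(const char *new_root, const char *put_old)
{
    return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/*
 * Hypothetical helper: unshare, fork, pivot into new_root, and exec
 * init_path (a path inside new_root).  Returns the child's pid in the
 * caller, or -1 on error.  The caller is assumed to be a throwaway
 * launcher: it shares the new mount namespace with the child, so the
 * child's pivot_root may move the caller's root as well.
 */
pid_t spawn_minimal_container(const char *new_root, const char *init_path)
{
    assert(new_root != NULL && init_path != NULL);

    if (unshare(minimal_unshare_flags()) == -1)
        return -1;

    /* Only children created after unshare(CLONE_NEWPID) live in the
     * new pid namespace, hence the fork. */
    pid_t child = fork();
    if (child != 0)
        return child;           /* launcher, or -1 if fork failed */

    /* Child: keep mount changes from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        _exit(1);

    /* pivot_root needs new_root to be a mount point: bind it to itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) == -1)
        _exit(1);
    if (chdir(new_root) == -1)
        _exit(1);

    /* The pivot_root(".", ".") + umount2 idiom from pivot_root(2)
     * avoids creating a temporary directory for the old root. */
    if (do_pivot_root(".", ".") == -1)
        _exit(1);
    if (umount2(".", MNT_DETACH) == -1)
        _exit(1);
    if (chdir("/") == -1)
        _exit(1);

    /* Everything else (proc, sysfs, cgroupfs, fstab entries) is left
     * to the contained software. */
    execl(init_path, init_path, (char *)NULL);
    _exit(127);
}
```

A launcher would call something like
spawn_minimal_container("/srv/rootfs", "/sbin/init") (both paths are
made up here) and then waitpid() on the returned pid.  Nothing in the
sketch mounts cgroupfs; that is exactly the step the contained
software would want to do for itself.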

--Andy