Re: [RFC][PATCH 0/9] Make containers kernel objects

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 23 May 2017 09:56:19 -0500

David Howells <dhowells@xxxxxxxxxx> writes:

> Aleksa Sarai <asarai@xxxxxxx> wrote:
>
>> >> The reason I think this is necessary is that the kernel has no idea
>> >> how to direct upcalls to what userspace considers to be a container -
>> >> current Linux practice appears to make a "container" just an
>> >> arbitrarily chosen junction of namespaces, control groups and files,
>> >> which may be changed individually within the "container".
>> 
>> Just want to point out that if the kernel APIs for containers massively
>> change, then the OCI will have to completely rework how we describe containers
>> (and so will all existing runtimes).
>> 
>> Not to mention that while I don't like how hard it is (from a runtime
>> perspective) to actually set up a container securely, there are undoubtedly
>> benefits to having namespaces split out. The network namespace being separate
>> means that in certain contexts you actually don't want to create a new network
>> namespace when creating a container.
>
> Yep, I quite agree.
>
> However, certain things need to be made per-net namespace that *aren't*.  DNS
> results, for instance.
>
> As an example, I could set up a client machine with two ethernet ports, set up
> two DNS+NFS servers, each of which think they're called "foo.bar" and attach
> each server to a different port on the client machine.  Then I could create a
> pair of containers on the client machine and route the network in each
> container to a different port.  Now there's a problem because the names of the
> cached DNS records for each port overlap.

Please look at ip netns add.  It does solve this in userspace rather
simply.
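
A rough sketch of what I mean, for your two-port example.  The
namespace names (port0, port1), interface names (eth0, eth1) and
nameserver addresses are made up for illustration, and addressing and
routing of the moved interfaces are left out.  It relies on the
convention documented in ip-netns(8): files under /etc/netns/NAME/ are
bind mounted over their counterparts in /etc by "ip netns exec", so each
namespace sees its own resolv.conf.

```shell
#!/bin/sh
# Sketch only: names and addresses below are hypothetical.
# By default the commands are printed rather than executed;
# set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

setup_netns() {
	ns="$1"; dev="$2"; dns="$3"

	# Create a named network namespace and move one port into it.
	run ip netns add "$ns"
	run ip link set "$dev" netns "$ns"

	# Per-namespace configuration lives in /etc/netns/<name>/.
	run mkdir -p "/etc/netns/$ns"
	if [ "$DRY_RUN" = "1" ]; then
		echo "echo nameserver $dns > /etc/netns/$ns/resolv.conf"
	else
		echo "nameserver $dns" > "/etc/netns/$ns/resolv.conf"
	fi

	# "ip netns exec" bind mounts /etc/netns/<name>/resolv.conf over
	# /etc/resolv.conf, so this shows the per-namespace resolver config.
	run ip netns exec "$ns" cat /etc/resolv.conf
}

setup_netns port0 eth0 192.0.2.1
setup_netns port1 eth1 198.51.100.1
```

Anything started with "ip netns exec port0 ..." then resolves "foo.bar"
through the server reachable on that port, and likewise for port1,
without the kernel needing to know what a container is.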

> Further, the NFS idmapper needs to be able to direct its calls to the
> appropriate network.

Eric