On Tue, May 12, 2009 at 05:01:58PM -0700, Eric W. Biederman wrote:
> Matt Helsley <matthltc@xxxxxxxxxx> writes:
>
> > Sun RPC currently opens sockets from the initial network namespace making it
> > impossible to restrict which NFS servers a container may interact with.
> >
> > For example, the NFS server at 10.0.0.3 reachable from the initial namespace
> > will always be used even if an entirely different server with the address
> > 10.0.0.3 is reachable from a container's network namespace. Hence network
> > namespaces cannot be used to restrict the network access of a container as long
> > as the RPC code opens sockets using the initial network namespace. This is
> > in stark contrast to other protocols like HTTP where the sockets are created in
> > their proper namespaces because kernel threads are not used to open sockets for
> > client network IO.
> >
> > We may plausibly end up with namespaces created by:
> >     I) The administrator may mount 10.0.0.3:/export_foo from init's
> >     container, clone the mount namespace, and unmount from the original
> >     mount namespace.
> >
> >     II) The administrator may start a task which clones the mount namespace
> >     before mounting 10.0.0.3:/export_foo.
> >
> > Proposed Solution:
> >
> > The network namespace of the task that did the mount best defines which server
> > the "administrator", whether in a container or not, expects to work with.
> > When the mount is done inside a container then that is the network namespace
> > to use. When the mount is done prior to creating the container then that's the
> > namespace that should be used.
> >
> > This allows system administrators to isolate network traffic generated by NFS
> > clients by mounting after creating a container. If partial isolation is desired
> > then the administrator may mount before creating a container with a new network
> > namespace. In each case the RPC packets would originate from a consistent
> > namespace.
> >
> > One way to ensure consistent namespace usage would be to hold a reference to
> > the original network namespace as long as the mount exists. This naturally
> > suggests storing the network namespace reference in the NFS superblock.
> > However, it may be better to store it with the RPC transport itself since
> > it is directly responsible for (re)opening the sockets.
> >
> > This patch adds a reference to the network namespace to the RPC
> > transport. When the NFS export is mounted the network namespace of
> > the current task establishes which namespace to reference. That
> > reference is stored in the RPC transport and used to open sockets
> > whenever a new socket is required.
>
> Matt. This may be the basis of something and the problem is real.
> However it is clear you have missed a lot of details.

Well crap. I did not ignore all of the RPC services I noticed when I tried
reading the NFS/RPC code. But based on the responses from Chuck, you, and
Trond, I clearly fucked up when I thought I had properly understood how the
RPC code works with the services that support NFS.

I figured that since RPC was the core of these services it would be a good
place to start trying to address the problem. It looked like the RPC transport
was a good place to deal with all of these services since it's responsible for
(re)opening the sockets needed to perform RPC IO. But apparently the transport
is not shared the way I thought it was :/

> So could you first address this problem in nfs_get_sb by
> denying the mount if we are not in the initial network namespace.
>
> I.e.
>
>     if (current->nsproxy->net_ns != &init_net)
>         return -EINVAL;
>
> That should be a lot simpler to get right and at least give reliable
> and predictable semantics.

Yes, that seems like a reasonable preventive measure for now.
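
For illustration only, the suggested check can be modelled in user space
roughly as below. This is a minimal sketch, not kernel code: the structs are
simplified stand-ins for the kernel's, `struct task` replaces the real task
structure reached through `current`, and the helper name
`nfs_mount_netns_check` is hypothetical. The only part taken from the
suggestion above is the pointer comparison against `init_net` and the
`-EINVAL` return.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for the kernel structures; NOT the real definitions. */
struct net { int id; };
struct nsproxy { struct net *net_ns; };
struct task { struct nsproxy *nsproxy; };

/* Stand-in for the kernel's global initial network namespace. */
struct net init_net = { 0 };

/*
 * Hypothetical helper: deny the mount unless the calling task lives in
 * the initial network namespace. Namespaces are compared by identity
 * (pointer equality), exactly as in the quoted two-line check.
 */
int nfs_mount_netns_check(const struct task *tsk)
{
    if (tsk->nsproxy->net_ns != &init_net)
        return -EINVAL;
    return 0;
}

/* Small self-check: one task in init_net, one in a separate namespace. */
void netns_check_demo(void)
{
    struct net container_net = { 1 };
    struct nsproxy init_proxy = { &init_net };
    struct nsproxy container_proxy = { &container_net };
    struct task host_task = { &init_proxy };
    struct task container_task = { &container_proxy };

    assert(nfs_mount_netns_check(&host_task) == 0);
    assert(nfs_mount_netns_check(&container_task) == -EINVAL);
}
```

With a check of this shape, a mount attempted from inside a container with its
own network namespace fails outright instead of silently using sockets from
the initial namespace.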

-Matt