Matt Helsley <matthltc@xxxxxxxxxx> writes:

> Sun RPC currently opens sockets from the initial network namespace making it
> impossible to restrict which NFS servers a container may interact with.
>
> For example, the NFS server at 10.0.0.3 reachable from the initial namespace
> will always be used even if an entirely different server with the address
> 10.0.0.3 is reachable from a container's network namespace. Hence network
> namespaces cannot be used to restrict the network access of a container as long
> as the RPC code opens sockets using the initial network namespace. This is
> in stark contrast to other protocols like HTTP where the sockets are created in
> their proper namespaces because kernel threads are not used to open sockets for
> client network IO.
>
> We may plausibly end up with namespaces created by:
>   I) The administrator may mount 10.0.0.3:/export_foo from init's
>      container, clone the mount namespace, and unmount from the original
>      mount namespace.
>
>  II) The administrator may start a task which clones the mount namespace
>      before mounting 10.0.0.3:/export_foo.
>
> Proposed Solution:
>
> The network namespace of the task that did the mount best defines which server
> the "administrator", whether in a container or not, expects to work with.
> When the mount is done inside a container then that is the network namespace
> to use. When the mount is done prior to creating the container then that's the
> namespace that should be used.
>
> This allows system administrators to isolate network traffic generated by NFS
> clients by mounting after creating a container. If partial isolation is desired
> then the administrator may mount before creating a container with a new network
> namespace. In each case the RPC packets would originate from a consistent
> namespace.
>
> One way to ensure consistent namespace usage would be to hold a reference to
> the original network namespace as long as the mount exists.
> This naturally suggests storing the network namespace reference in the NFS
> superblock. However, it may be better to store it with the RPC transport
> itself since it is directly responsible for (re)opening the sockets.
>
> This patch adds a reference to the network namespace to the RPC
> transport. When the NFS export is mounted the network namespace of
> the current task establishes which namespace to reference. That
> reference is stored in the RPC transport and used to open sockets
> whenever a new socket is required.

Matt.

This may be the basis of something and the problem is real. However it is
clear you have missed a lot of details.

So could you first address this problem in nfs_get_sb by denying the mount if
we are not in the initial network namespace.

I.e.

	if (current->nsproxy->net_ns != &init_net)
		return -EINVAL;

That should be a lot simpler to get right and at least give reliable and
predictable semantics.

Eric
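
For readers following the thread, the two approaches discussed above can be
contrasted in a small user-space model. This is only a sketch, not kernel
code: struct task, struct xprt_model, mount_check_init_net() and the
*_model helpers are hypothetical stand-ins invented for illustration, and the
refcount field merely imitates the idea of holding a namespace reference for
the lifetime of the transport.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types; not kernel code. */
struct net { int refcount; };
struct nsproxy { struct net *net_ns; };
struct task { struct nsproxy *nsproxy; };

/* Models the initial network namespace, which always holds one reference. */
static struct net init_net = { 1 };

/*
 * The simpler check suggested in the reply: refuse the mount unless the
 * mounting task is in the initial network namespace.
 */
static int mount_check_init_net(const struct task *t)
{
	if (t->nsproxy->net_ns != &init_net)
		return -EINVAL;
	return 0;
}

/*
 * The approach described in the patch, modeled: the transport pins the
 * network namespace of the task that performed the mount, so every later
 * (re)open of a socket can use the same namespace.
 */
struct xprt_model { struct net *net; };

static struct net *get_net_model(struct net *n)
{
	n->refcount++;
	return n;
}

static void put_net_model(struct net *n)
{
	n->refcount--;
}

/* Take the reference at mount time, from the mounting task's namespace. */
static void xprt_model_init(struct xprt_model *x, const struct task *t)
{
	x->net = get_net_model(t->nsproxy->net_ns);
}

/* Drop the reference when the transport goes away. */
static void xprt_model_destroy(struct xprt_model *x)
{
	assert(x->net != NULL);
	put_net_model(x->net);
	x->net = NULL;
}
```

In this model the check rejects a mount from any task whose namespace is not
init_net, while the transport variant accepts it and keeps that namespace
referenced until the transport is destroyed. Which of the two is appropriate
in the real code is exactly the question under discussion in this thread.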