On Thu, Jul 19, 2018 at 5:37 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2018-07-19 at 17:00 -0700, Sargun Dhillon wrote:
>> On Thu, Jul 19, 2018 at 12:45 PM, Trond Myklebust
>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>> >
>> > On Thu, 2018-07-19 at 17:42 +0000, Sargun Dhillon wrote:
>> > > This adds the ability to pass a non-init user namespace to
>> > > rpcauth_create, via rpc_auth_create_args. If the specific
>> > > authentication mechanism does not support non-init user
>> > > namespaces, then it will return an error.
>> > >
>> > > Currently, the only two authentication mechanisms that support
>> > > non-init user namespaces are auth_null, and auth_unix. auth_unix
>> > > will send the UID / GID from the user namespace for
>> > > authentication.
>> > >
>> >
>> > Firstly, please at least Cc the linux-nfs mailing list (as per the
>> > MAINTAINERS file) when changing NFS and sunrpc code.
>>
>> Sorry about that.
>>
>> >
>> > Secondly, can you please explain why we would want to use any user
>> > namespace other than the one specified in the net namespace
>> > structure (struct net) when communicating with network resources
>> > such as rpc.gssd, the idmapper or, for that matter, the NFS server?
>>
>> We mount NFS volumes for containers (user namespaces) today. On
>> multiple machines, they may have different mappings of uids in the
>> user namespace to kuids. If this is the case, it breaks auth_unix
>> because it uses the kuid in the init user ns mapping for the uid it
>> sends to the server.
>>
>
> The point is that the user namespace conversions that happen in the
> sunrpc layer are all for dealing with services. The AUTH_GSS upcalls
> should _only_ be speaking to an rpc.gssd daemon that runs in whatever
> container that owns the net namespace (and that created the rpc_pipefs
> objects).
>
> Ditto for the idmapper although if you use the keyring based (i.e. the
> non legacy) idmapper, that runs in the init namespace.
>
>> I think that if we moved to using the net->user_ns for auth_unix,
>> that'd be great, but it'd break userspace, as far as I know. We have
>> a slightly hacked version of this patch that uses the s_user_ns from
>> the nfs superblock, and I think that uids from the backing store
>> (whether it be a block device, or a server), should be written as the
>> kuid, and translated when it goes in and out of the userns.
>
> The actual applications running in the containers are interacting
> through the standard system calls. They do not need any extra
> conversion, because the syscalls convert them to kuids and back.
>
> IOW: We can completely ignore the user namespace of the container,
> since that is taken care of at the syscall level.
>
> The only namespaces we care about are:
>
> 1) The container that set up the mount in the first place, since
> presumably it is authorised to use its own uid/gids when talking to the
> mountpoint. That user namespace had better be the same one as the one
> saved in 'struct net' that was saved when we set up the mountpoint.
>
> 2) The containers that are running rpc.gssd and rpc.idmapd. Again,
> those are tied to struct net.
>

When the server has NFS_CAP_UIDGID_NOMAP set and you use auth_unix, there are no upcalls to rpc.gssd or rpc.idmapd. The uids as mapped into the init user ns are sent to the NFS server, even if net->user_ns is not init_user_ns. For example, if the syscall is made by a user who is ID 0 inside a user namespace, and from_kuid(&init_user_ns, ...) on their cred gives 100, the uid the server receives is still 100. Converting them based on the network namespace would solve the problem just fine, but it would be a userspace-breaking change. I think we have to use the s_user_ns.

>> Do you have any other suggestions, if we eventually want to enable
>> NFS4 for user namespaces?
>
> See above.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>