In several patches, support was introduced to NFS for user namespaces: ccfe51a5161c: SUNRPC: Fix the server AUTH_UNIX userspace mappings e6667c73a27d: SUNRPC: rsi_parse() should use the current user namespace 1a58e8a0e5c1: NFS: Store the credential of the mount process in the nfs_server 283ebe3ec415: SUNRPC: Use the client user namespace when encoding creds ac83228a7101: SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall 264d948ce7d0: NFS: Convert NFSv3 to use the container user namespace 58002399da65: NFSv4: Convert the NFS client idmapper to use the container user namespace c207db2f5da5: NFS: Convert NFSv2 to use the container user namespace 3b7eb5e35d0f: NFS: When mounting, don't share filesystems between different user namespaces All of these commits are predicated on the NFS server being created with credentials that are in the user namespace of interest. The new VFS mount APIs help in this[1], in that the creation of the FSFD (fsopen) captures a set of credentials at creation time. Normally, the new file system API users automatically get their super block's user_ns set to the fc->user_ns in sget_fc, but since NFS has to do special manipulation of UIDs / GIDs on the wire, it keeps track of credentials itself. Unfortunately, the credentials that the NFS uses are the current_creds at the time FSCONFIG_CMD_CREATE is called. When FSCONFIG_CMD_CREATE is called, simultaneously, mount_capable is checked -- which checks if the user has CAP_SYS_ADMIN in the init_user_ns because NFS does not have FS_USERNS_MOUNT. This makes a subtle change so that the struct cred from fsopen is used instead. Since the fs_context is available at server creation time, and it has the credentials, we can just use those. This roughly allows a privileged user to mount on behalf of an unprivileged usernamespace, by forking off and calling fsopen in the unprivileged user namespace. It can then pass back that fsfd to the privileged process which can configure the NFS mount, and then it can call FSCONFIG_CMD_CREATE before switching back into the mount namespace of the container, and finish up the mounting process and call fsmount and move_mount. This change makes a small user space change if the user performs this elaborate process of passing around file descriptors, and switching namespaces. There may be a better way to go about this, or even enable FS_USERNS_MOUNT on NFS, but this seems like the safest and most straightforward approach. [1]: https://lore.kernel.org/linux-fsdevel/155059610368.17079.2220554006494174417.stgit@xxxxxxxxxxxxxxxxxxxxxx/ Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx> Cc: J. Bruce Fields <bfields@xxxxxxxxxxxx> Cc: Chuck Lever <chuck.lever@xxxxxxxxxx> Cc: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> Cc: Anna Schumaker <anna.schumaker@xxxxxxxxxx> Cc: David Howells <dhowells@xxxxxxxxxx> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Cc: Kyle Anderson <kylea@xxxxxxxxxxx> --- fs/nfs/client.c | 2 +- fs/nfs/nfs4client.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index f1ff3076e4a4..fdefcc649884 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -967,7 +967,7 @@ struct nfs_server *nfs_create_server(struct fs_context *fc) if (!server) return ERR_PTR(-ENOMEM); - server->cred = get_cred(current_cred()); + server->cred = get_cred(fc->cred); error = -ENOMEM; fattr = nfs_alloc_fattr(); diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index 0bd77cc1f639..92ff6fb8e324 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -1120,7 +1120,7 @@ struct nfs_server *nfs4_create_server(struct fs_context *fc) if (!server) return ERR_PTR(-ENOMEM); - server->cred = get_cred(current_cred()); + server->cred = get_cred(fc->cred); auth_probe = ctx->auth_info.flavor_len < 1; -- 2.25.1