Re: [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API

Sargun Dhillon <sargun@xxxxxxxxx> · Sat, 17 Oct 2020 23:59:34 +0000

On Fri, Oct 16, 2020 at 05:45:47AM -0700, Sargun Dhillon wrote:
> This patchset adds some functionality to allow NFS to be used from
> NFS namespaces (containers).
> 
> Changes since v1:
>   * Added samples
> 
> Sargun Dhillon (3):
>   NFS: Use cred from fscontext during fsmount
>   samples/vfs: Split out common code for new syscall APIs
>   samples/vfs: Add example leveraging NFS with new APIs and user
>     namespaces
> 
>  fs/nfs/client.c                        |   2 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c |   1 +
>  fs/nfs/nfs4client.c                    |   2 +-
>  samples/vfs/.gitignore                 |   2 +
>  samples/vfs/Makefile                   |   5 +-
>  samples/vfs/test-fsmount.c             |  86 +-----------
>  samples/vfs/test-nfs-userns.c          | 181 +++++++++++++++++++++++++
>  samples/vfs/vfs-helper.c               |  43 ++++++
>  samples/vfs/vfs-helper.h               |  55 ++++++++
>  9 files changed, 289 insertions(+), 88 deletions(-)
>  create mode 100644 samples/vfs/test-nfs-userns.c
>  create mode 100644 samples/vfs/vfs-helper.c
>  create mode 100644 samples/vfs/vfs-helper.h
> 
> -- 
> 2.25.1
> 

Digging deeper into this a little bit, I actually found that there is some 
problematic aspects of the current behaviour. Because nfs_get_tree_common calls 
sget_fc, and sget_fc sets the super block's s_user_ns (via alloc_super) to the 
fs_context's user namespace unless the global flag is set (which NFS does not 
set), there are a bunch of permissions checks that are done against the super 
block's user_ns.

It looks like this was introduced in:
f2aedb713c28: NFS: Add fs_context support[1]

It turns out that unmapped users in the "parent" user namespace just get an 
EOVERFLOW error when trying to perform a read, even if the UID sent to the NFS 
server to read a file is a valid uid (the uid in the init user ns), and 
inode_permission checks permissions against the mapped UID in the namespace, 
while the authentication credentials (UIDs, GIDs) sent to the server are
those from the init user ns.

[This is all under the assumption there's not upcalls doing ID mapping]

Although, I do not think this presents any security risk (because you have to 
have CAP_SYS_ADMIN in the init user ns to get this far), it definitely seems
like "incorrect" behaviour.

[1]: https://lore.kernel.org/linux-nfs/20191120152750.6880-26-smayhew@xxxxxxxxxx/