If both the mount namespace and the mount point support UID/GID shifts,
then before doing any permission check we translate inode->{i_uid|i_gid}
into the kernel virtual view and use the result for the permission
checks. If there is no support for UID/GID shifts, we fall back to the
on-disk inode->{i_uid|i_gid} values.

The VFS shifts these values into the virtual view; the result is
compared with current's fsuid and fsgid and is used to perform the
classic and capability checks. Since inode->{i_uid|i_gid} always holds
the on-disk values, the virtual translation is done at the time an
access is needed.

This solves the problem of privileged user namespaces, or users inside
containers, that want to access files but are denied because the VFS
uses their global kuid/kgid.

Permission checks inside user_ns_X
----------------------------------

Without this patch:
----------------------------------------------------------------------
inode->uid on disk | init_user_ns uid | user_ns_X uid        | Access
----------------------------------------------------------------------
0                  | 1000000          | 0 (userns root)      | Denied
----------------------------------------------------------------------
999                | 1000999          | 999                  | Denied
----------------------------------------------------------------------
1000               | 1001000          | 1000                 | Denied
----------------------------------------------------------------------
1000               | 1000000          | 0 (userns root CAPS) | Denied
----------------------------------------------------------------------
0                  | 1001000          | 1000                 | Denied
----------------------------------------------------------------------

With this patch:
----------------------------------------------------------------------
inode->uid on disk | init_user_ns uid | user_ns_X uid        | Access
----------------------------------------------------------------------
0                  | 1000000          | 0 (userns root)      | Granted
----------------------------------------------------------------------
999                | 1000999          | 999                  | Granted
----------------------------------------------------------------------
1000               | 1001000          | 1000                 | Granted
----------------------------------------------------------------------
1000               | 1000000          | 0 (userns root CAPS) | Granted
----------------------------------------------------------------------
999                | 1000000          | 0 (userns root CAPS) | Granted
----------------------------------------------------------------------
0                  | 1001000          | 1000                 | Denied
----------------------------------------------------------------------
0                  | 1000999          | 999                  | Denied
----------------------------------------------------------------------
1000               | 1000999          | 999                  | Denied
----------------------------------------------------------------------

* CAPS: capabilities. Access was granted because of the caller's
  capabilities inside user_ns_X, and the shifted UID/GID of the inode
  are also mapped in user_ns_X.

Privileged user namespaces whose root has uid 0 inside the container
will be able to access inodes with on-disk inode->i_uid == 0, provided
that the inode is on a filesystem that supports VFS UID/GID shifts and
the caller is inside a mount namespace that also supports them.

Signed-off-by: Dongsu Park <dongsu@xxxxxxxxxxxx>
Signed-off-by: Djalal Harouni <tixxdz@xxxxxxxxxx>
---
 fs/inode.c          |  5 +++--
 fs/namei.c          |  6 ++++--
 kernel/capability.c | 14 ++++++++++++--
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 69b8b52..07daf5f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1961,12 +1961,13 @@ EXPORT_SYMBOL(inode_init_owner);
 bool inode_owner_or_capable(const struct inode *inode)
 {
 	struct user_namespace *ns;
+	kuid_t i_uid = vfs_shift_i_uid_to_virtual(inode);
 
-	if (uid_eq(current_fsuid(), inode->i_uid))
+	if (uid_eq(current_fsuid(), i_uid))
 		return true;
 
 	ns = current_user_ns();
-	if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, inode->i_uid))
+	if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, i_uid))
 		return true;
 	return false;
 }
diff --git a/fs/namei.c b/fs/namei.c
index 1d9ca2d..f7ee498 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -289,8 +289,10 @@ static int check_acl(struct inode *inode, int mask)
 static int acl_permission_check(struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
+	kuid_t i_uid = vfs_shift_i_uid_to_virtual(inode);
+	kgid_t i_gid = vfs_shift_i_gid_to_virtual(inode);
 
-	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
+	if (likely(uid_eq(current_fsuid(), i_uid)))
 		mode >>= 6;
 	else {
 		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
@@ -299,7 +301,7 @@ static int acl_permission_check(struct inode *inode, int mask)
 				return error;
 		}
 
-		if (in_group_p(inode->i_gid))
+		if (in_group_p(i_gid))
 			mode >>= 3;
 	}
 
diff --git a/kernel/capability.c b/kernel/capability.c
index 45432b5..fdc8afb 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -441,9 +441,19 @@ EXPORT_SYMBOL(file_ns_capable);
  */
 bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
 {
+	kuid_t i_uid;
+	kgid_t i_gid;
 	struct user_namespace *ns = current_user_ns();
 
-	return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
-		kgid_has_mapping(ns, inode->i_gid);
+	/*
+	 * Check if inode's UID/GID are meant to be shifted into the current
+	 * mount namespace; if so, we use the result to check if the shifted
+	 * UID/GID have a mapping in current's user namespace.
+	 */
+	i_uid = vfs_shift_i_uid_to_virtual(inode);
+	i_gid = vfs_shift_i_gid_to_virtual(inode);
+
+	return ns_capable(ns, cap) && kuid_has_mapping(ns, i_uid) &&
+		kgid_has_mapping(ns, i_gid);
 }
 EXPORT_SYMBOL(capable_wrt_inode_uidgid);
-- 
2.5.5