On Tue, Aug 16, 2022 at 01:35:13PM +0200, Christian Brauner wrote: > Ensure that POSIX ACLs checking, getting, and setting works correctly > for filesystems mountable with a filesystem idmapping ("fs_idmapping") > that want to support idmapped mounts ("mnt_idmapping"). > > Note that no filesystems mountable with an fs_idmapping do yet support > idmapped mounts. This is required infrastructure work to unblock this. > > As we explained in detail in [1] the fs_idmapping is irrelevant for > getxattr() and setxattr() when mapping the ACL_{GROUP,USER} {g,u}ids > stored in the uapi struct posix_acl_xattr_entry in > posix_acl_fix_xattr_{from,to}_user(). > > But for acl_permission_check() and posix_acl_{g,s}etxattr_idmapped_mnt() > the fs_idmapping matters. > > acl_permission_check(): > During lookup POSIX ACLs are retrieved directly via i_op->get_acl() and > are returned via the kernel internal struct posix_acl which contains > e_{g,u}id members of type k{g,u}id_t that already take the > fs_idmapping into acccount. > > For example, a POSIX ACL stored with u4 on the backing store is mapped > to k10000004 in the fs_idmapping. The mnt_idmapping remaps the POSIX ACL > to k20000004. In order to do that the fs_idmapping needs to be taken > into account but that doesn't happen yet (Again, this is a > counterfactual currently as fuse doesn't support idmapped mounts > currently. It's just used as a convenient example.): > > fs_idmapping: u0:k10000000:r65536 > mnt_idmapping: u0:v20000000:r65536 > ACL_USER: k10000004 > > acl_permission_check() > -> check_acl() > -> get_acl() > -> i_op->get_acl() == fuse_get_acl() > -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...) > { > k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */, > u4 /* ACL_USER */); > } > -> posix_acl_permission() > { > -1 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */, > &init_user_ns, > k10000004); > vfsuid_eq_kuid(-1, k10000004 /* caller_fsuid */) > } > > In order to correctly map from the fs_idmapping into mnt_idmapping we > require the relevant fs_idmaping to be passed: > > acl_permission_check() > -> check_acl() > -> get_acl() > -> i_op->get_acl() == fuse_get_acl() > -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...) > { > k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */, > u4 /* ACL_USER */); > } > -> posix_acl_permission() > { > v20000004 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */, > u0:k10000000:r65536 /* fs_idmapping */, > k10000004); > vfsuid_eq_kuid(v20000004, k10000004 /* caller_fsuid */) > } > > The initial_idmapping is only correct for the current situation because > all filesystems that currently support idmapped mounts do not support > being mounted with an fs_idmapping. > > Note that ovl_get_acl() is used to retrieve the POSIX ACLs from the > relevant lower layer and the lower layer's mnt_idmapping needs to be > taken into account and so does the fs_idmapping. See 0c5fd887d2bb ("acl: > move idmapped mount fixup into vfs_{g,s}etxattr()") for more details. > > For posix_acl_{g,s}etxattr_idmapped_mnt() it is not as obvious why the > fs_idmapping matters as it is for acl_permission_check(). Especially > because it doesn't matter for posix_acl_fix_xattr_{from,to}_user() (See > [1] for more context.). > > Because posix_acl_{g,s}etxattr_idmapped_mnt() operate on the uapi > struct posix_acl_xattr_entry which contains {g,u}id_t values and thus > give the impression that the fs_idmapping is irrelevant as at this point > appropriate {g,u}id_t values have seemlingly been generated. > > As we've stated multiple times this assumption is wrong and in fact the > uapi struct posix_acl_xattr_entry is taking idmappings into account > depending at what place it is operated on. > > posix_acl_getxattr_idmapped_mnt() > When posix_acl_getxattr_idmapped_mnt() is called the values stored in > the uapi struct posix_acl_xattr_entry are mapped according to the > fs_idmapping. This happened when they were read from the backing store > and then translated from struct posix_acl into the uapi > struct posix_acl_xattr_entry during posix_acl_to_xattr(). > > In other words, the fs_idmapping matters as the values stored as > {g,u}id_t in the uapi struct posix_acl_xattr_entry have been generated > by it. > > So we need to take the fs_idmapping into account during make_vfsuid() > in posix_acl_getxattr_idmapped_mnt(). > > posix_acl_setxattr_idmapped_mnt() > When posix_acl_setxattr_idmapped_mnt() is called the values stored as > {g,u}id_t in uapi struct posix_acl_xattr_entry are intended to be the > values that ultimately get turned back into a k{g,u}id_t in > posix_acl_from_xattr() (which turns the uapi > struct posix_acl_xattr_entry into the kernel internal struct posix_acl). > > In other words, the fs_idmapping matters as the values stored as > {g,u}id_t in the uapi struct posix_acl_xattr_entry are intended to be > the values that will be undone in the fs_idmapping when writing to the > backing store. > > So we need to take the fs_idmapping into account during from_vfsuid() > in posix_acl_setxattr_idmapped_mnt(). > > Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@xxxxxxxxxx [1] > Fixes: 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") > Cc: Seth Forshee <sforshee@xxxxxxxxxxxxxxxx> > Signed-off-by: Christian Brauner (Microsoft) <brauner@xxxxxxxxxx> Looks good to me. Reviewed-by: Seth Forshee <sforshee@xxxxxxxxxxxxxxxx> > --- > fs/overlayfs/inode.c | 11 +++++++---- > fs/posix_acl.c | 15 +++++++++------ > 2 files changed, 16 insertions(+), 10 deletions(-) > > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c > index b45fea69fff3..0fbcb590af84 100644 > --- a/fs/overlayfs/inode.c > +++ b/fs/overlayfs/inode.c > @@ -460,9 +460,12 @@ ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size) > * of the POSIX ACLs retrieved from the lower layer to this function to not > * alter the POSIX ACLs for the underlying filesystem. > */ > -static void ovl_idmap_posix_acl(struct user_namespace *mnt_userns, > +static void ovl_idmap_posix_acl(struct inode *realinode, > + struct user_namespace *mnt_userns, > struct posix_acl *acl) > { > + struct user_namespace *fs_userns = i_user_ns(realinode); > + > for (unsigned int i = 0; i < acl->a_count; i++) { > vfsuid_t vfsuid; > vfsgid_t vfsgid; > @@ -470,11 +473,11 @@ static void ovl_idmap_posix_acl(struct user_namespace *mnt_userns, > struct posix_acl_entry *e = &acl->a_entries[i]; > switch (e->e_tag) { > case ACL_USER: > - vfsuid = make_vfsuid(mnt_userns, &init_user_ns, e->e_uid); > + vfsuid = make_vfsuid(mnt_userns, fs_userns, e->e_uid); > e->e_uid = vfsuid_into_kuid(vfsuid); > break; > case ACL_GROUP: > - vfsgid = make_vfsgid(mnt_userns, &init_user_ns, e->e_gid); > + vfsgid = make_vfsgid(mnt_userns, fs_userns, e->e_gid); > e->e_gid = vfsgid_into_kgid(vfsgid); > break; > } > @@ -536,7 +539,7 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu) > if (!clone) > clone = ERR_PTR(-ENOMEM); > else > - ovl_idmap_posix_acl(mnt_user_ns(realpath.mnt), clone); > + ovl_idmap_posix_acl(realinode, mnt_user_ns(realpath.mnt), clone); > /* > * Since we're not in RCU path walk we always need to release the > * original ACLs. > diff --git a/fs/posix_acl.c b/fs/posix_acl.c > index 1d17d7b13dcd..5af33800743e 100644 > --- a/fs/posix_acl.c > +++ b/fs/posix_acl.c > @@ -361,6 +361,7 @@ posix_acl_permission(struct user_namespace *mnt_userns, struct inode *inode, > const struct posix_acl *acl, int want) > { > const struct posix_acl_entry *pa, *pe, *mask_obj; > + struct user_namespace *fs_userns = i_user_ns(inode); > int found = 0; > vfsuid_t vfsuid; > vfsgid_t vfsgid; > @@ -376,7 +377,7 @@ posix_acl_permission(struct user_namespace *mnt_userns, struct inode *inode, > goto check_perm; > break; > case ACL_USER: > - vfsuid = make_vfsuid(mnt_userns, &init_user_ns, > + vfsuid = make_vfsuid(mnt_userns, fs_userns, > pa->e_uid); > if (vfsuid_eq_kuid(vfsuid, current_fsuid())) > goto mask; > @@ -390,7 +391,7 @@ posix_acl_permission(struct user_namespace *mnt_userns, struct inode *inode, > } > break; > case ACL_GROUP: > - vfsgid = make_vfsgid(mnt_userns, &init_user_ns, > + vfsgid = make_vfsgid(mnt_userns, fs_userns, > pa->e_gid); > if (vfsgid_in_group_p(vfsgid)) { > found = 1; > @@ -736,6 +737,7 @@ void posix_acl_getxattr_idmapped_mnt(struct user_namespace *mnt_userns, > { > struct posix_acl_xattr_header *header = value; > struct posix_acl_xattr_entry *entry = (void *)(header + 1), *end; > + struct user_namespace *fs_userns = i_user_ns(inode); > int count; > vfsuid_t vfsuid; > vfsgid_t vfsgid; > @@ -753,13 +755,13 @@ void posix_acl_getxattr_idmapped_mnt(struct user_namespace *mnt_userns, > switch (le16_to_cpu(entry->e_tag)) { > case ACL_USER: > uid = make_kuid(&init_user_ns, le32_to_cpu(entry->e_id)); > - vfsuid = make_vfsuid(mnt_userns, &init_user_ns, uid); > + vfsuid = make_vfsuid(mnt_userns, fs_userns, uid); > entry->e_id = cpu_to_le32(from_kuid(&init_user_ns, > vfsuid_into_kuid(vfsuid))); > break; > case ACL_GROUP: > gid = make_kgid(&init_user_ns, le32_to_cpu(entry->e_id)); > - vfsgid = make_vfsgid(mnt_userns, &init_user_ns, gid); > + vfsgid = make_vfsgid(mnt_userns, fs_userns, gid); > entry->e_id = cpu_to_le32(from_kgid(&init_user_ns, > vfsgid_into_kgid(vfsgid))); > break; > @@ -775,6 +777,7 @@ void posix_acl_setxattr_idmapped_mnt(struct user_namespace *mnt_userns, > { > struct posix_acl_xattr_header *header = value; > struct posix_acl_xattr_entry *entry = (void *)(header + 1), *end; > + struct user_namespace *fs_userns = i_user_ns(inode); > int count; > vfsuid_t vfsuid; > vfsgid_t vfsgid; > @@ -793,13 +796,13 @@ void posix_acl_setxattr_idmapped_mnt(struct user_namespace *mnt_userns, > case ACL_USER: > uid = make_kuid(&init_user_ns, le32_to_cpu(entry->e_id)); > vfsuid = VFSUIDT_INIT(uid); > - uid = from_vfsuid(mnt_userns, &init_user_ns, vfsuid); > + uid = from_vfsuid(mnt_userns, fs_userns, vfsuid); > entry->e_id = cpu_to_le32(from_kuid(&init_user_ns, uid)); > break; > case ACL_GROUP: > gid = make_kgid(&init_user_ns, le32_to_cpu(entry->e_id)); > vfsgid = VFSGIDT_INIT(gid); > - gid = from_vfsgid(mnt_userns, &init_user_ns, vfsgid); > + gid = from_vfsgid(mnt_userns, fs_userns, vfsgid); > entry->e_id = cpu_to_le32(from_kgid(&init_user_ns, gid)); > break; > default: > > base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868 > -- > 2.34.1 >