Jan Kara <jack@xxxxxxx> writes: > On Tue 09-10-12 14:46:28, Eric W. Biederman wrote: >> Jan Kara <jack@xxxxxxx> writes: >> >> diff --git a/fs/xattr.c b/fs/xattr.c >> >> index 4d45b71..c111745 100644 >> >> --- a/fs/xattr.c >> >> +++ b/fs/xattr.c >> >> @@ -20,6 +20,7 @@ >> >> #include <linux/fsnotify.h> >> >> #include <linux/audit.h> >> >> #include <linux/vmalloc.h> >> >> +#include <linux/posix_acl_xattr.h> >> >> >> >> #include <asm/uaccess.h> >> >> >> >> @@ -347,6 +348,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value, >> >> error = -EFAULT; >> >> goto out; >> >> } >> >> + if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) || >> >> + (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) >> >> + posix_acl_fix_xattr_from_user(kvalue, size); >> >> } >> >> >> >> error = vfs_setxattr(d, kname, kvalue, size, flags); >> >> @@ -450,6 +454,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value, >> >> >> >> error = vfs_getxattr(d, kname, kvalue, size); >> >> if (error > 0) { >> >> + if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) || >> >> + (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) >> >> + posix_acl_fix_xattr_to_user(kvalue, size); >> >> if (size && copy_to_user(value, kvalue, error)) >> >> error = -EFAULT; >> >> } else if (error == -ERANGE && size >= XATTR_SIZE_MAX) { >> >> diff --git a/fs/xattr_acl.c b/fs/xattr_acl.c >> >> index 69d06b0..bf472ca 100644 >> >> --- a/fs/xattr_acl.c >> >> +++ b/fs/xattr_acl.c >> >> @@ -9,7 +9,65 @@ >> >> #include <linux/fs.h> >> >> #include <linux/posix_acl_xattr.h> >> >> #include <linux/gfp.h> >> >> +#include <linux/user_namespace.h> >> >> >> >> +/* >> >> + * Fix up the uids and gids in posix acl extended attributes in place. >> >> + */ >> >> +static void posix_acl_fix_xattr_userns( >> >> + struct user_namespace *to, struct user_namespace *from, >> >> + void *value, size_t size) >> >> +{ >> >> + posix_acl_xattr_header *header = (posix_acl_xattr_header *)value; >> >> + posix_acl_xattr_entry *entry = (posix_acl_xattr_entry *)(header+1), *end; >> >> + int count; >> >> + kuid_t uid; >> >> + kgid_t gid; >> >> + >> >> + if (!value) >> >> + return; >> >> + if (size < sizeof(posix_acl_xattr_header)) >> >> + return; >> >> + if (header->a_version != cpu_to_le32(POSIX_ACL_XATTR_VERSION)) >> >> + return; >> >> + >> >> + count = posix_acl_xattr_count(size); >> >> + if (count < 0) >> >> + return; >> >> + if (count == 0) >> >> + return; >> >> + >> >> + for (end = entry + count; entry != end; entry++) { >> >> + switch(le16_to_cpu(entry->e_tag)) { >> >> + case ACL_USER: >> >> + uid = make_kuid(from, le32_to_cpu(entry->e_id)); >> >> >> > This should have some error checking I guess... The initial checks done >> > in posix_acl_from_xattr() are for init_user_ns (why?) and only duplicated >> > in posix_acl_valid(). >> >> The flow from userspace: >> posix_acl_fix_xattr_from_user >> posix_acl_from_xattr >> posix_acl_valid >> >> The flow to userspace: >> posix_acl_to_xattr >> posix_acl_fix_xattr_to_user >> >> The existence of the posix_acl_fix_xattr_from_user and >> posix_fix_xattr_to_user ensure that filesystems only see xattrs encoded >> in the initial user namespace. Which is why posix_acl_from_xattr only >> takes init_user_ns as a parameter. >> >> How filesystems handle xattrs that deal with acls is spread all across >> the map. Some filesystems do the reasonable thing and translate the >> xattr from userspace into an acl and then translate the acl into their >> on-disk format. Other filesystems just stuff the acl onto the disk or >> onto the fileserver without looking at it. >> >> As for checks my interpretation was that a filesystem should already >> be calling posix_acl_from_xattr and posix_acl_valid, and that >> duplicating those checks in posix_acl_fix_xattr_to/from_user would >> be redundnant and confusing. >> >> What does happen is that any uid or gid that does not map gets >> translated into -1, which should always fail the latter sanity >> check. > Ah, I got lost in the maze of xattr callbacks. You are right, things work > as you say. Thanks for explanation. No problem. I realized after writing this there is another interesting bit of explanation I left out. In the user -> kernel direction posix_acl_fix_xattr_from_user will not introduce any new failure modes because the destination is the init_user_ns. In the the kernel -> user direction (reads) posix_acl_fix_xattr_to_user will introduce new failure modes (the uids and gids that don't map and are replaced by -1). Strategically returning a defined uid that indicates the mapping didn't happen seems preferable to completely failing the system call. So since posix_acl_fix_xattr_from_user and posix_acl_fix_xattr_to_user don't introduce any new failure modes I am comfortable with them not returning error codes. >> >> + entry->e_id = cpu_to_le32(from_kuid(to, uid)); >> >> + break; >> >> + case ACL_GROUP: >> >> + gid = make_kgid(from, le32_to_cpu(entry->e_id)); >> >> + entry->e_id = cpu_to_le32(from_kuid(to, uid)); >> > here should be gid ^^^ >> Ugh. Yes. That is a very real bug. :( The &init_user_ns short circuit >> likely protects against regressions but I will fix this. > Actually, it will cause a regression because you will end up converting > uninitialized 'uid' variable. I mean the check for init_user_ns in posix_acl_fix_xattr_from_user and posix_acl_fix_xattr_to_user that causes this code to not be executed. Since the code is not executed except when you mix user namespaces we don't have the potential for regressions. The code is most definitely wrong if it gets executed. I have the obvious fix queued up in my for-next branch and I intend to ask Linus to pull it in a little bit. >> > Also what about the following scenario: >> > >> > We have namespace A with user U1 and namespace B which does not have a >> > valid representation for U1. >> >> > There is a file F which can be seen from both >> > namespaces. In namespace A we create acl for user U1 attached to F. Now in >> > namespace B we modify the acl via setfacl(1) command. What it does is >> > getxattr(2) - returns mangled acl because U1 has no representation in B. We >> > add something to xattr and call setxattr(2) - results in removing the >> > original acl for U1 and instead adding acl for uid -1. That is a security >> > bug I'd say. >> >> What will happen in most reasonable filesystems is that >> posix_acl_from_xattr or posix_acl_valid will see the -1 for the unmapped >> uid or gid. Realize that the -1 does not map, and return -EINVAL before >> setting the xattr. So I do not think the failure mode you are worried >> about can happen. > Hum, so for filesystems as ext4 or xfs you won't be able to modify acls > from namespace B on F. I guess this is a modest option (although I can > imagine users will ponder hard to find out while setfacl fails without > apparent reason for some files) until someone cares enough to implement > something more clever. I expect the -1 when they read back an acl will be a reasonable clue. In practice cross namespace accesses don't happen often so I don't expect it will be much of an issue. But yeah I understand the possible conflusion. > But filesystems such as ubifs or nfs4 will just > silently corrupt the acl. I don't think that is acceptable... I think you > should fix these to fail setting the acl or fail compilation with > CONFIG_USER_NS or whatever. Anything is better than corrupting on disk > data. There is an interesting twist on your concern. For a filesystem to implement posix acls the filesystem must call posix_acl_from_xattr when the acl is set. So only a filesystem that sets the acl and then calls posix_acl_from_xattr is susceptible to the failure mode you describe. ubifs is a weird case because ubifs has implemented general xattr support but ubifs does not implement posix acls. You can read and write posix acls on ubifs but posix acls on ubifs are ignored by permission checks. nfs4 still does not compile with CONFIG_USER_NS. I have patches in my development tree but those patches were just a touch short of being ready by the merge window. nfsd-4 acls map themselves down to posix acls, and go through the customary posix_acl_from_xattr path. nfs4 client support is interesting. nfs4 client support does not support posix acls but instead passes the acl to the nfs server for setting and validation. So there is no chance of corruption there. Although it could get weird if the uid to username mappings are not consistent. But that has little to do with user namespaces. So in net I don't think nfs4 will have problems. The rest of the network filesystems are in a similar boat to nfs4. They have not been merged yet they were the trickest and they still need a bit more review before they can be merged. My plan is to send them out for review after 3.7-rc1 is released and stage them for 3.8. Now filesystems like ubifs and fuse are interesting. Looking at those filesystems raises the question: How should we handle filesystems that don't implement posix acls but accept xattrs with the name of posix acls? My take: "Don't do that". Filesystems doing that are that silly just need to be fixed. Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers