On Wed, Oct 19, 2022 at 03:42:42PM -0600, Daniel Xu wrote:
> Hi Christian,
>
> On Wed, Oct 19, 2022, at 7:22 AM, Christian Brauner wrote:
> > On Tue, Oct 18, 2022 at 06:42:04PM -0600, Daniel Xu wrote:
> >> Hi,
> >>
> >> (Going off get_maintainers.pl for fs/namei.c here)
> >>
> >> I'm seeing some weird interactions with file capabilities and S_IRUSR
> >> procfs files. Best I can tell it doesn't occur with real files on my btrfs
> >> home partition.
> >>
> >> Test program:
> >>
> >> #include <fcntl.h>
> >> #include <stdio.h>
> >>
> >> int main()
> >> {
> >>         int fd = open("/proc/self/auxv", O_RDONLY);
> >>         if (fd < 0) {
> >>                 perror("open");
> >>                 return 1;
> >>         }
> >>
> >>         printf("ok\n");
> >>         return 0;
> >> }
> >>
> >> Steps to reproduce:
> >>
> >> $ gcc main.c
> >> $ ./a.out
> >> ok
> >> $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
> >> $ ./a.out
> >> open: Permission denied
> >>
> >> It's not obvious why this happens, even after spending a few hours
> >> going through the standard documentation and kernel code. It's
> >> intuitively odd b/c you'd think adding capabilities to the permitted
> >> set wouldn't affect functionality.
> >>
> >> Best I could tell the -EACCES error occurs in the fallthrough codepath
> >> inside generic_permission().
> >>
> >> Sorry if this is something dumb or obvious.
> >
> > Hey Daniel,
> >
> > No, this is neither dumb nor obvious. :)
> >
> > Basically, if you set fscaps then /proc/self/auxv will be owned by
> > root:root.
> > You can verify this:
> >
> > #include <fcntl.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <stdio.h>
> > #include <errno.h>
> > #include <unistd.h>
> >
> > int main()
> > {
> >         struct stat st;
> >         printf("%d | %d\n", getuid(), geteuid());
> >
> >         if (stat("/proc/self/auxv", &st)) {
> >                 fprintf(stderr, "stat: %d - %m\n", errno);
> >                 return 1;
> >         }
> >         printf("stat: %d | %d\n", st.st_uid, st.st_gid);
> >
> >         int fd = open("/proc/self/auxv", O_RDONLY);
> >         if (fd < 0) {
> >                 fprintf(stderr, "open: %d - %m\n", errno);
> >                 return 1;
> >         }
> >
> >         printf("ok\n");
> >         return 0;
> > }
> >
> > $ ./a.out
> > 1000 | 1000
> > stat: 1000 | 1000
> > ok
> > $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
> > $ ./a.out
> > 1000 | 1000
> > stat: 0 | 0
> > open: 13 - Permission denied
> >
> > So acl_permission_check() fails and returns -EACCES which will cause
> > generic_permission() to rely on capable_wrt_inode_uidgid() which checks
> > for CAP_DAC_READ_SEARCH which you don't have as an unprivileged user.
>
> Thanks for checking on this.
>
> That does explain the weirdness but at the expense of another
> question: why do fscaps cause /proc/self/auxv to be owned by root?
> Is that the correct semantics? This also seems rather unexpected.
>
> I'll take a look tonight and see if I can come up with any answers.

Sorry I didn't explain this in more detail. You mostly uncovered the
reasons yourself, as evidenced by the Twitter thread.

Yes, this is expected. When a process gains privileges during exec, the
kernel makes it non-dumpable. That covers a change of the e{g,u}id or
fs{g,u}id of the process, execution of an s{g,u}id binary that results
in a changed e{g,u}id, and execution of a binary with fscaps set when
the new permitted caps aren't a subset of the currently permitted caps.
The last case is what causes the files under your sample program's
/proc/self to be owned by root.
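
To make the fscaps case concrete, here is a rough userspace model of the
subset test on the permitted sets. It is only a sketch: the helper names
are made up for illustration, capability sets are modelled as plain
64-bit masks, and it only covers the case where both sets of credentials
live in the same user namespace.

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/capability.h>	/* CAP_NET_ADMIN, CAP_SYS_ADMIN */

/* Hypothetical helper: the mask bit for a single capability number. */
uint64_t cap_bit(int cap)
{
	return UINT64_C(1) << cap;
}

/*
 * Hypothetical model of the same-userns subset test: true if every
 * capability in new_permitted is also present in old_permitted.
 */
bool permitted_is_subset(uint64_t new_permitted, uint64_t old_permitted)
{
	return (new_permitted & ~old_permitted) == 0;
}

/* An exec counts as a privilege gain when the subset test fails. */
bool exec_gains_caps(uint64_t old_permitted, uint64_t new_permitted)
{
	return !permitted_is_subset(new_permitted, old_permitted);
}
```

For your reproducer the old permitted set is empty and the new one holds
CAP_NET_ADMIN and CAP_SYS_ADMIN, so exec_gains_caps() returns true in
this model.
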
The culprit here is the cred_cap_issubset() check, which is done during
commit_creds() in begin_new_exec(). If the dumpable attribute is set to
anything other than 1 (SUID_DUMP_USER), i.e. the process is not
ordinarily dumpable, then the files in /proc/<pid> will be owned by
(userns) root. To get the full picture you'd need to read at least
proc(5), execve(2), and prctl(2).

The reason behind the dumpability change is to prevent unprivileged
users from making privilege-elevating binaries (e.g., s{g,u}id binaries)
crash in order to produce (userns-)root-owned coredumps, which can be
used in exploits. A fairly recent example of this:

https://alephsecurity.com/2021/10/20/sudump/
https://www.openwall.com/lists/oss-security/2021/10/20/2
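
If you want to see the dumpable attribute from inside the process, the
following is a minimal sketch. prctl(PR_GET_DUMPABLE) is the real
interface; expected_proc_owner() is a made-up helper that just encodes
the ownership rule from proc(5) and assumes (userns) root shows up as
uid 0.

```c
#include <sys/prctl.h>
#include <sys/types.h>

/* Returns the caller's dumpable attribute (0, 1 or 2), or -1 on error. */
int current_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/*
 * Hypothetical model of the proc(5) rule: /proc/<pid> files belong to
 * the effective uid only while dumpable is 1, otherwise to root.
 */
uid_t expected_proc_owner(int dumpable, uid_t euid)
{
	return dumpable == 1 ? euid : 0;
}
```

Comparing expected_proc_owner(current_dumpable(), geteuid()) against the
st_uid that stat() reports for /proc/self/auxv should line up with the
"stat: 0 | 0" output quoted above once the binary has fscaps set.
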