On Thu, Oct 20, 2022, at 1:44 AM, Christian Brauner wrote:
> On Wed, Oct 19, 2022 at 03:42:42PM -0600, Daniel Xu wrote:
>> Hi Christian,
>>
>> On Wed, Oct 19, 2022, at 7:22 AM, Christian Brauner wrote:
>> > On Tue, Oct 18, 2022 at 06:42:04PM -0600, Daniel Xu wrote:
>> >> Hi,
>> >>
>> >> (Going off get_maintainer.pl for fs/namei.c here)
>> >>
>> >> I'm seeing some weird interactions with file capabilities and S_IRUSR
>> >> procfs files. Best I can tell it doesn't occur with real files on my btrfs
>> >> home partition.
>> >>
>> >> Test program:
>> >>
>> >> #include <fcntl.h>
>> >> #include <stdio.h>
>> >>
>> >> int main()
>> >> {
>> >>         int fd = open("/proc/self/auxv", O_RDONLY);
>> >>         if (fd < 0) {
>> >>                 perror("open");
>> >>                 return 1;
>> >>         }
>> >>
>> >>         printf("ok\n");
>> >>         return 0;
>> >> }
>> >>
>> >> Steps to reproduce:
>> >>
>> >> $ gcc main.c
>> >> $ ./a.out
>> >> ok
>> >> $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
>> >> $ ./a.out
>> >> open: Permission denied
>> >>
>> >> It's not obvious why this happens, even after spending a few hours
>> >> going through the standard documentation and kernel code. It's
>> >> intuitively odd b/c you'd think adding capabilities to the permitted
>> >> set wouldn't affect functionality.
>> >>
>> >> Best I could tell the -EACCES error occurs in the fallthrough codepath
>> >> inside generic_permission().
>> >>
>> >> Sorry if this is something dumb or obvious.
>> >
>> > Hey Daniel,
>> >
>> > No, this is neither dumb nor obvious. :)
>> >
>> > Basically, if you set fscaps then /proc/self/auxv will be owned by
>> > root:root.
>> > You can verify this:
>> >
>> > #include <fcntl.h>
>> > #include <sys/types.h>
>> > #include <sys/stat.h>
>> > #include <stdio.h>
>> > #include <errno.h>
>> > #include <unistd.h>
>> >
>> > int main()
>> > {
>> >         struct stat st;
>> >         printf("%d | %d\n", getuid(), geteuid());
>> >
>> >         if (stat("/proc/self/auxv", &st)) {
>> >                 fprintf(stderr, "stat: %d - %m\n", errno);
>> >                 return 1;
>> >         }
>> >         printf("stat: %d | %d\n", st.st_uid, st.st_gid);
>> >
>> >         int fd = open("/proc/self/auxv", O_RDONLY);
>> >         if (fd < 0) {
>> >                 fprintf(stderr, "open: %d - %m\n", errno);
>> >                 return 1;
>> >         }
>> >
>> >         printf("ok\n");
>> >         return 0;
>> > }
>> >
>> > $ ./a.out
>> > 1000 | 1000
>> > stat: 1000 | 1000
>> > ok
>> > $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
>> > $ ./a.out
>> > 1000 | 1000
>> > stat: 0 | 0
>> > open: 13 - Permission denied
>> >
>> > So acl_permission_check() fails and returns -EACCES, which causes
>> > generic_permission() to fall back to capable_wrt_inode_uidgid(). That
>> > checks for CAP_DAC_READ_SEARCH, which you don't have as an unprivileged
>> > user.
>>
>> Thanks for checking on this.
>>
>> That does explain the weirdness but at the expense of another
>> question: why do fscaps cause /proc/self/auxv to be owned by root?
>> Is that the correct semantics? This also seems rather unexpected.
>>
>> I'll take a look tonight and see if I can come up with any answers.
>
> Sorry I didn't explain this in more detail.
> You mostly uncovered the reasons as evidenced by the Twitter thread.
>
> Yes, this is expected. When a new process gains privileges during
> exec, the kernel will make it non-dumpable. That covers a change of the
> e{g,u}id or fs{g,u}id of the process, s{g,u}id binary execution that
> results in a changed e{g,u}id, and execution of a binary with fscaps set
> when the new permitted caps aren't a subset of the currently permitted caps.
>
> The last reason is what causes your sample program's /proc/self to be
> owned by root.
> The culprit here is cred_cap_issubset() which is called
> during commit_creds() in begin_new_exec().
>
> If the dumpable attribute has a value other than 1 then all files in
> /proc/<pid> will be owned by (userns) root. To get the full picture you'd
> need to at least read man proc(5), man execve(2), and man prctl(2).
>
> The reason behind the dumpability change is to prevent unprivileged users
> from making privilege-elevating binaries (e.g., s{g,u}id binaries) crash to
> produce (userns-)root-owned coredumps which can be used in exploits.
> Fairly recent examples of this are:
> https://alephsecurity.com/2021/10/20/sudump/
> https://www.openwall.com/lists/oss-security/2021/10/20/2

Thanks for the detailed explanation! I think each step makes sense to me
now, even if the final result is a little odd. One of those things I
guess :).

I'll see if a patch to the man-pages is appropriate.

Thanks,
Daniel
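
For anyone following along, the link between the dumpable attribute and
/proc ownership described above can be checked from inside the process.
What follows is only a minimal sketch, not anything from the kernel tree:
it assumes a Linux system with procfs mounted, and the helper names
dumpable_state(), file_owner() and report_dumpable_and_owner() are made up
for illustration. The only real interfaces used are prctl(PR_GET_DUMPABLE)
and stat().

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the calling process's dumpable attribute
 * as reported by prctl(PR_GET_DUMPABLE). A value of 1 means "dumpable";
 * any other non-negative value means it is not. Returns -1 on error.
 */
int dumpable_state(void)
{
        return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/*
 * Hypothetical helper: look up the owner of `path`.
 * Returns 0 on success and -1 on failure (errno is set by stat()).
 */
int file_owner(const char *path, uid_t *uid, gid_t *gid)
{
        struct stat st;

        if (stat(path, &st) != 0)
                return -1;
        *uid = st.st_uid;
        *gid = st.st_gid;
        return 0;
}

/*
 * Hypothetical helper: print the effective uid, the dumpable attribute
 * and the owner of `path` on one line, so the three can be compared.
 * Returns 0 on success and -1 if either lookup failed.
 */
int report_dumpable_and_owner(const char *path)
{
        uid_t uid;
        gid_t gid;
        int dumpable = dumpable_state();

        if (dumpable < 0) {
                perror("prctl(PR_GET_DUMPABLE)");
                return -1;
        }
        if (file_owner(path, &uid, &gid) != 0) {
                perror("stat");
                return -1;
        }
        printf("euid: %ld | dumpable: %d | owner: %ld:%ld\n",
               (long)geteuid(), dumpable, (long)uid, (long)gid);
        return 0;
}
```

Calling report_dumpable_and_owner("/proc/self/auxv") from main(), once
before and once after the setcap step quoted above, should let you compare
the dumpable value against the ownership that stat() reports. I would
expect dumpable to read 1 in the first run and something else in the
second, but I have not verified that on every kernel configuration.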