Hi On Thu, Mar 13, 2014 at 1:37 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > We do, but how do you get anything even attempt deny_write_access() on > those? And what would the semantics of that be, anyway? I just discovered /proc/self/map_files/ (only enabled with CONFIG_CHECKPOINT_RESTORE). Attached is an example to trigger an i_writecount underrun on an anon-mapping with your recommended fix applied. You can also find it at [1]. This example is a bit more complex, but basically does this: - get a fresh, executable zero-mapping - write an executable into the mapping - get a read-only file-descriptor to the underlying shmem file via /proc/self/map_files/ - execute that mapping via /proc/self/map_files/ - try to get a writable FD via /proc/self/map_files, this will fail with ETXTBUSY _even though_ we still own a writable mapping Ok, maybe this example is stupid, uses non-standard functionality (CONFIG_CHECKPOINT_RESTORE) and is a very unlikely use-case, but it shows that there _are_ ways to trigger this underrun. The hard part is to get access to an executable file-descriptor that was created via alloc_file() rather than open(). Once you got this, you can always trigger the underrun by executing the file while you still have a _single_ writable mapping which wasn't accounted for in i_writecount. /proc/self/map_files/ is root only, so this is not a security problem. I'm fine if you argue /proc/self/map_files/ is insecure and shouldn't be used. Or maybe it must be non-executable. I'm just saying there might always be new interfaces that give you a file-descriptor to files allocated with alloc_file() rather than open(). And once you got this FD, you can always execve() it via /proc and get deny_write_access() this way. By initializing i_writecount on all files, we make sure this never happens. Anyhow, as I really want this fixed either way, please let me know how to proceed. I can polish your patch and resend it if you don't intend to apply it yourself. Thanks david [1] https://gist.githubusercontent.com/dvdhrm/9661331/raw/b751cc7859267bb62af393ba3817b92c5225a372/gistfile1.txt
#define _GNU_SOURCE #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <sys/stat.h> #include <unistd.h> int main(int argc, char **argv) { char *args[64]; struct stat st; sigset_t set; char path[128]; void *map; size_t size; int sig, fd_exe, fd_map, fd; pid_t pid; ssize_t res; /* * If called with arguments, we just act as a sleeping process that * waits for any signal to arrive. */ if (argc > 1) { printf("dummy waiter init\n"); /* dummy waiter; SIGTERM terminates us anyway */ sigemptyset(&set); sigaddset(&set, SIGTERM); sigwait(&set, &sig); printf("dummy waiter exit\n"); exit(0); } fd_exe = open("/proc/self/exe", O_RDONLY); if (fd_exe < 0) { printf("open(/proc/self/exe) failed: %m\n"); abort(); } if (fstat(fd_exe, &st) < 0) { printf("fstat(fd_exe) failed: %m\n"); abort(); } /* page-align */ size = (st.st_size + 4095ULL) & ~4095ULL; map = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (map == MAP_FAILED) { printf("mmap(MAP_ANON) failed: %m\n"); abort(); } res = read(fd_exe, map, st.st_size); if (res != st.st_size) { if (res < 0) printf("read(fd_exe) failed: %m\n"); else printf("read(fd_exe) was truncated to %zu\n", (size_t)res); abort(); } sprintf(path, "/proc/self/map_files/%lx-%lx", (unsigned long)map, (unsigned long)map + size); fd_map = open(path, O_RDONLY); if (fd_map < 0) { printf("open(%s) failed: %m\n", path); abort(); } pid = fork(); if (pid < 0) { printf("fork() failed: %m\n"); abort(); } else if (!pid) { args[0] = argv[0]; args[1] = "dummy"; args[2] = NULL; execve(path, args, NULL); printf("execve() failed: %m\n"); abort(); } /* sleep 100ms to make sure the execve() really worked */ usleep(100 * 1000ULL); sprintf(path, "/proc/self/fd/%d", fd_map); fd = open(path, O_RDWR); if (fd < 0) { printf("open(%s) failed: %m\n", path); abort(); } munmap(map, size); close(fd_exe); close(fd); close(fd_map); printf("exiting\n"); return 0; }