On Tue, Aug 06, 2024 at 06:02:26PM +0200, Christian Brauner wrote: > (Preface because I've been panick-approached by people at conference > when we discussed this before: overmounting any global procfs files > such as /proc/status remains unaffected and is an existing and > supported use-case.) > > It is currently possible to mount on top of various ephemeral entities > in procfs. This specifically includes magic links. To recap, magic links > are links of the form /proc/<pid>/fd/<nr>. They serve as references to > a target file and during path lookup they cause a jump to the target > path. Such magic links disappear if the corresponding file descriptor is > closed. > > Currently it is possible to overmount such magic links: > > int fd = open("/mnt/foo", O_RDONLY); > sprintf(path, "/proc/%d/fd/%d", getpid(), fd); > int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW); > mount("/mnt/bar", path, "", MS_BIND, 0); > > Arguably, this is nonsensical and is mostly interesting for an attacker > that wants to somehow trick a process into e.g., reopening something > that they didn't intend to reopen or to hide a malicious file > descriptor. > > But also it risks leaking mounts for long-running processes. When > overmounting a magic link like above, the mount will not be detached > when the file descriptor is closed. Only the target mountpoint will > disappear. Which has the consequence of making it impossible to unmount > that mount afterwards. So the mount will stick around until the process > exits and the /proc/<pid>/ directory is cleaned up during > proc_flush_pid() when the dentries are pruned and invalidated. > > That in turn means it's possible for a program to accidentally leak > mounts and it's also possible to make a task leak mounts without it's > knowledge if the attacker just keeps overmounting things under > /proc/<pid>/fd/<nr>. > > I think it's wrong to try and fix this by us starting to play games with > close() or somewhere else to undo these mounts when the file descriptor > is closed. The fact that we allow overmounting of such magic links is > simply a bug and one that we need to fix. > > Similar things can be said about entries under fdinfo/ and map_files/ so > those are restricted as well. > > I have a further more aggressive patch that gets out the big hammer and > makes everything under /proc/<pid>/*, as well as immediate symlinks such > as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point > into /proc/<pid>/ not overmountable. Imho, all of this should be blocked > if we can get away with it. It's only useful to hide exploits such as in [1]. > > And again, overmounting of any global procfs files remains unaffected > and is an existing and supported use-case. > > Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1] > > // Note that repro uses the traditional way of just mounting over > // /proc/<pid>/fd/<nr>. This could also all be achieved just based on > // file descriptors using move_mount(). So /proc/<pid>/fd/<nr> isn't the > // only entry vector here. It's also possible to e.g., mount directly > // onto /proc/<pid>/map_files/* without going over /proc/<pid>/fd/<nr>. > int main(int argc, char *argv[]) > { > char path[PATH_MAX]; > > creat("/mnt/foo", 0777); > creat("/mnt/bar", 0777); > > /* > * For illustration use a bunch of file descriptors in the upper > * range that are unused. > */ > for (int i = 10000; i >= 256; i--) { > printf("I'm: /proc/%d/\n", getpid()); > > int fd2 = open("/mnt/foo", O_RDONLY); > if (fd2 < 0) { > printf("%m - Failed to open\n"); > _exit(1); > } > > int newfd = dup2(fd2, i); > if (newfd < 0) { > printf("%m - Failed to dup\n"); > _exit(1); > } > close(fd2); > > sprintf(path, "/proc/%d/fd/%d", getpid(), newfd); > int fd = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW); > if (fd < 0) { > printf("%m - Failed to open\n"); > _exit(3); > } > > sprintf(path, "/proc/%d/fd/%d", getpid(), fd); > printf("Mounting on top of %s\n", path); > if (mount("/mnt/bar", path, "", MS_BIND, 0)) { > printf("%m - Failed to mount\n"); > _exit(4); > } > > close(newfd); > close(fd2); > } > > /* > * Give some time to look at things. The mounts now linger until > * the process exits. > */ > sleep(10000); > _exit(0); > } > > Co-developed-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> I'm always down to restrict /proc, you can add Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx> Thanks, Josef