On September 30, 2018 3:54:31 PM GMT+02:00, Alban Crequy <alban@xxxxxxxxxx> wrote:
>On Sat, Sep 29, 2018 at 12:35 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
>>
>> The need for some sort of control over VFS's path resolution (to avoid
>> malicious paths resulting in inadvertent breakouts) has been a very
>> long-standing desire of many userspace applications. This patchset is a
>> revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions.
>>
>> The most obvious change is that AT_NO_JUMPS has been split as discussed
>> in the original thread, along with a further split of AT_NO_PROCLINKS,
>> which means that each individual property of AT_NO_JUMPS is now a
>> separate flag:
>>
>>   * Path-based escapes from the starting point using "/" or ".." are
>>     blocked by AT_BENEATH.
>>   * Mountpoint crossings are blocked by AT_XDEV.
>>   * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more
>>     correctly, it actually blocks any user of nd_jump_link() because it
>>     allows out-of-VFS path resolution manipulation).
>>
>> AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At
>> Linus' suggestion in the original thread, I've also implemented
>> AT_NO_SYMLINKS, which just denies _all_ symlink resolution (including
>> "proclink" resolution).
>
>It seems quite useful to me.
>
>> An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS
>> patch didn't consider "/tmp/.." as a mountpoint crossing -- this patch
>> blocks this as well (feel free to ask me to remove it if you feel this
>> is not sane).
>>
>> Currently I've only enabled these for openat(2) and the stat(2) family.
>> I would hope we could enable them for basically every *at(2) syscall --
>> but many of them appear to not have a @flags argument and thus we'll
>> need to add several new syscalls to do this.
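A rough userspace model of the AT_BENEATH rule from the flag list above may help when reading the namei changes. It is a lexical sketch under stated assumptions, not the patch's implementation: the helper name beneath_lexical_check() is hypothetical, and it deliberately ignores symlinks, mountpoints and proclinks, all of which the kernel has to deal with during the real path walk. It rejects absolute paths and any ".." that would climb above the starting directory, while a ".." that stays inside the tree is allowed.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical lexical model of the AT_BENEATH rule (not the patch's
 * code): a path may not start with "/" and may not use ".." to climb
 * above the starting directory. Symlinks, mountpoints and proclinks are
 * ignored here; the kernel has to check those during the actual walk.
 * Returns 0 if the path stays beneath the start, -1 if it escapes.
 */
static int beneath_lexical_check(const char *path)
{
    int depth = 0;  /* components currently below the starting point */
    const char *p = path;

    assert(path != NULL);
    if (path[0] == '/')
        return -1;  /* absolute path: escapes via "/" */

    while (*p) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (depth == 0)
                return -1;  /* ".." would leave the starting point */
            depth--;
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            depth++;  /* ordinary component: one level deeper */
        }
        /* Empty components ("a//b") and "." leave depth unchanged. */
        p = end ? end + 1 : p + len;
    }
    return 0;
}
```

With this model, "a/../b" is accepted, while "../x", "a/../../x" and "/etc/passwd" are all rejected.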
>> I'm more than happy to send those patches, but I'd prefer to know that
>> this preliminary work is acceptable before doing a bunch of copy-paste
>> to add new sets of *at(2) syscalls.
>
>What do you think of an equivalent AT_NO_SYMLINKS flag for mount()?

That's something we discussed, but it would need to be part of the new
mount API work by David. The current mount API doesn't take AT_* flags
since it doesn't operate on fds, and we're (sort of) out of mount flags.

>
>I guess that would have made the fix for CVE-2017-1002101 in
>Kubernetes easier to write:
>https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/
>
>> One additional feature I've implemented is AT_THIS_ROOT (I imagine this
>> is probably going to be more contentious than the refresh of
>> AT_NO_JUMPS, so I've included it in a separate patch). The patch itself
>> describes my reasoning, but the shortened version of the premise is that
>> container runtimes need to have a way to resolve paths within a
>> potentially malicious rootfs. Container runtimes currently do this in
>> userspace[2], which has implicit race conditions that are not resolvable
>> in userspace (or they use fork+exec+chroot and SCM_RIGHTS passing, which
>> is inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics
>> for path resolution, which would be invaluable for us -- and the
>> implementation is basically identical to AT_BENEATH (except that we
>> don't return errors when someone actually hits the root).
>>
>> I've added some selftests for this, but it's not clear to me whether
>> they should live here or in xfstests (as far as I can tell there are no
>> other VFS tests in selftests, while there are some tests that look like
>> generic VFS tests in xfstests). If you'd prefer them to be included in
>> xfstests, let me know.
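To make the "identical to AT_BENEATH except at the root" point concrete, here is a hypothetical lexical model of AT_THIS_ROOT-style resolution. It treats the given root as "/": absolute paths restart at the root, and ".." at the root is clamped (as it is after chroot(2)) instead of producing an error. The helper this_root_lexical_join() is an illustration only, not the patch's code, and it assumes the root is passed without a trailing slash. Because it only rewrites strings and never follows symlinks, it skips exactly the part that a real resolver (in userspace or in the kernel) has to get right.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical lexical model of AT_THIS_ROOT-style resolution (not the
 * patch's code). "root" plays the role of "/": a leading "/" in path
 * restarts at root, and ".." at root stays at root, as after chroot(2),
 * instead of failing as AT_BENEATH would. Symlinks are never followed;
 * the function only rewrites the string. root is assumed to have no
 * trailing slash. Writes root + cleaned path into out; returns 0 on
 * success, -1 if the result does not fit.
 */
static int this_root_lexical_join(const char *root, const char *path,
                                  char *out, size_t outlen)
{
    char clean[4096] = "";  /* cleaned path below root, e.g. "/a/b" */
    size_t clen = 0;
    const char *p = path;

    assert(root != NULL && path != NULL && out != NULL);

    while (*p) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            /* Drop the last component; at the root this is a no-op. */
            char *slash = strrchr(clean, '/');
            clen = slash ? (size_t)(slash - clean) : 0;
            clean[clen] = '\0';
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            /* Append "/component", refusing to overflow clean. */
            if (clen + 1 + len + 1 > sizeof(clean))
                return -1;
            clean[clen++] = '/';
            memcpy(clean + clen, p, len);
            clen += len;
            clean[clen] = '\0';
        }
        /* Empty components and "." fall through without changes. */
        p = end ? end + 1 : p + len;
    }

    int n = snprintf(out, outlen, "%s%s", root, clean);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With root "/rootfs", the path "../../etc/passwd" resolves to "/rootfs/etc/passwd" rather than failing, which is the behavioural difference from AT_BENEATH described above.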
>>
>> [1]: https://lore.kernel.org/patchwork/patch/784221/
>> [2]: https://github.com/cyphar/filepath-securejoin
>>
>> Aleksa Sarai (3):
>>   namei: implement O_BENEATH-style AT_* flags
>>   namei: implement AT_THIS_ROOT chroot-like path resolution
>>   selftests: vfs: add AT_* path resolution tests
>>
>>  fs/fcntl.c                                   |   2 +-
>>  fs/namei.c                                   | 158 ++++++++++++------
>>  fs/open.c                                    |  10 ++
>>  fs/stat.c                                    |  15 +-
>>  include/linux/fcntl.h                        |   3 +-
>>  include/linux/namei.h                        |   8 +
>>  include/uapi/asm-generic/fcntl.h             |  20 +++
>>  include/uapi/linux/fcntl.h                   |  10 ++
>>  tools/testing/selftests/Makefile             |   1 +
>>  tools/testing/selftests/vfs/.gitignore       |   1 +
>>  tools/testing/selftests/vfs/Makefile         |  13 ++
>>  tools/testing/selftests/vfs/at_flags.h       |  40 +++++
>>  tools/testing/selftests/vfs/common.sh        |  37 ++++
>>  .../selftests/vfs/tests/0001_at_beneath.sh   |  72 ++++++++
>>  .../selftests/vfs/tests/0002_at_xdev.sh      |  54 ++++++
>>  .../vfs/tests/0003_at_no_proclinks.sh        |  50 ++++++
>>  .../vfs/tests/0004_at_no_symlinks.sh         |  49 ++++++
>>  .../selftests/vfs/tests/0005_at_this_root.sh |  66 ++++++++
>>  tools/testing/selftests/vfs/vfs_helper.c     | 154 +++++++++++++++++
>>  19 files changed, 707 insertions(+), 56 deletions(-)
>>  create mode 100644 tools/testing/selftests/vfs/.gitignore
>>  create mode 100644 tools/testing/selftests/vfs/Makefile
>>  create mode 100644 tools/testing/selftests/vfs/at_flags.h
>>  create mode 100644 tools/testing/selftests/vfs/common.sh
>>  create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh
>>  create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh
>>  create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh
>>  create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh
>>  create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh
>>  create mode 100644 tools/testing/selftests/vfs/vfs_helper.c
>>
>> --
>> 2.19.0