On 2019-12-30, Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> On 2019-12-30, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > On Mon, Dec 30, 2019 at 04:20:35PM +1100, Aleksa Sarai wrote:
> >
> > > A reasonably detailed explanation of the issues is provided in the patch
> > > itself, but the full traces produced by both the oopses and deadlocks is
> > > included below (it makes little sense to include them in the commit since we
> > > are disabling this feature, not directly fixing the bugs themselves).
> > >
> > > I've posted this as an RFC on whether this feature should be allowed at
> > > all (and if anyone knows of legitimate uses for it), or if we should
> > > work on fixing these other kernel bugs that it exposes.
> >
> > Umm... Are all of those traces
> > 	a) reproducible on mainline and
>
> This was on viro/for-next, I'll retry it on v5.5-rc4.

The NULL deref oops is reproducible on v5.5-rc4. Strangely it seems harder
to reproduce than on viro/for-next (I kept reproducing it there by
accident), but I'll double-check if that really is the case.

The simplest reproducer is (using the attached programs and .config):

  ln -s . link
  sudo ./umount_symlink link

There's also a few other whacky behaviours where you get -ELOOP or
-EACCES in cases where you shouldn't -- which results in MNT_DETACH
failing and the mount being impossible to get rid of. A good example is

  sudo ./mount_to_symlink /proc/self/exe link
  sudo ./umount_symlink link # -EACCES

Or

  ln -s . link1
  ln -s . link2
  sudo ./mount_to_symlink link1 link2
  sudo ./umount_symlink link1 # -ELOOP
  sudo ./umount_symlink link2 # -ELOOP

But I am trying to find a reproducer for the "umount of a mount
triggering an Oops" issue. On another note -- I guess this is considered
a feature which should "just work" and not a bug?
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 80000003c6fca067 P4D 80000003c6fca067 PUD 3c6f42067 PMD 0
Oops: 0010 [#1] SMP PTI
CPU: 4 PID: 4486 Comm: umount_symlink Tainted: G E 5.5.0-rc4-cyphar #126
Hardware name: LENOVO 20KHCTO1WW/20KHCTO1WW, BIOS N23ET55W (1.30 ) 08/31/2018
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffb70b82963cc0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff906d0cc3bb40 RCX: 0000000000000abc
RDX: 0000000000000089 RSI: ffff906d74623cc0 RDI: ffff906d74475df0
RBP: ffff906d74475df0 R08: ffffd70b7fb24c20 R09: ffff906d066a5000
R10: 0000000000000000 R11: 8080807fffffffff R12: ffff906d74623cc0
R13: 0000000000000089 R14: ffffb70b82963dc0 R15: 0000000000000080
FS: 00007fbc2a8f0540(0000) GS:ffff906dcf500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000003c68f8001 CR4: 00000000003606e0
Call Trace:
 __lookup_slow+0x94/0x160
 lookup_slow+0x36/0x50
 path_mountpoint+0x1be/0x360
 filename_mountpoint+0xa5/0x150
 ? __lookup_hash+0xa0/0xa0
 ksys_umount+0x78/0x490
 __x64_sys_umount+0x12/0x20
 do_syscall_64+0x64/0x240
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fbc2a8274e7
Code: 09 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 09 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd1da9b3f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc2a8274e7
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000001300310
RBP: 00007ffd1da9b4c0 R08: 0000000000000000 R09: 000000000000000f
R10: 00007fbc2a92f800 R11: 0000000000000202 R12: 0000000000401090
R13: 00007ffd1da9b5a0 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: [snip]
CR2: 0000000000000000
---[ end trace ae473813e34e641d ]---
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffb70b82963cc0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff906d0cc3bb40 RCX: 0000000000000abc
RDX: 0000000000000089 RSI: ffff906d74623cc0 RDI: ffff906d74475df0
RBP: ffff906d74475df0 R08: ffffd70b7fb24c20 R09: ffff906d066a5000
R10: 0000000000000000 R11: 8080807fffffffff R12: ffff906d74623cc0
R13: 0000000000000089 R14: ffffb70b82963dc0 R15: 0000000000000080
FS: 00007fbc2a8f0540(0000) GS:ffff906dcf500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000003c68f8001 CR4: 00000000003606e0

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Attachment:
.config
Description: application/config
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

#define bail(msg) \
	do { printf("mount_to_symlink: %s: %m\n", msg); exit(1); } while (0)

int is_symlink(const char *path)
{
	struct stat stat = {};

	if (lstat(path, &stat) < 0)
		bail("lstat(<path>)");
	return S_ISLNK(stat.st_mode);
}

int main(int argc, char **argv)
{
	struct stat stat = {};
	char *src, *dst, *src_fdpath, *dst_fdpath;
	int src_fd, dst_fd;

	if (argc != 3)
		bail("usage: mount_to_symlink <src> <dst>");

	src_fdpath = src = argv[1];
	dst_fdpath = dst = argv[2];

	if (is_symlink(src)) {
		// open source fd
		src_fd = open(src, O_PATH | O_CLOEXEC | O_NOFOLLOW);
		if (src_fd < 0)
			bail("open(<src>, O_PATH|O_NOFOLLOW)");
		// construct fd path
		asprintf(&src_fdpath, "/proc/self/fd/%d", src_fd);
	}

	if (is_symlink(dst)) {
		// open target fd
		dst_fd = open(dst, O_PATH | O_CLOEXEC | O_NOFOLLOW);
		if (dst_fd < 0)
			bail("open(<dst>, O_PATH|O_NOFOLLOW)");
		// construct fd path
		asprintf(&dst_fdpath, "/proc/self/fd/%d", dst_fd);
	}

	// try to mount
	mount(src_fdpath, dst_fdpath, "", MS_BIND, "");
	printf("mount(%s, %s, MS_BIND) = %m (%d)\n", src, dst, -errno);
	return 0;
}
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

#define bail(msg) \
	do { printf("umount_symlink: %s: %m\n", msg); exit(1); } while (0)

int main(int argc, char **argv)
{
	char *mnt, *mnt_fdpath;
	int mnt_fd;

	if (argc != 2)
		bail("need <mount> argument");

	mnt = argv[1];

	// open mountpoint fd (without following a trailing symlink)
	mnt_fd = open(mnt, O_PATH | O_CLOEXEC | O_NOFOLLOW);
	if (mnt_fd < 0)
		bail("open(<mount>, O_PATH|O_NOFOLLOW)");

	// get fdpath
	asprintf(&mnt_fdpath, "/proc/self/fd/%d", mnt_fd);

	// try to unmount
	umount2(mnt_fdpath, MNT_DETACH);
	printf("umount2(%s, MNT_DETACH) = %m (%d)\n", mnt, -errno);
	return 0;
}
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers