On Sat, Feb 20, 2016 at 6:10 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Sat, Feb 20, 2016 at 02:25:40PM +0100, Mickaël Salaün wrote: > >> I think the bug may be somewhere in the nd->depth handling (when its value is 0) in fs/namei.c:get_link(): struct saved *last = nd->stack + nd->depth - 1 > > Getting there with nd->depth == 0 would certainly be a bug - it would mean > that we got there without should_follow_link() having returned 1. > > In case of open() it would be "do_last() has returned positive without > should_follow_link() having returned 1". > > <looks> > > OK, there are several places where we rely on not getting bogus return values > - inode_permission() should not return positives, neither should vfs_open(), > security_path_truncate() and notify_change(). > > Other similar "handle the last component" functions are guaranteed to > never return positives other than directly from should_follow_link(), so > they are OK. > > IIRC, you used LSM to inject a positive value to inode_permission(), right? > > Another way to trigger that would've been ->open() returning positive - > a bug on *anything* since ->open() had been introduced in 0.95. Amount of > harm would vary - e.g. 0.95 would simply have that positive number returned > to userland, looking like successful open(2). With no new descriptor, of > course... > > Short-term we probably want just > if (unlikely(error > 0)) { > WARN_ON(1); > error = -EINVAL; > } > added right after out: in do_last(), try to trigger Dmitry's reproducers > on it and then work back to the source of that thing *if* that's what's > happening in his case. Yours almost certainly is just that. > > Longer-term... I'm not sure. Having a method that is supposed to return 0 > or -E<something> actually return positive is going to be a bad thing, no > matter what, but "that bogus value gets passed to userland" is a lot > more tolerable than "kernel memory corruption". do_last() calling conventions > make it vulnerable to the latter, and as far as nd->stack underruns that's > it, but I'm not sure we don't have other places where such bug in driver, > etc. would translate into mess ;-/ > > OK, in any case, let's start with checking if Dmitry is seeing that and not > something else. I still don't understand his stack traces - the fault > address quoted in his first posting doesn't match the register values in > the same trace, and there's also a possibility that it's an RCU-related > crap. This should give a warning and prevent an oops if we are hitting > a stack underrun on bogus positive from do_last(). Dmitry, could you try > to build with delta below and run your reproducer(s)? > > diff --git a/fs/namei.c b/fs/namei.c > index f624d13..e30deef 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3273,6 +3273,10 @@ opened: > goto exit_fput; > } > out: > + if (unlikely(error > 0)) { > + WARN_ON(1); > + error = -EINVAL; > + } > if (got_write) > mnt_drop_write(nd->path.mnt); > path_put(&save_parent); I've reproduced the second report (the one originating in openat) with this patch and the WARNING did _not_ fire: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 2 PID: 17525 Comm: syz-executor Not tainted 4.5.0-rc5+ #331 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88002c6ddf00 ti: ffff88002c740000 task.ti: ffff88002c740000 RIP: 0010:[<ffffffff81821ded>] [<ffffffff81821ded>] atime_needs_update+0x2d/0x460 RSP: 0018:ffff88002c747a48 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88002c747d88 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000c RBP: ffff88002c747a70 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88002c747d98 R13: 0000000000000000 R14: ffff88002c747d98 R15: ffff88002c747d78 FS: 00007f24da3d9700(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000002003f000 CR3: 000000002e2d0000 CR4: 00000000000006e0 Stack: ffff88002c747d40 ffff88002c747e08 0000000000000000 ffff88002c747d98 ffff88002c747d78 ffff88002c747ab8 ffffffff817eeda2 ffff88002c747d78 ffff880030fa88e8 ffff88002c747c98 0000000000000000 ffff88002c747d40 Call Trace: [< inline >] get_link fs/namei.c:1006 [<ffffffff817eeda2>] trailing_symlink+0x142/0x760 fs/namei.c:2094 [<ffffffff817f5cec>] path_openat+0xb4c/0x5760 fs/namei.c:3393 [<ffffffff817fe13e>] do_filp_open+0x18e/0x250 fs/namei.c:3425 [<ffffffff817c2dbc>] do_sys_open+0x1fc/0x420 fs/open.c:1022 [< inline >] SYSC_openat fs/open.c:1049 [<ffffffff817c3050>] SyS_openat+0x30/0x40 fs/open.c:1043 [<ffffffff8669f6b6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 Code: 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 98 17 d5 ff 48 8d 7b 0c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 RIP [<ffffffff81821ded>] atime_needs_update+0x2d/0x460 fs/inode.c:1617 RSP <ffff88002c747a48> ---[ end trace 872348222bfe81b0 ]--- This is _not_ on a tmpfs mount. Regarding the registers, here is disassembly. The crash happens on a KASAN check of the ffffffff81821dc0 <atime_needs_update>: ffffffff81821dc0: 55 push %rbp ffffffff81821dc1: 48 89 e5 mov %rsp,%rbp ffffffff81821dc4: 41 57 push %r15 ffffffff81821dc6: 41 56 push %r14 ffffffff81821dc8: 41 55 push %r13 ffffffff81821dca: 41 54 push %r12 ffffffff81821dcc: 49 89 fc mov %rdi,%r12 ffffffff81821dcf: 53 push %rbx ffffffff81821dd0: 48 89 f3 mov %rsi,%rbx ffffffff81821dd3: e8 98 17 d5 ff callq ffffffff81573570 <__sanitizer_cov_trace_pc> ffffffff81821dd8: 48 8d 7b 0c lea 0xc(%rbx),%rdi ffffffff81821ddc: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax ffffffff81821de3: fc ff df ffffffff81821de6: 48 89 fa mov %rdi,%rdx ffffffff81821de9: 48 c1 ea 03 shr $0x3,%rdx ffffffff81821ded: 0f b6 14 02 movzbl (%rdx,%rax,1),%edx ffffffff81821df1: 48 89 f8 mov %rdi,%rax ffffffff81821df4: 83 e0 07 and $0x7,%eax ffffffff81821df7: 83 c0 03 add $0x3,%eax ffffffff81821dfa: 38 d0 cmp %dl,%al ffffffff81821dfc: 7c 08 jl ffffffff81821e06 <atime_needs_update+0x46> ffffffff81821dfe: 84 d2 test %dl,%dl ffffffff81821e00: 0f 85 03 03 00 00 jne ffffffff81822109 <atime_needs_update+0x349> ffffffff81821e06: f6 43 0c 02 testb $0x2,0xc(%rbx) ffffffff81821e0a: 0f 85 1a 02 00 00 jne ffffffff8182202a <atime_needs_update+0x26a> ffffffff81821e10: e8 5b 17 d5 ff callq ffffffff81573570 <__sanitizer_cov_trace_pc> ffffffff81821e15: 48 8d 7b 28 lea 0x28(%rbx),%rdi ffffffff81821e19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax ffffffff81821e20: fc ff df ffffffff81821e23: 48 89 fa mov %rdi,%rdx ffffffff81821e26: 48 c1 ea 03 shr $0x3,%rdx ffffffff81821e2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) ffffffff81821e2e: 0f 85 c4 03 00 00 jne ffffffff818221f8 <atime_needs_update+0x438> It means that inode is NULL here: bool atime_needs_update(const struct path *path, struct inode *inode) { struct vfsmount *mnt = path->mnt; struct timespec now; if (inode->i_flags & S_NOATIME) return false; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html