Re: missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

Jann Horn <jannh@xxxxxxxxxx> · Thu, 28 Feb 2019 17:34:16 +0100

On Thu, Feb 28, 2019 at 1:57 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Thu, 28 Feb 2019, Jann Horn wrote:
> > +Josh for unwinding, +x86 folks
> > On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
> > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot <syzbot+ca95b2b7aef9e7cbd6ab@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:    4aa9fc2a435a Revert "mm, memory_hotplug: initialize struct..
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f400000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > > >
> > > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > Not understanding.  That seems to be saying that we got a NULL pointer
> > > deref in __generic_file_write_iter() at
> > >
> > >                 written = generic_perform_write(file, from, iocb->ki_pos);
> > >
> > > which isn't possible.
> > >
> > > I'm not seeing recent changes in there which could have caused this.  Help.
> >
> > +
> >
> > Maybe the problem is that the frame pointer unwinder isn't designed to
> > cope with NULL function pointers - or more generally, with an
> > unwinding operation that starts before the function's frame pointer
> > has been set up?
> >
> > Unwinding starts at show_trace_log_lvl(). That begins with
> > unwind_start(), which calls __unwind_start(), which uses
> > get_frame_pointer(), which just returns regs->bp. But that frame
> > pointer points to the part of the stack that's storing the address of
> > the caller of the function that called NULL; the caller of NULL is
> > skipped, as far as I can tell.
> >
> > What's kind of annoying here is that we don't have a proper frame set
> > up yet, we only have half a stack frame (saved RIP but no saved RBP).
>
> That wreckage is related to the fact that the indirect calls are going
> through __x86_indirect_thunk_$REG. I just verified on a VM with some other
> callback NULL'ed that the resulting backtrace is not really helpful.
>
> So in that case generic_perform_write() has two indirect calls:
>
>   mapping->a_ops->write_begin() and ->write_end()

Does the indirect thunk thing really make any difference? When you
arrive at RIP=NULL, RSP points to a saved instruction pointer, just
like when indirect calls are compiled normally.

I just compiled kernels with artificial calls to a NULL function
pointer (in prctl_set_seccomp()), with retpoline disabled, with both
unwinders. The ORC unwinder shows a call trace with "?" everywhere
that doesn't show the caller:

[  228.219140] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000000
[  228.223897] #PF error: [INSTR]
[  228.224562] PGD 0 P4D 0
[  228.225119] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[  228.226319] CPU: 1 PID: 1099 Comm: artificial_null Not tainted
5.0.0-rc8+ #299
[  228.227818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[  228.229542] RIP: 0010:          (null)
[  228.230331] Code: Bad RIP value.
[  228.231011] RSP: 0018:ffff8881d798fe88 EFLAGS: 00010246
[  228.232104] RAX: ffffffffffffffda RBX: 0000000000000016 RCX: ffffffffa0368205
[  228.233599] RDX: dffffc0000000000 RSI: 00007ffde0d71168 RDI: 0000000000000042
[  228.235077] RBP: 1ffff1103af31fd4 R08: 0000561b50807740 R09: 0000000000000016
[  228.236557] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000042
[  228.238039] R13: 0000000000000000 R14: 00007ffde0d71168 R15: 0000561b50807740
[  228.239517] FS:  00007fe31f1cf700(0000) GS:ffff8881eb040000(0000)
knlGS:0000000000000000
[  228.241213] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  228.242411] CR2: ffffffffffffffd6 CR3: 00000001df8b8004 CR4: 0000000000360ee0
[  228.243886] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  228.245364] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  228.246841] Call Trace:
[  228.247366]  ? __x64_sys_prctl+0x402/0x680
[  228.248224]  ? __ia32_sys_prctl+0x6e0/0x6e0
[  228.249106]  ? __do_page_fault+0x457/0x620
[  228.249969]  ? do_syscall_64+0x6d/0x160
[  228.250778]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[...]

whereas the FP unwinder shows this, listing prctl_set_seccomp only
with a question mark:

[   47.469957] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000000
[   47.476973] #PF error: [INSTR]
[   47.477742] PGD 0 P4D 0
[   47.478341] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   47.479703] CPU: 4 PID: 1079 Comm: artificial_null Not tainted
5.0.0-rc8+ #300
[   47.481489] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[   47.483442] RIP: 0010:          (null)
[   47.484328] Code: Bad RIP value.
[   47.485085] RSP: 0018:ffff8881e01f7e70 EFLAGS: 00010246
[   47.486358] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: ffffffffafbf007a
[   47.488090] RDX: dffffc0000000000 RSI: 00007ffe164b4f28 RDI: 0000000000000042
[   47.489862] RBP: ffff8881e01f7e78 R08: 0000562942136740 R09: 0000000000000016
[   47.491491] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1103c03efd3
[   47.493144] R13: 0000000000000042 R14: 00007ffe164b4f28 R15: 0000000000000016
[   47.494795] FS:  00007fa38b1d6700(0000) GS:ffff8881eb300000(0000)
knlGS:0000000000000000
[   47.496638] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.497981] CR2: ffffffffffffffd6 CR3: 00000001e0e4c006 CR4: 0000000000360ee0
[   47.499623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.501252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.502927] Call Trace:
[   47.503501]  ? prctl_set_seccomp+0x3a/0x50
[   47.504450]  __x64_sys_prctl+0x457/0x6f0
[   47.505349]  ? __ia32_sys_prctl+0x750/0x750
[   47.506352]  do_syscall_64+0x72/0x160
[   47.507214]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[...]

Looking back at the syzkaller report, the command line output
(https://syzkaller.appspot.com/x/log.txt?x=1101382f400000) has this:

[  375.092788] Call Trace:
[  375.095378]  ? generic_perform_write+0x2a0/0x6b0
[  375.100150]  ? add_page_wait_queue+0x480/0x480
[  375.104744]  ? current_time+0x1b0/0x1b0
[  375.108727]  ? generic_write_check_limits+0x380/0x380
[  375.113942]  ? ext4_file_write_iter+0x28b/0x1410
{some non-dmesg output here}
[  375.118711]  __generic_file_write_iter+0x25e/0x630
[  375.123714]  ext4_file_write_iter+0x37a/0x1410

The first entry with a question mark is *RSP, the real caller; that's
generic_perform_write(), as expected. The rest is probably just random
garbage that happened to still be on the stack. It looks like
syzkaller strips out trace elements with question marks in front.

So I think this doesn't really have anything to do with
__x86_indirect_thunk_$REG, and the best possible fix might be to teach
the unwinders that RIP==NULL means "pretend that RIP is *real_RSP and
that RSP is real_RSP+8, and report *real_RSP as the first element of
the backtrace".