Re: missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Thu, 28 Feb 2019 13:56:51 +0100 (CET)

On Thu, 28 Feb 2019, Jann Horn wrote:
> +Josh for unwinding, +x86 folks
> On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot <syzbot+ca95b2b7aef9e7cbd6ab@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    4aa9fc2a435a Revert "mm, memory_hotplug: initialize struct..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f400000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > Not understanding.  That seems to be saying that we got a NULL pointer
> > deref in __generic_file_write_iter() at
> >
> >                 written = generic_perform_write(file, from, iocb->ki_pos);
> >
> > which isn't possible.
> >
> > I'm not seeing recent changes in there which could have caused this.  Help.
> 
> Maybe the problem is that the frame pointer unwinder isn't designed to
> cope with NULL function pointers - or more generally, with an
> unwinding operation that starts before the function's frame pointer
> has been set up?
> 
> Unwinding starts at show_trace_log_lvl(). That begins with
> unwind_start(), which calls __unwind_start(), which uses
> get_frame_pointer(), which just returns regs->bp. But that frame
> pointer points to the part of the stack that's storing the address of
> the caller of the function that called NULL; the caller of NULL is
> skipped, as far as I can tell.
> 
> What's kind of annoying here is that we don't have a proper frame set
> up yet, we only have half a stack frame (saved RIP but no saved RBP).

That wreckage comes from the indirect calls going through
__x86_indirect_thunk_$REG. I just verified on a VM, with a different
callback set to NULL, that the resulting backtrace is not really helpful.

So in that case generic_perform_write() has two indirect calls:

  mapping->a_ops->write_begin() and ->write_end()
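
For illustration only, here is a user-space toy model of the effect Jann
describes above. It is a sketch under stated assumptions, not the kernel's
unwinder: the stack is a small array, the toy_* names and the IN_* address
constants are made up, and the use_sp_hint switch is just one conceivable
way of recovering the missing entry. At the moment of the crash the call has
pushed a return address, but no frame has been set up for the NULL "function",
so bp still belongs to the function that made the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_WORDS 6
#define TOY_CHAIN_END   UINTPTR_MAX   /* hypothetical end-of-chain marker */

/* Hypothetical code addresses, one per function in the call chain. */
enum { IN_ENTRY = 0x1000, IN_OUTER = 0x2000, IN_CALLER = 0x3000 };

/* Toy register state; sp and bp are indices into toy_stack. */
struct toy_regs {
	uintptr_t ip;
	size_t sp;
	size_t bp;
};

/* Stack at the time of the fault; a higher index is an older entry. */
static const uintptr_t toy_stack[TOY_STACK_WORDS] = {
	IN_CALLER,      /* [0] <- sp: return address pushed by the call to NULL */
	0,              /* [1] a local of the function that called NULL */
	4,              /* [2] <- bp: saved bp, frame of the function that called NULL */
	IN_OUTER,       /* [3] return address into that function's caller */
	TOY_CHAIN_END,  /* [4] saved bp of the outermost frame */
	IN_ENTRY,       /* [5] return address into the entry code */
};

/* ip == 0 because the jump to NULL already happened. */
static const struct toy_regs toy_crash_regs = { .ip = 0, .sp = 0, .bp = 2 };

/*
 * Walk the saved-bp chain starting from regs->bp. With use_sp_hint set and
 * ip == 0, the word at sp is reported first, on the assumption that it is
 * the return address of the half-built frame.
 */
static size_t toy_unwind(const struct toy_regs *regs, int use_sp_hint,
			 uintptr_t *out, size_t max)
{
	size_t n = 0;
	uintptr_t bp = regs->bp;

	if (n < max)
		out[n++] = regs->ip;

	if (use_sp_hint && regs->ip == 0 && n < max) {
		assert(regs->sp < TOY_STACK_WORDS);
		out[n++] = toy_stack[regs->sp];
	}

	while (bp != TOY_CHAIN_END && n < max) {
		assert(bp + 1 < TOY_STACK_WORDS);
		out[n++] = toy_stack[bp + 1];   /* return address of this frame */
		bp = toy_stack[bp];             /* follow saved bp to the next frame */
	}
	return n;
}
```

Calling toy_unwind(&toy_crash_regs, 0, ...) yields 0, IN_OUTER, IN_ENTRY, so
the function that actually called NULL never shows up. With use_sp_hint set,
IN_CALLER appears right after the faulting ip.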

Thanks,

	tglx