On Thu, Feb 28, 2019 at 5:34 PM Jann Horn <jannh@xxxxxxxxxx> wrote: > > On Thu, Feb 28, 2019 at 1:57 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > On Thu, 28 Feb 2019, Jann Horn wrote: > > > +Josh for unwinding, +x86 folks > > > On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton > > > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot <syzbot+ca95b2b7aef9e7cbd6ab@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > Hello, > > > > > > > > > > syzbot found the following crash on: > > > > > > > > > > HEAD commit: 4aa9fc2a435a Revert "mm, memory_hotplug: initialize struct.. > > > > > git tree: upstream > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f400000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab > > > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > > > > > > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > > > > > Not understanding. That seems to be saying that we got a NULL pointer > > > > deref in __generic_file_write_iter() at > > > > > > > > written = generic_perform_write(file, from, iocb->ki_pos); > > > > > > > > which isn't possible. > > > > > > > > I'm not seeing recent changes in there which could have caused this. Help. > > > > > > + > > > > > > Maybe the problem is that the frame pointer unwinder isn't designed to > > > cope with NULL function pointers - or more generally, with an > > > unwinding operation that starts before the function's frame pointer > > > has been set up? > > > > > > Unwinding starts at show_trace_log_lvl(). That begins with > > > unwind_start(), which calls __unwind_start(), which uses > > > get_frame_pointer(), which just returns regs->bp. But that frame > > > pointer points to the part of the stack that's storing the address of > > > the caller of the function that called NULL; the caller of NULL is > > > skipped, as far as I can tell. > > > > > > What's kind of annoying here is that we don't have a proper frame set > > > up yet, we only have half a stack frame (saved RIP but no saved RBP). > > > > That wreckage is related to the fact that the indirect calls are going > > through __x86_indirect_thunk_$REG. I just verified on a VM with some other > > callback NULL'ed that the resulting backtrace is not really helpful. > > > > So in that case generic_perform_write() has two indirect calls: > > > > mapping->a_ops->write_begin() and ->write_end() > > Does the indirect thunk thing really make any difference? When you > arrive at RIP=NULL, RSP points to a saved instruction pointer, just > like when indirect calls are compiled normally. > > I just compiled kernels with artificial calls to a NULL function > pointer (in prctl_set_seccomp()), with retpoline disabled, with both > unwinders. The ORC unwinder shows a call trace with "?" everywhere > that doesn't show the caller: [...] > So I think this doesn't really have anything to do with > __x86_indirect_thunk_$REG, and the best possible fix might be to teach > the unwinders that RIP==NULL means "pretend that RIP is *real_RSP and > that RSP is real_RSP+8, and report *real_RSP as the first element of > the backtrace". Cooking up some patches now...