On Mon, 27 Sept 2021 at 16:27, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Tue, 21 Sept 2021 at 18:51, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Hi Dmitry,
> >
> > The good news is that the bad unwind is a known issue, the bad news is
> > that we don't currently have a way to fix it (and I'm planning to talk
> > about this at the LPC "objtool on arm64" talk this Friday).
> >
> > More info below: the gist is we can produce spurious entries at an
> > exception boundary, but shouldn't miss a legitimate value, and there's a
> > plan to make it easier to spot when entries are not legitimate.
> >
> > On Fri, Sep 17, 2021 at 05:03:48PM +0200, Dmitry Vyukov wrote:
> > > > Call trace:
> > > >  dump_backtrace+0x0/0x1ac arch/arm64/kernel/stacktrace.c:76
> > > >  show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:215
> > > >  __dump_stack lib/dump_stack.c:88 [inline]
> > > >  dump_stack_lvl+0x68/0x84 lib/dump_stack.c:105
> > > >  print_address_description+0x7c/0x2b4 mm/kasan/report.c:256
> > > >  __kasan_report mm/kasan/report.c:442 [inline]
> > > >  kasan_report+0x134/0x380 mm/kasan/report.c:459
> > > >  __do_kernel_fault+0x128/0x1bc arch/arm64/mm/fault.c:317
> > > >  do_bad_area arch/arm64/mm/fault.c:466 [inline]
> > > >  do_tag_check_fault+0x74/0x90 arch/arm64/mm/fault.c:737
> > > >  do_mem_abort+0x44/0xb4 arch/arm64/mm/fault.c:813
> > > >  el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:357
> > > >  el1h_64_sync_handler+0xb0/0xd0 arch/arm64/kernel/entry-common.c:408
> > > >  el1h_64_sync+0x78/0x7c arch/arm64/kernel/entry.S:567
> > > >  __entry_tramp_text_end+0xdfc/0x3000
> > >
> > > /\/\/\/\/\/\/\
> > >
> > > This is broken unwind on arm64. d_lookup statically calls __d_lookup,
> > > not __entry_tramp_text_end (which is not even a function).
> > > See the following thread for some debugging details:
> > > https://lore.kernel.org/lkml/CACT4Y+ZByJ71QfYHTByWaeCqZFxYfp8W8oyrK0baNaSJMDzoUw@xxxxxxxxxxxxxx/
> >
> > The problem here is that our calling convention (AAPCS64) only allows us
> > to reliably unwind at function call boundaries, where the state of both
> > the Link Register (LR/x30) and Frame Pointer (FP/x29) are well-defined.
> > Within a function, we don't know whether to start unwinding from the LR
> > or FP, and we currently start from the LR, which can produce spurious
> > entries (but ensures we don't miss anything legitimate).
> >
> > In the short term, I have a plan to make the unwinder indicate when an
> > entry might not be legitimate, with the usual stackdump code printing an
> > indicator like '?' on x86.
> >
> > In the longer term, we might be doing things with objtool or asking for
> > some toolchain help such that we can do better in these cases.
>
> Hi Mark,
>
> Any updates after the LPC session?
>
> If the dumper adds " ? ", then syzkaller will strip these frames
> (required for x86).
> However, I am worried that we can remove the true top frame then and
> attribute crashes to wrong frames again?
>
> Some naive questions:
> 1. Shouldn't the top frame for synchronous faults be in the PC/IP
> register (I would assume LR/FP contains the caller of the current
> frame)?
> 2. How did __entry_tramp_text_end, which is not a function, even end up
> in LR? Shouldn't it always contain some code pointer (even if stale)?
> 3. Isn't there already something in the debug info to solve this
> problem? Userspace programs don't use objtool, but I assume they can
> print crash stacks somehow (?).

+Will, Serban,

This ARM64 unwinder issue also means that all kernel MTE reports will
contain the wrong top frame, right?
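
To make the LR-vs-FP ambiguity Mark describes concrete, here is a minimal
user-space sketch in C. It is a simplified model, not the arm64 kernel's
unwinder: struct regs_snapshot, struct trace_entry and unwind_conservative()
are hypothetical names invented for illustration. The only thing taken from
AAPCS64 is the frame record layout (saved FP followed by saved LR). The
sketch always reports the LR but marks it as possibly spurious, in the
spirit of the '?' marking discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AAPCS64 frame record: the caller's FP followed by the saved return address. */
struct frame_record {
    const struct frame_record *fp; /* next-older record, NULL at the outermost frame */
    uintptr_t lr;                  /* return address saved by the function prologue */
};

/* Hypothetical snapshot of the interrupted context at an exception boundary. */
struct regs_snapshot {
    uintptr_t pc;                  /* faulting instruction */
    uintptr_t lr;                  /* x30: live return address OR a stale value */
    const struct frame_record *fp; /* x29: innermost frame record */
};

/* Hypothetical trace entry; reliable == 0 means "print with a '?'-style marker". */
struct trace_entry {
    uintptr_t addr;
    int reliable;
};

/*
 * Conservative unwind: report the PC, then the LR (flagged, because
 * mid-function we cannot tell whether it still holds the caller's return
 * address), then every return address found by walking the frame records.
 * Returns the number of entries written, at most max.
 */
size_t unwind_conservative(const struct regs_snapshot *regs,
                           struct trace_entry *out, size_t max)
{
    size_t n = 0;

    assert(regs != NULL && out != NULL);

    if (n < max)
        out[n++] = (struct trace_entry){ regs->pc, 1 };

    /* Never dropped, so a live LR is not missed; flagged, since it may be stale. */
    if (n < max)
        out[n++] = (struct trace_entry){ regs->lr, 0 };

    for (const struct frame_record *fr = regs->fp; fr != NULL && n < max; fr = fr->fp)
        out[n++] = (struct trace_entry){ fr->lr, 1 };

    return n;
}
```

Two cases show the trade-off. If the exception hits before the prologue has
stored a frame record, the caller's return address exists only in the LR, so
the flagged entry is the legitimate one. If it hits mid-function after
another call has returned, the LR is stale and the flagged entry is
spurious, while the caller is found through the frame record. A consumer
that strips flagged entries would lose the real caller in the first case,
which is the concern raised about syzkaller above.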