On Mon, Sep 27, 2021 at 06:01:22PM +0100, Mark Rutland wrote:
> On Mon, Sep 27, 2021 at 04:27:30PM +0200, Dmitry Vyukov wrote:
> > On Tue, 21 Sept 2021 at 18:51, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > >
> > > Hi Dmitry,
> > >
> > > The good news is that the bad unwind is a known issue, the bad news is
> > > that we don't currently have a way to fix it (and I'm planning to talk
> > > about this at the LPC "objtool on arm64" talk this Friday).
> > >
> > > More info below: the gist is we can produce spurious entries at an
> > > exception boundary, but shouldn't miss a legitimate value, and there's a
> > > plan to make it easier to spot when entries are not legitimate.
> > >
> > > On Fri, Sep 17, 2021 at 05:03:48PM +0200, Dmitry Vyukov wrote:
> > > > > Call trace:
> > > > >  dump_backtrace+0x0/0x1ac arch/arm64/kernel/stacktrace.c:76
> > > > >  show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:215
> > > > >  __dump_stack lib/dump_stack.c:88 [inline]
> > > > >  dump_stack_lvl+0x68/0x84 lib/dump_stack.c:105
> > > > >  print_address_description+0x7c/0x2b4 mm/kasan/report.c:256
> > > > >  __kasan_report mm/kasan/report.c:442 [inline]
> > > > >  kasan_report+0x134/0x380 mm/kasan/report.c:459
> > > > >  __do_kernel_fault+0x128/0x1bc arch/arm64/mm/fault.c:317
> > > > >  do_bad_area arch/arm64/mm/fault.c:466 [inline]
> > > > >  do_tag_check_fault+0x74/0x90 arch/arm64/mm/fault.c:737
> > > > >  do_mem_abort+0x44/0xb4 arch/arm64/mm/fault.c:813
> > > > >  el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:357
> > > > >  el1h_64_sync_handler+0xb0/0xd0 arch/arm64/kernel/entry-common.c:408
> > > > >  el1h_64_sync+0x78/0x7c arch/arm64/kernel/entry.S:567
> > > > >  __entry_tramp_text_end+0xdfc/0x3000
> > > >
> > > > /\/\/\/\/\/\/\
> > > >
> > > > This is broken unwind on arm64. d_lookup statically calls __d_lookup,
> > > > not __entry_tramp_text_end (which is not even a function).
> > > > See the following thread for some debugging details:
> > > > https://lore.kernel.org/lkml/CACT4Y+ZByJ71QfYHTByWaeCqZFxYfp8W8oyrK0baNaSJMDzoUw@xxxxxxxxxxxxxx/
>
> Looking at this again (and as you point out below), my initial analysis
> was wrong, and this isn't to do with the LR -- this value should be the
> PC at the time of the exception boundary.

Whoops, I accidentally nuked the more complete/accurate analysis I just
wrote and sent the earlier version. Today is not a good day for me and
computers. :(

What's happened here is that __d_lookup() (via a few layers of inlining)
called load_unaligned_zeropad(). The `LDR` at the start of the asm
faulted (I suspect due to a tag check fault), and so the exception
handler replaced the PC with the (anonymous) fixup function. This is
akin to a tail or sibling call, and so the fixup function entirely
replaces __d_lookup() in the trace.

The fixup function itself has an `LDR` which faulted (because it's
designed to fix up page alignment problems, not tag check faults), and
that is what's reported here.

As the fixup function is anonymous, and the nearest prior symbol in
.text is __entry_tramp_text_end, it gets symbolized as an offset from
that.

We can make the unwinds a bit nicer by adding some markers (e.g. patch
below), but actually fixing this case will require some more thought.

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 709d2c433c5e..127096a0faea 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -111,6 +111,11 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#define FIXUP_TEXT					\
+	__fixup_text_start = .;				\
+	*(.fixup);					\
+	__fixup_text_end = .;
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -161,7 +166,7 @@ SECTIONS
 			IDMAP_TEXT
 			HIBERNATE_TEXT
 			TRAMP_TEXT
-			*(.fixup)
+			FIXUP_TEXT
 			*(.gnu.warning)
 		. = ALIGN(16);
 		*(.got)			/* Global offset table		*/
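
For illustration only, here is a minimal userspace sketch in C of the
symbolization effect described above: an address inside anonymous code is
reported as an offset from the nearest prior symbol, so adding
__fixup_text_start/__fixup_text_end changes what the unwinder prints. The
addresses are made up, the linear scan stands in for the kernel's real
kallsyms lookup, and nearest_prior_sym() and in_fixup_text() are
hypothetical helper names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry; not the kernel's kallsyms layout. */
struct sym {
	uint64_t addr;
	const char *name;
};

/*
 * Return the last symbol whose address is <= pc, or NULL if pc lies
 * below the first symbol. The table must be sorted by ascending address.
 */
const struct sym *nearest_prior_sym(const struct sym *tab, size_t n,
				    uint64_t pc)
{
	const struct sym *best = NULL;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n && tab[i].addr <= pc; i++)
		best = &tab[i];
	return best;
}

/* Range check that the new markers would make possible (assumed usage). */
int in_fixup_text(uint64_t pc, uint64_t start, uint64_t end)
{
	return pc >= start && pc < end;
}

/* Made-up layout without markers: .fixup code has no symbol of its own. */
const struct sym before_patch[] = {
	{ 0x1000, "__entry_tramp_text_start" },
	{ 0x2000, "__entry_tramp_text_end" },
};
#define BEFORE_PATCH_LEN (sizeof(before_patch) / sizeof(before_patch[0]))

/* Made-up layout with the markers from the patch bracketing .fixup. */
const struct sym after_patch[] = {
	{ 0x1000, "__entry_tramp_text_start" },
	{ 0x2000, "__entry_tramp_text_end" },
	{ 0x2d00, "__fixup_text_start" },
	{ 0x3000, "__fixup_text_end" },
};
#define AFTER_PATCH_LEN (sizeof(after_patch) / sizeof(after_patch[0]))
```

With these made-up addresses, a PC of 0x2dfc resolves to
__entry_tramp_text_end+0xdfc in the first table and to
__fixup_text_start+0xfc in the second, which at least makes it obvious
that the entry points into fixup code rather than a real function.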