On Mon, 27 Sept 2021 at 19:18, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Mon, Sep 27, 2021 at 06:01:22PM +0100, Mark Rutland wrote:
> > On Mon, Sep 27, 2021 at 04:27:30PM +0200, Dmitry Vyukov wrote:
> > > On Tue, 21 Sept 2021 at 18:51, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > >
> > > > Hi Dmitry,
> > > >
> > > > The good news is that the bad unwind is a known issue, the bad news is
> > > > that we don't currently have a way to fix it (and I'm planning to talk
> > > > about this at the LPC "objtool on arm64" talk this Friday).
> > > >
> > > > More info below: the gist is we can produce spurious entries at an
> > > > exception boundary, but shouldn't miss a legitimate value, and there's a
> > > > plan to make it easier to spot when entries are not legitimate.
> > > >
> > > > On Fri, Sep 17, 2021 at 05:03:48PM +0200, Dmitry Vyukov wrote:
> > > > > > Call trace:
> > > > > >  dump_backtrace+0x0/0x1ac arch/arm64/kernel/stacktrace.c:76
> > > > > >  show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:215
> > > > > >  __dump_stack lib/dump_stack.c:88 [inline]
> > > > > >  dump_stack_lvl+0x68/0x84 lib/dump_stack.c:105
> > > > > >  print_address_description+0x7c/0x2b4 mm/kasan/report.c:256
> > > > > >  __kasan_report mm/kasan/report.c:442 [inline]
> > > > > >  kasan_report+0x134/0x380 mm/kasan/report.c:459
> > > > > >  __do_kernel_fault+0x128/0x1bc arch/arm64/mm/fault.c:317
> > > > > >  do_bad_area arch/arm64/mm/fault.c:466 [inline]
> > > > > >  do_tag_check_fault+0x74/0x90 arch/arm64/mm/fault.c:737
> > > > > >  do_mem_abort+0x44/0xb4 arch/arm64/mm/fault.c:813
> > > > > >  el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:357
> > > > > >  el1h_64_sync_handler+0xb0/0xd0 arch/arm64/kernel/entry-common.c:408
> > > > > >  el1h_64_sync+0x78/0x7c arch/arm64/kernel/entry.S:567
> > > > > >  __entry_tramp_text_end+0xdfc/0x3000
> > > > >
> > > > > /\/\/\/\/\/\/\
> > > > >
> > > > > This is broken unwind on arm64.
> > > > > d_lookup statically calls __d_lookup,
> > > > > not __entry_tramp_text_end (which is not even a function).
> > > > > See the following thread for some debugging details:
> > > > > https://lore.kernel.org/lkml/CACT4Y+ZByJ71QfYHTByWaeCqZFxYfp8W8oyrK0baNaSJMDzoUw@xxxxxxxxxxxxxx/
> > Looking at this again (and as you point out below), my initial analysis
> > was wrong, and this isn't to do with the LR -- this value should be the
> > PC at the time of the exception boundary.
>
> Whoops, I accidentally nuked the more complete/accurate analysis I just
> wrote and sent the earlier version. Today is not a good day for me and
> computers. :(
>
> What's happened here is that __d_lookup() (via a few layers of inlining) called
> load_unaligned_zeropad(). The `LDR` at the start of the asm faulted (I suspect
> due to a tag check fault), and so the exception handler replaced the PC with
> the (anonymous) fixup function. This is akin to a tail or sibling call, and so
> the fixup function entirely replaces __d_lookup() in the trace.
>
> The fixup function itself has an `LDR` which faulted (because it's
> designed to fix up page alignment problems, not tag check faults), and
> that is what's reported here.
>
> As the fixup function is anonymous, and the nearest prior symbol in .text is
> __entry_tramp_text_end, it gets symbolized as an offset from that.
>
> We can make the unwinds a bit nicer by adding some markers (e.g. patch
> below), but actually fixing this case will require some more thought.
>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 709d2c433c5e..127096a0faea 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -111,6 +111,11 @@ jiffies = jiffies_64;
>  #define TRAMP_TEXT
>  #endif
>
> +#define FIXUP_TEXT \
> +       __fixup_text_start = .; \
> +       *(.fixup); \
> +       __fixup_text_end = .;
> +
>  /*
>   * The size of the PE/COFF section that covers the kernel image, which
>   * runs from _stext to _edata, must be a round multiple of the PE/COFF
> @@ -161,7 +166,7 @@ SECTIONS
>                 IDMAP_TEXT
>                 HIBERNATE_TEXT
>                 TRAMP_TEXT
> -               *(.fixup)
> +               FIXUP_TEXT
>                 *(.gnu.warning)
>                 . = ALIGN(16);
>                 *(.got)         /* Global offset table */

Oh, good, it's very local to the .fixup thing rather than a common issue
that affects all unwinds.

In the other x86 thread, Josh Poimboeuf suggested using asm goto to a
cold part of the function instead of .fixup:
https://lore.kernel.org/lkml/20210927234543.6waods7rraxseind@treble/

This sounds like a more reliable solution that would also be less of a
maintenance burden. Would it work for arm64 as well?