On Mon, Aug 3, 2020 at 2:54 PM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote: > > On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > > On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote: > > > From: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> > > > > > > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream. > > > > > > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few > > > hundred identical messages: > > > > > > unwind: Unknown symbol address c0800300 > > > unwind: Index not found c0800300 > > > > > > c0800300 is the return address from the last subroutine call (to > > > __memzero()) in __mmap_switched(). Apparently having this address in > > > the link register confuses the unwinder. > > > > > > To fix this, reset the link register to zero before jumping to > > > start_kernel(). > > > > > > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code") > > > Suggested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx> > > > Acked-by: Nicolas Pitre <nico@xxxxxxxxxx> > > > Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxx> > > > Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> > > > --- > > > Looks like this first landed in v4.15-rc1. Without this, we can't tell > > > during an unwind initiated from start_kernel() when to stop unwinding, > > > which for the clang specific implementation of the arm frame pointer > > > unwinder leads to dereferencing a garbage value, triggering an exception > > > which has no fixup, triggering a panic, triggering an unwind, triggering > > > an infinite loop that prevents booting. I have more patches to send > > > upstream to make the unwinder more resilient, but it's ambiguous as to > > > when to stop unwinding without this patch. > > > > Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you > > _SURE_ this is needed in 4.14.y? > > It's a fair question, sorry I didn't notice and pre-emptively address. > > Yes. Prior to 59b6359dd92d, the value in the link register (LR) was > garbage left over from the calls to __memzero added in 9520b1a1b5f7. > I suspect that after a `ret` instruction, the value in LR really Sorry, rereading that `ret` is not an instruction on ARM; I may have confused the assembler macro defined in arch/arm/include/asm/assembler.h used in __turn_mmu_on to call into __mmap_switched with being an actual instruction. Maybe `I suspect that after returning from a called frame, the value in LR really shouldn't be used again.` would have been more precise. > shouldn't be used again. > > Having garbage in LR when chasing frame pointers from an unwind > started in start_kernel() makes it appear that there are further > frames to unwind, hence the error noted in the commit message of > 59b6359dd92d. > > Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the > Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references > the latest change that noticeably changed the value of LR, but it was > still previously undefined what its last value was set to). In fact, > digging up the original suggestion from Ard regarding 59b6359dd92d: > https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@xxxxxxxxxxxxxx/ > > > I don't think the patch itself is to blame here, it simply triggers an > > existing issue in the unwinder (if my analysis is correct, of course) > > and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d. > (I agree with Ard's analysis). Yes, "c0800300 is the return address > from the last subroutine call (to __memzero()) in __mmap_switched()" > is correct, but I'd have argued this was broken even before > 59b6359dd92d (which is Ard's point). Forgive me if I'm > misinterpreting your analysis, Ard. Maybe that Fixes tag was the > simplest to avoid backports to stable which would have had conflicts > due to 9520b1a1b5f7 being a dependency? > > Looking at the callers of __mmap_switched, it's hard to tell who would > have set the last value of LR, as there's divergent implementations > based on whether or not there's MMU support. I don't think it matters > though, and that unwinding via frame pointer on ARM out of a path in > start_kernel() was broken until 59b6359dd92d. I suspect a combination > of the frame pointer unwinder not being as popular on ARM (vs > CONFIG_UNWINDER_ARM) and the lack of needing to unwind from > start_kernel() (since unwinding occurs for exceptional cases like > WARN_ON or panics) as sources that may have helped to keep this bug > latent for a while. > > Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of > interest there IMO. > https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@xxxxxxxxx/ > -- > Thanks, > ~Nick Desaulniers -- Thanks, ~Nick Desaulniers