Re: [PATCH 4.14.y] ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 3, 2020 at 2:54 PM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
>
> On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote:
> > > From: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> > >
> > > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.
> > >
> > > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few
> > > hundred identical messages:
> > >
> > >     unwind: Unknown symbol address c0800300
> > >     unwind: Index not found c0800300
> > >
> > > c0800300 is the return address from the last subroutine call (to
> > > __memzero()) in __mmap_switched().  Apparently having this address in
> > > the link register confuses the unwinder.
> > >
> > > To fix this, reset the link register to zero before jumping to
> > > start_kernel().
> > >
> > > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code")
> > > Suggested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> > > Acked-by: Nicolas Pitre <nico@xxxxxxxxxx>
> > > Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> > > ---
> > > Looks like this first landed in v4.15-rc1.  Without this, we can't tell
> > > during an unwind initiated from start_kernel() when to stop unwinding,
> > > which for the clang specific implementation of the arm frame pointer
> > > unwinder leads to dereferencing a garbage value, triggering an exception
> > > which has no fixup, triggering a panic, triggering an unwind, triggering
> > > an infinite loop that prevents booting. I have more patches to send
> > > upstream to make the unwinder more resilient, but it's ambiguous as to
> > > when to stop unwinding without this patch.
> >
> > Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you
> > _SURE_ this is needed in 4.14.y?
>
> It's a fair question, sorry I didn't notice and pre-emptively address.
>
> Yes.  Prior to 59b6359dd92d, the value in the link register (LR) was
> garbage left over from the calls to __memzero added in 9520b1a1b5f7.
> I suspect that after a `ret` instruction, the value in LR really

Sorry, rereading that `ret` is not an instruction on ARM; I may have
confused the assembler macro defined in
arch/arm/include/asm/assembler.h used in __turn_mmu_on to call into
__mmap_switched with being an actual instruction.  Maybe `I suspect
that after returning from a called frame, the value in LR really
shouldn't be used again.` would have been more precise.

> shouldn't be used again.
>
> Having garbage in LR when chasing frame pointers from an unwind
> started in start_kernel() makes it appear that there are further
> frames to unwind, hence the error noted in the commit message of
> 59b6359dd92d.
>
> Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the
> Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references
> the latest change that noticeably changed the value of LR, but it was
> still previously undefined what its last value was set to).  In fact,
> digging up the original suggestion from Ard regarding 59b6359dd92d:
> https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@xxxxxxxxxxxxxx/
>
> > I don't think the patch itself is to blame here, it simply triggers an
> > existing issue in the unwinder (if my analysis is correct, of course)
>
> and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d.
> (I agree with Ard's analysis).  Yes, "c0800300 is the return address
> from the last subroutine call (to __memzero()) in __mmap_switched()"
> is correct, but I'd have argued this was broken even before
> 59b6359dd92d (which is Ard's point).  Forgive me if I'm
> misinterpreting your analysis, Ard.  Maybe that Fixes tag was the
> simplest to avoid backports to stable which would have had conflicts
> due to 9520b1a1b5f7 being a dependency?
>
> Looking at the callers of __mmap_switched, it's hard to tell who would
> have set the last value of LR, as there's divergent implementations
> based on whether or not there's MMU support.  I don't think it matters
> though, and that unwinding via frame pointer on ARM out of a path in
> start_kernel() was broken until 59b6359dd92d.  I suspect a combination
> of the frame pointer unwinder not being as popular on ARM (vs
> CONFIG_UNWINDER_ARM) and the lack of needing to unwind from
> start_kernel() (since unwinding occurs for exceptional cases like
> WARN_ON or panics) as sources that may have helped to keep this bug
> latent for a while.
>
> Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of
> interest there IMO.
> https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@xxxxxxxxx/
> --
> Thanks,
> ~Nick Desaulniers



-- 
Thanks,
~Nick Desaulniers



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux