Re: [PATCH 4.14.y] ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote:
> > From: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> >
> > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.
> >
> > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few
> > hundred identical messages:
> >
> >     unwind: Unknown symbol address c0800300
> >     unwind: Index not found c0800300
> >
> > c0800300 is the return address from the last subroutine call (to
> > __memzero()) in __mmap_switched().  Apparently having this address in
> > the link register confuses the unwinder.
> >
> > To fix this, reset the link register to zero before jumping to
> > start_kernel().
> >
> > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code")
> > Suggested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> > Acked-by: Nicolas Pitre <nico@xxxxxxxxxx>
> > Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxx>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> > ---
> > Looks like this first landed in v4.15-rc1.  Without this, we can't tell
> > during an unwind initiated from start_kernel() when to stop unwinding,
> > which for the clang specific implementation of the arm frame pointer
> > unwinder leads to dereferencing a garbage value, triggering an exception
> > which has no fixup, triggering a panic, triggering an unwind, triggering
> > an infinite loop that prevents booting. I have more patches to send
> > upstream to make the unwinder more resilient, but it's ambiguous as to
> > when to stop unwinding without this patch.
>
> Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you
> _SURE_ this is needed in 4.14.y?

It's a fair question, sorry I didn't notice and pre-emptively address.

Yes.  Prior to 59b6359dd92d, the value in the link register (LR) was
garbage left over from the calls to __memzero added in 9520b1a1b5f7.
I suspect that after a `ret` instruction, the value in LR really
shouldn't be used again.

Having garbage in LR when chasing frame pointers from an unwind
started in start_kernel() makes it appear that there are further
frames to unwind, hence the error noted in the commit message of
59b6359dd92d.

Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the
Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references
the latest change that noticeably changed the value of LR, but it was
still previously undefined what its last value was set to).  In fact,
digging up the original suggestion from Ard regarding 59b6359dd92d:
https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@xxxxxxxxxxxxxx/

> I don't think the patch itself is to blame here, it simply triggers an
> existing issue in the unwinder (if my analysis is correct, of course)

and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d.
(I agree with Ard's analysis).  Yes, "c0800300 is the return address
from the last subroutine call (to __memzero()) in __mmap_switched()"
is correct, but I'd have argued this was broken even before
59b6359dd92d (which is Ard's point).  Forgive me if I'm
misinterpreting your analysis, Ard.  Maybe that Fixes tag was the
simplest to avoid backports to stable which would have had conflicts
due to 9520b1a1b5f7 being a dependency?

Looking at the callers of __mmap_switched, it's hard to tell who would
have set the last value of LR, as there's divergent implementations
based on whether or not there's MMU support.  I don't think it matters
though, and that unwinding via frame pointer on ARM out of a path in
start_kernel() was broken until 59b6359dd92d.  I suspect a combination
of the frame pointer unwinder not being as popular on ARM (vs
CONFIG_UNWINDER_ARM) and the lack of needing to unwind from
start_kernel() (since unwinding occurs for exceptional cases like
WARN_ON or panics) as sources that may have helped to keep this bug
latent for a while.

Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of
interest there IMO.
https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@xxxxxxxxx/
-- 
Thanks,
~Nick Desaulniers



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux