Re: [PATCH V3] LoongArch: Add efistub booting support

Ard Biesheuvel <ardb@xxxxxxxxxx> · Mon, 5 Sep 2022 20:35:52 +0200

On Mon, 5 Sept 2022 at 20:08, WANG Xuerui <kernel@xxxxxxxxxx> wrote:
>
> On 9/5/22 15:28, Ard Biesheuvel wrote:
> > [snip]
> >>>>>> And I have some other questions about kexec: kexec should jump to the
> >>>>>> elf entry or the pe entry? I think is the elf entry, because if we
> >>>>>> jump to the pe entry, then SVAM will be executed twice (but it should
> >>>>>> be executed only once). However, how can we jump to the elf entry if
> >>>>>> we use zboot? Maybe it is kexec-tool's responsibility to decompress
> >>>>>> the zboot kernel image?
> >>>>>>
> >>>>> Yes, very good point. Kexec kernels cannot boot via the EFI entry
> >>>>> point, as the boot services will already be shutdown. So the kexec
> >>>>> kernel needs to boot via the same entrypoint in the core kernel that
> >>>>> the EFI stub calls when it hands over.
> >>>>>
> >>>>> For the EFI zboot image in particular, we will need to teach kexec how
> >>>>> to decompress them. The zboot image has a header that
> >>>>> a) describes it as a EFI linux zimg
> >>>>> b) describes the start and end offset of the compressed payload
> >>>>> c) describes which compression algorithm was used.
> >>>>>
> >>>>> This means that any non-EFI loader (including kexec) should be able to
> >>>>> extract the inner PE/COFF image and decompress it. For arm64 and
> >>>>> RISC-V, this is sufficient as the EFI and raw images are the same. For
> >>>>> LoongArch, I suppose it means we need a way to enter the core kernel
> >>>>> directly via the entrypoint that the EFI stub uses when handing over
> >>>>> (and pass the original DT argument so the kexec kernel has access to
> >>>>> the EFI and ACPI firmware tables)
> >>>> OK, then is this implementation [1] acceptable? I remember that you
> >>>> said the MS-DOS header shouldn't contain other information, so I guess
> >>>> this is unacceptable?
> >>>>
> >>> No, this looks reasonable to me. I objected to using magic numbers in
> >>> the 'pure PE' view of the image, as it does not make sense for a pure
> >>> PE loader such as GRUB to rely on such metadata.
> >>>
> >>> In this case (like on arm64), we are dealing with something else: we
> >>> need to identify the image to the kernel itself, and here, using the
> >>> unused space in the MS-DOS header is fine.
> >>>
> >>>> [1] https://lore.kernel.org/loongarch/c4dbb14a-5580-1e47-3d15-5d2079e88404@xxxxxxxxxxx/T/#mb8c1dc44f7fa2d3ef638877f0cd3f958f0be96ad
> >> OK, then there is no big problem here. And I found that arm64/riscv
> >> don't need the kernel entry point in the header. I don't know why, but
> >> I think it implies that a unified layout across architectures is
> >> unnecessary, and I prefer to put the kernel entry point before
> >> effective kernel size. :)
> >>
> > It is fine to put the entry point offset in the header. arm64 and
> > RISC-V don't need this because the first instructions are a pseudo-NOP
> > (an instruction that does nothing but its binary encoding looks like
> > 'MZ..') and a jump to the actual entry point.
>
> FYI the same trick also works for LoongArch: the code "MZ\x00\x00" i.e.
> 00005a4d is in fact "ext.w.h $t1, $t6", which is going to simply trash
> one temporary register without any other effect, so a similar jump to
> the actual entrypoint could follow.
>
> This instruction is available for both LA32 and LA64. The only subset
> without it is the LA32 Primary, which is meant for university courses
> and probably would never run UEFI, so the instruction is safe to use.
>
> P.S. If we'd go the extra mile just for ensuring the instruction works
> on every possible LoongArch core, due to the prefix construction of
> LoongArch encoding, we could just change the bytes toward the MSB (so we
> keep the "MZ" with ease) and still only trash $t1. For example
> "MZ\x10\x00" or 00105a4d is "add.w $t1, $t6, $fp", which is similarly
> harmless, but this time it works on even coursework cores!
>

I don't think this is necessary. On arm64, the boot protocol was
already defined when the EFI stub requirements became apparent, and
RISC-V just copied arm64 for some reason.

There is no reason for the 'bare metal' image to be executable from
offset 0x0: in fact, it is better to restrict executable permissions
to code regions, and treat the header as a data region, which is what
it is fundamentally.

In fact, if there is a need to duplicate this information (given that
the PE/COFF header also carries the same information), I would
recommend describing the code (R-X) and data (RW-) regions, as well as
the entry point, and potentially permit the image to be booted with
memory protections enabled.