On 9/5/22 15:28, Ard Biesheuvel wrote:
[snip]
And I have some other questions about kexec: should kexec jump to the
ELF entry or the PE entry? I think it should be the ELF entry, because
if we jump to the PE entry, then SVAM will be executed twice (but it
should be executed only once). However, how can we jump to the ELF
entry if we use zboot? Maybe it is kexec-tools' responsibility to
decompress the zboot kernel image?
Yes, very good point. Kexec kernels cannot boot via the EFI entry
point, as the boot services will already be shut down. So the kexec
kernel needs to boot via the same entrypoint in the core kernel that
the EFI stub calls when it hands over.
For the EFI zboot image in particular, we will need to teach kexec how
to decompress them. The zboot image has a header that
a) describes it as an EFI Linux zimg
b) describes the start and end offset of the compressed payload
c) describes which compression algorithm was used.
This means that any non-EFI loader (including kexec) should be able to
extract the inner PE/COFF image and decompress it. For arm64 and
RISC-V, this is sufficient as the EFI and raw images are the same. For
LoongArch, I suppose it means we need a way to enter the core kernel
directly via the entrypoint that the EFI stub uses when handing over
(and pass the original DT argument so the kexec kernel has access to
the EFI and ACPI firmware tables).
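
To make the three header properties above concrete, here is a minimal
sketch of what a non-EFI loader would have to do. The field offsets,
the struct format and the helper names (HEADER, build_image,
extract_payload) are all assumptions made up for illustration; this is
not the actual zboot header layout, only a toy with the same three
pieces of information (type tag, payload start/end, compression name).

```python
import gzip
import struct

# Hypothetical layout, for illustration only (NOT the real zboot header):
#   0x00  2 bytes   "MZ" magic, followed by 2 pad bytes
#   0x04  4 bytes   image type tag, "zimg"
#   0x08  u32 LE    payload start offset
#   0x0c  u32 LE    payload end offset
#   0x10  16 bytes  NUL-padded compression algorithm name
HEADER = struct.Struct("<2s2x4sII16s")


def build_image(inner: bytes) -> bytes:
    """Wrap `inner` in the toy header, gzip-compressed (test helper)."""
    payload = gzip.compress(inner)
    start = HEADER.size
    end = start + len(payload)
    return HEADER.pack(b"MZ", b"zimg", start, end, b"gzip") + payload


def extract_payload(image: bytes) -> bytes:
    """Return the decompressed inner image described by the toy header."""
    magic, tag, start, end, comp = HEADER.unpack_from(image, 0)
    # (a) identify the image type
    if magic != b"MZ" or tag != b"zimg":
        raise ValueError("not a zimg-style image")
    # (b) locate the compressed payload
    if not (HEADER.size <= start <= end <= len(image)):
        raise ValueError("payload offsets out of range")
    # (c) pick the decompressor named in the header
    name = comp.rstrip(b"\0").decode("ascii")
    if name != "gzip":
        raise ValueError(f"unsupported compression: {name}")
    return gzip.decompress(image[start:end])
```

A real loader would support every algorithm the header can name, and
on LoongArch it would still need the core kernel entry point afterwards.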
OK, then is this implementation [1] acceptable? I remember that you
said the MS-DOS header shouldn't contain other information, so I guess
this is unacceptable?
No, this looks reasonable to me. I objected to using magic numbers in
the 'pure PE' view of the image, as it does not make sense for a pure
PE loader such as GRUB to rely on such metadata.
In this case (like on arm64), we are dealing with something else: we
need to identify the image to the kernel itself, and here, using the
unused space in the MS-DOS header is fine.
[1] https://lore.kernel.org/loongarch/c4dbb14a-5580-1e47-3d15-5d2079e88404@xxxxxxxxxxx/T/#mb8c1dc44f7fa2d3ef638877f0cd3f958f0be96ad
OK, then there is no big problem here. And I found that arm64/riscv
don't need the kernel entry point in the header. I don't know why, but
I think it implies that a unified layout across architectures is
unnecessary, and I prefer to put the kernel entry point before
effective kernel size. :)
It is fine to put the entry point offset in the header. arm64 and
RISC-V don't need this because the first instructions are a pseudo-NOP
(an instruction that does nothing but its binary encoding looks like
'MZ..') and a jump to the actual entry point.
FYI the same trick also works for LoongArch: the code "MZ\x00\x00" i.e.
00005a4d is in fact "ext.w.h $t1, $t6", which is going to simply trash
one temporary register without any other effect, so a similar jump to
the actual entrypoint could follow.
This instruction is available for both LA32 and LA64. The only subset
without it is the LA32 Primary, which is meant for university courses
and probably would never run UEFI, so the instruction is safe to use.
P.S. If we wanted to go the extra mile to ensure the instruction works
on every possible LoongArch core, then thanks to the prefix construction
of the LoongArch encoding we could just change the bytes toward the MSB
(so we keep the "MZ" with ease) and still only trash $t1. For example
"MZ\x10\x00" or 00105a4d is "add.w $t1, $t6, $fp", which is similarly
harmless, but this time it works on even coursework cores!
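
For anyone who wants to double-check the two encodings above, a small
sketch follows. It only knows the two opcodes mentioned in this mail
and assumes the usual register field positions (rd in bits 4:0, rj in
bits 9:5, rk in bits 14:10); the helper names word() and decode() are
made up for the example.

```python
import struct

# LoongArch ABI names for r0..r31 (r21 is reserved, shown as $r21).
REGS = (["$zero", "$ra", "$tp", "$sp"]
        + [f"$a{i}" for i in range(8)]
        + [f"$t{i}" for i in range(9)]
        + ["$r21", "$fp"]
        + [f"$s{i}" for i in range(9)])


def word(raw: bytes) -> int:
    """Interpret 4 bytes as one little-endian instruction word."""
    return struct.unpack("<I", raw)[0]


def decode(insn: int) -> str:
    """Decode only the two instructions discussed here."""
    rd = REGS[insn & 0x1F]
    rj = REGS[(insn >> 5) & 0x1F]
    rk = REGS[(insn >> 10) & 0x1F]
    if insn >> 10 == 0x00005800 >> 10:   # ext.w.h rd, rj
        return f"ext.w.h {rd}, {rj}"
    if insn >> 15 == 0x00100000 >> 15:   # add.w rd, rj, rk
        return f"add.w {rd}, {rj}, {rk}"
    raise ValueError(f"unhandled encoding {insn:08x}")


print(f'{word(b"MZ" + bytes(2)):08x}', decode(word(b"MZ" + bytes(2))))
print(f'{word(b"MZ" + bytes([0x10, 0])):08x}',
      decode(word(b"MZ" + bytes([0x10, 0]))))
```

Both words write only $t1, which matches the claim that the "MZ"
prefix can sit in front of a jump to the real entry point.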
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/