On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On October 22, 2023 7:31:21 PM PDT, John Sperbeck <jsperbeck@xxxxxxxxxx> wrote:
> >The physical memory range that kexec selects for the compressed
> >bzimage target kernel, might not be where it runs from.  The
> >startup_64() code in head_64.S copies itself out of the way
> >before the decompression so it doesn't clobber itself.
> >
> >If the start of the memory range selected by kexec is above
> >LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
> >within the memory area.  But if the start is below this range,
> >then the copy will likely end up outside the range.
> >
> >Usually, this will be harmless because not much memory is in use
> >at the time of the pre-decompression copy, so there is little
> >to accidentally clobber.  However, an unlucky choice for the
> >address of the kernel and the initrd could put the initrd in harm's
> >way.  For example:
> >
> >    0x00400000 - physical address for target kernel
> >    0x03ff8000 - physical address of seven-page initrd
> >    0x0302c000 - size of uncompressed kernel (about 50 Mbytes)
> >
> >The decompressed kernel will span 0x01000000 through 0x0402c000,
> >which will overwrite the initrd.
> >
> >If the kexec code restricts itself to physical addresses above
> >0x01000000, then the pre-decompression copy and the decompression
> >itself will stay within the bounds of the memory kexec selected
> >(unless a non-default value is used in the target kernel for
> >CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
> >but that's probably unsolvable unless the target kernel were to
> >somehow communicate this to kexec).
> >
> >Signed-off-by: John Sperbeck <jsperbeck@xxxxxxxxxx>
> >---
> > arch/x86/kernel/kexec-bzimage64.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> >index a61c12c01270..d6bf6c13dab1 100644
> >--- a/arch/x86/kernel/kexec-bzimage64.c
> >+++ b/arch/x86/kernel/kexec-bzimage64.c
> >@@ -36,7 +36,7 @@
> >  */
> > #define MIN_PURGATORY_ADDR	0x3000
> > #define MIN_BOOTPARAM_ADDR	0x3000
> >-#define MIN_KERNEL_LOAD_ADDR	0x100000
> >+#define MIN_KERNEL_LOAD_ADDR	0x1000000
> > #define MIN_INITRD_LOAD_ADDR	0x1000000
> >
> > /*
>
> This doesn't make any sense to me. There is already a high water mark
> for how much memory the kernel needs until an initrd or setup_data
> item can appear.
>
> This is just a hack, please fix it properly.

The startup_64() code in head_64.S changes behavior based on whether
it's running below or above LOAD_PHYSICAL_ADDR:

    #ifdef CONFIG_RELOCATABLE
            leaq    startup_32(%rip) /* - $startup_32 */, %rbp
            movl    BP_kernel_alignment(%rsi), %eax
            decl    %eax
            addq    %rax, %rbp
            notq    %rax
            andq    %rax, %rbp
            cmpq    $LOAD_PHYSICAL_ADDR, %rbp
            jae     1f
    #endif
            movq    $LOAD_PHYSICAL_ADDR, %rbp
    1:

In my example, we were running from address 0x00400000.  The %rbp
register will start with 0x00400000, but will be changed to 0x01000000
because the rounded-up address is below LOAD_PHYSICAL_ADDR, so the
'jae' is not taken.

The 0x01000000 value in %rbp is passed to extract_kernel as the
'output' argument.  Unless choose_random_location() decides
differently, this will be where the kernel is decompressed to.  The
size of the kernel is large enough in my example that the
decompression overruns the initrd.

If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and
used %rbp as is, then there would be no issue.  The decompression
would have been to 0x00400000 and would have completed before reaching
the initrd memory.
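
To make the arithmetic concrete, here is a rough C sketch of the same
address selection and the resulting overlap check.  This is not kernel
code: the helper names decompress_output_addr() and ranges_overlap()
are made up for illustration, and the 2 MiB kernel_alignment used in
the note below is an assumed value, not something taken from my
failing setup.  It also ignores KASLR and treats the decompressed size
as exactly the uncompressed kernel size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default value quoted in the thread; follows CONFIG_PHYSICAL_START. */
#define LOAD_PHYSICAL_ADDR 0x01000000ULL

/*
 * Hypothetical helper mirroring the quoted startup_64() sequence for a
 * relocatable kernel: round the running address up to kernel_alignment,
 * then fall back to LOAD_PHYSICAL_ADDR if the result is below it.
 */
uint64_t decompress_output_addr(uint64_t run_addr, uint64_t kernel_alignment)
{
	/* The add/not/and trick only works for power-of-two alignments. */
	assert(kernel_alignment != 0 &&
	       (kernel_alignment & (kernel_alignment - 1)) == 0);

	uint64_t mask = kernel_alignment - 1;
	uint64_t addr = (run_addr + mask) & ~mask;

	return addr >= LOAD_PHYSICAL_ADDR ? addr : LOAD_PHYSICAL_ADDR;
}

/* Hypothetical helper: true if [a, a+a_size) and [b, b+b_size) intersect. */
bool ranges_overlap(uint64_t a, uint64_t a_size, uint64_t b, uint64_t b_size)
{
	return a < b + b_size && b < a + a_size;
}
```

With the numbers from my example, decompress_output_addr(0x00400000,
0x200000) yields 0x01000000, and ranges_overlap(0x01000000,
0x0302c000, 0x03ff8000, 7 * 0x1000) is true.  Using 0x00400000 as the
output address instead makes the same check false, since the kernel
would end at 0x0342c000.
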

That is, the kexec code is being careful to ensure that the kernel and
initrd memory don't overlap, but isn't paying attention to what
happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
address is effectively changed to a different location).  My proposed
change makes it aware, and avoids such addresses.

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec