On Mon, Jun 03, 2024 at 11:56:01AM -0500, Kalra, Ashish wrote:
> On 6/3/2024 10:29 AM, Mike Rapoport wrote:
>
> > On Mon, Jun 03, 2024 at 09:01:49AM -0500, Kalra, Ashish wrote:
> > > On 6/3/2024 8:39 AM, Mike Rapoport wrote:
> > >
> > > > On Mon, Jun 03, 2024 at 08:06:56AM -0500, Kalra, Ashish wrote:
> > > > > On 6/3/2024 3:56 AM, Borislav Petkov wrote
> > > > >
> > > > > > > EFI memory map and due to early allocation it uses memblock allocation.
> > > > > > >
> > > > > > > Later during boot, efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
> > > > > > > in case of a kexec-ed kernel boot.
> > > > > > >
> > > > > > > This function kexec_enter_virtual_mode() installs the new EFI memory map by
> > > > > > > calling efi_memmap_init_late() which remaps the efi_memmap physically allocated
> > > > > > > in efi_arch_mem_reserve(), but this remapping is still using memblock allocation.
> > > > > > >
> > > > > > > Subsequently, when memblock is freed later in boot flow, this remapped
> > > > > > > efi_memmap will have random corruption (similar to a use-after-free scenario).
> > > > > > >
> > > > > > > The corrupted EFI memory map is then passed to the next kexec-ed kernel
> > > > > > > which causes a panic when trying to use the corrupted EFI memory map.
> > > > > > This sounds fishy: memblock allocated memory is not freed later in the
> > > > > > boot - it remains reserved. Only free memory is freed from memblock to
> > > > > > the buddy allocator.
> > > > > >
> > > > > > Or is the problem that memblock-allocated memory cannot be memremapped
> > > > > > because *raisins*?
> > > > > This is what seems to be happening:
> > > > >
> > > > > efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
> > > > > EFI memory map and due to early allocation it uses memblock allocation.
> > > > >
> > > > > And later efi_enter_virtual_mode() calls kexec_enter_virtual_mode()
> > > > > in case of a kexec-ed kernel boot.
> > > > >
> > > > > This function kexec_enter_virtual_mode() installs the new EFI memory map by
> > > > > calling efi_memmap_init_late() which does memremap() on memblock-allocated memory.
> > > > Does the issue happen only with SNP?
> > > This is observed under SNP as efi_arch_mem_reserve() is only being called
> > > with SNP enabled and then efi_arch_mem_reserve() allocates EFI memory map
> > > using memblock.
> > I don't see how efi_arch_mem_reserve() is only called with SNP. What did I
> > miss?
>
> This is the call stack for efi_arch_mem_reserve():
>
> [ 0.310010] efi_arch_mem_reserve+0xb1/0x220
> [ 0.311382] efi_mem_reserve+0x36/0x60
> [ 0.311973] efi_bgrt_init+0x17d/0x1a0
> [ 0.313265] acpi_parse_bgrt+0x12/0x20
> [ 0.313858] acpi_table_parse+0x77/0xd0
> [ 0.314463] acpi_boot_init+0x362/0x630
> [ 0.315069] setup_arch+0xa88/0xf80
> [ 0.315629] start_kernel+0x68/0xa90
> [ 0.316194] x86_64_start_reservations+0x1c/0x30
> [ 0.316921] x86_64_start_kernel+0xbf/0x110
> [ 0.317582] common_startup_64+0x13e/0x141
>
> So, probably it is being invoked specifically for AMD platform ?

AFAIU, efi_bgrt_init() can be called for any x86 platform, with or without
encryption. So if my understanding is correct, efi_arch_mem_reserve() will be
called with SNP disabled as well. And if kexec works ok without SNP but fails
with SNP, this may give us a clue to the root cause of the failure.

> > > If we skip efi_arch_mem_reserve() (which should probably be anyway skipped
> > > for kexec case), then for kexec boot, EFI memmap is memremapped in the same
> > > virtual address as the first kernel and not the allocated memblock address.
> > Maybe we should skip efi_arch_mem_reserve() for kexec case, but I think we
> > still need to understand what's causing memory corruption.
>
> When, efi_arch_mem_reserve() allocates memory for EFI memory map using
> memblock and then later in boot, kexec_enter_virtual_mode() does memremap on
> this memblock allocated memory, subsequently after this i see EFI memory map
> corruption, so are there are any issues doing memremap on memblock-allocated
> memory ?

memblock-allocated memory is just RAM, so my take is that memremap() cannot
figure out the encryption bits properly.

You can check if there are issues with memremap()ing memblock-allocated
memory by sticking memblock_phys_alloc() somewhere, filling that memory with a
pattern and then calling memremap(addr, size, MEMREMAP_WB) and checking if the
pattern is still there.

> Thanks, Ashish
>
> > > > I didn't really dig, but my theory would be that it has something to do
> > > > with arch_memremap_can_ram_remap() in arch/x86/mm/ioremap.c
> >
> > > Thanks, Ashish

-- 
Sincerely yours,
Mike.

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
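
The check suggested above (memblock_phys_alloc(), fill with a pattern, memremap() with MEMREMAP_WB, compare) could be sketched roughly as follows. This is only an illustration, not code from the thread or from the kernel tree: the names pattern_fill(), pattern_first_mismatch(), memremap_test_alloc(), memremap_test_check() and MEMREMAP_TEST_SIZE are hypothetical, and so are the call sites mentioned in the comments. It also assumes the direct map already covers the allocated range when the pattern is written. The pattern helpers are kept outside the __KERNEL__ section only so they can be compiled and checked on their own.

```c
#ifdef __KERNEL__
#include <linux/init.h>
#include <linux/io.h>
#include <linux/memblock.h>
#include <linux/printk.h>
#include <linux/types.h>
#else
#include <stddef.h>
#endif

/* Hypothetical size of the test buffer; one page is an arbitrary choice. */
#define MEMREMAP_TEST_SIZE 4096

/* Fill buf with a position-dependent byte pattern. */
static void pattern_fill(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		p[i] = (unsigned char)(i * 31u + 7u);
}

/* Return the offset of the first byte that differs from the pattern,
 * or len if the whole buffer still matches. */
static size_t pattern_first_mismatch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != (unsigned char)(i * 31u + 7u))
			return i;
	return len;
}

#ifdef __KERNEL__
static phys_addr_t memremap_test_pa __initdata;

/* Assumption: called early, while memblock is still the allocator,
 * e.g. somewhere near efi_arch_mem_reserve(). */
void __init memremap_test_alloc(void)
{
	memremap_test_pa = memblock_phys_alloc(MEMREMAP_TEST_SIZE, PAGE_SIZE);
	if (!memremap_test_pa)
		return;

	/* Assumption: the direct map covers this range at this point. */
	pattern_fill(phys_to_virt(memremap_test_pa), MEMREMAP_TEST_SIZE);
}

/* Assumption: called later from __init code, e.g. around the point
 * where kexec_enter_virtual_mode() remaps the EFI memory map. */
void __init memremap_test_check(void)
{
	size_t bad;
	void *va;

	if (!memremap_test_pa)
		return;

	va = memremap(memremap_test_pa, MEMREMAP_TEST_SIZE, MEMREMAP_WB);
	if (!va) {
		pr_err("memremap test: memremap() failed\n");
		return;
	}

	bad = pattern_first_mismatch(va, MEMREMAP_TEST_SIZE);
	if (bad == MEMREMAP_TEST_SIZE)
		pr_info("memremap test: pattern intact at %pa\n",
			&memremap_test_pa);
	else
		pr_err("memremap test: mismatch at offset %zu\n", bad);

	memunmap(va);
}
#endif /* __KERNEL__ */
```

If the pattern reads back intact without SNP but not with SNP, that would point towards the encryption attributes chosen by memremap() rather than towards memblock itself.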