On 10/09/2024 15:26, Eric W. Biederman wrote:
> Breno Leitao <leitao@xxxxxxxxxx> writes:
>
>> We've seen a problem in upstream kernel kexec, where an EFI TPM log event table
>> is being overwritten. This problem happens on real machines, as well as in a
>> recent EDK2 QEMU VM.
>>
>> Digging deeper, the table is being overwritten during kexec, more precisely when
>> relocating the kernel (relocate_kernel() function).
>>
>> I've also found that the table is being properly reserved using
>> memblock_reserve() early in the boot, and that range gets overwritten later
>> by relocate_kernel(). In other words, kexec is overwriting memory that was
>> previously reserved (via memblock_reserve()).
>>
>> Usama found that kexec only honours memory reservations from /sys/firmware/memmap,
>> which comes from the e820_table_firmware table.
>>
>> Looking at the TPM spec, I found the following part:
>>
>> If the ACPI TPM2 table contains the address and size of the Platform Firmware TCG log,
>> firmware “pins” the memory associated with the Platform Firmware TCG log, and reports
>> this memory as “Reserved” memory via the INT 15h/E820 interface.
>>
>>
>> From: https://trustedcomputinggroup.org/wp-content/uploads/PC-ClientPlatform_Profile_for_TPM_2p0_Systems_v49_161114_public-review.pdf
>>
>> I am wondering if that memory region/range should be part of the e820 table that is
>> passed by EFI firmware to the kernel, and if it is not passed (as it is not being
>> passed today), then the kernel doesn't need to respect it, and it is free to
>> overwrite it (as it does today). In other words, this is a firmware bug and not a
>> kernel bug.
>>
>> Am I missing something?
>
> I agree that this appears to be a firmware bug. This memory is reserved
> in one location and not in another location.
>
> That said, that doesn't mean we can't deal with it in the kernel.
> acpi_table_upgrade seems to have hit a similar issue and calls
> arch_reserve_mem_area to reserve the area in the e820 tables.
>
> The last time I looked, the e820 tables (in the kernel) are used to store
> the EFI memory map when available, and only use the true e820 data on
> older systems.
>
> Which is a long way of saying that the e820 table in the kernel, last I
> looked, was the master table of how the firmware views the memory.
>
> As I recall, the memblock allocator is the bootstrap memory allocator
> used when bringing up the kernel. So I don't see reserving something
> in the memblock allocator as being authoritative as to how the firmware
> has set up memory.
>
> I would suggest writing a patch to update whatever is calling
> memblock_reserve to also, or perhaps in preference, update the e820
> map. If the code is not x86 specific I would suggest using ACPI's
> arch_reserve_mem_area call.
>

So I believe arch_reserve_mem_area is unfortunately not enough. It updates
e820_table, but kexec seems to use e820_table_firmware.

I believe the proper fix should be in the EFI firmware, which might be a bit
difficult to get through. But with the below secondary fix in the kernel, the
corruption is gone. It would be good to have EFI, TPM and kexec experts look
at this and tell us whether it makes sense.

Thanks!
Usama
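One way to confirm the mismatch described in this thread from userspace is to check whether the TPM event log's physical range is reported as "Reserved" in /sys/firmware/memmap, the map Usama says kexec honours. The sketch below is a hypothetical helper, not part of kexec-tools or the kernel. It assumes each /sys/firmware/memmap/N directory exposes start, end (inclusive, in hex) and type files, that the directories are numbered from 0 without gaps, and that the log's address and size are obtained separately. The function names are illustrative only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry of /sys/firmware/memmap/N/ (hypothetical in-memory form).
 * 'end' is assumed to be inclusive, as the sysfs file reports it. */
struct memmap_entry {
    uint64_t start;
    uint64_t end;
    char type[64];   /* e.g. "System RAM", "Reserved" */
};

/* Returns true if every byte of [addr, addr + size) lies inside entries
 * typed "Reserved". Adjacent reserved entries are chained together. */
bool range_is_reserved(const struct memmap_entry *map, size_t n,
                       uint64_t addr, uint64_t size)
{
    assert(map != NULL || n == 0);
    if (size == 0)
        return true;

    uint64_t cur = addr;                 /* first byte not yet known covered */
    const uint64_t last = addr + size - 1;
    bool advanced = true;

    /* Each pass finds reserved entries containing 'cur' and moves 'cur'
     * past them; a pass with no progress means there is an uncovered gap. */
    while (advanced) {
        advanced = false;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(map[i].type, "Reserved") != 0)
                continue;
            if (map[i].start <= cur && cur <= map[i].end) {
                if (map[i].end >= last)
                    return true;
                cur = map[i].end + 1;
                advanced = true;
            }
        }
    }
    return false;
}

/* Reads one hex value such as "0x000000000009fbff" from a sysfs file. */
static int read_hex_u64(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%" SCNx64, out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Loads up to 'max' entries from /sys/firmware/memmap; returns the count. */
size_t load_firmware_memmap(struct memmap_entry *map, size_t max)
{
    char path[96];
    size_t n = 0;

    while (n < max) {
        struct memmap_entry e;
        snprintf(path, sizeof path, "/sys/firmware/memmap/%zu/start", n);
        if (read_hex_u64(path, &e.start) != 0)
            break;
        snprintf(path, sizeof path, "/sys/firmware/memmap/%zu/end", n);
        if (read_hex_u64(path, &e.end) != 0)
            break;
        snprintf(path, sizeof path, "/sys/firmware/memmap/%zu/type", n);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            break;
        char *got = fgets(e.type, sizeof e.type, f);
        fclose(f);
        if (got == NULL)
            break;
        e.type[strcspn(e.type, "\n")] = '\0';   /* drop trailing newline */
        map[n++] = e;
    }
    return n;
}
```

Given the log's address and size, a `false` result from `range_is_reserved()` would be consistent with the situation reported here: the range is reserved via memblock_reserve() but does not appear as "Reserved" in the map kexec consults. Updating only e820_table (as arch_reserve_mem_area does) would not be expected to change this output, since /sys/firmware/memmap is populated from e820_table_firmware.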