On 09/01/2025 14:48, Ard Biesheuvel wrote: > On Tue, 7 Jan 2025 at 13:05, Breno Leitao <leitao@xxxxxxxxxx> wrote: >> >> Hello Ard, >> >> On Tue, Jan 07, 2025 at 12:24:03PM +0100, Ard Biesheuvel wrote: >>> On Mon, 6 Jan 2025 at 20:03, Breno Leitao <leitao@xxxxxxxxxx> wrote: >>>> >>>> Add EFI memory map size to warning messages when a corrupted Memory >>>> Attributes Table is detected, making it easier to diagnose firmware issues. >>>> >>>> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx> >>>> --- >>>> drivers/firmware/efi/memattr.c | 9 +++++---- >>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c >>>> index 5f83cdea88b05cb325e9f90c14a0048131e53cfa..2c276bcc0df48352bec6cd96b69edf67a16f6069 100644 >>>> --- a/drivers/firmware/efi/memattr.c >>>> +++ b/drivers/firmware/efi/memattr.c >>>> @@ -22,7 +22,7 @@ unsigned long __ro_after_init efi_mem_attr_table = EFI_INVALID_TABLE_ADDR; >>>> void __init efi_memattr_init(void) >>>> { >>>> efi_memory_attributes_table_t *tbl; >>>> - unsigned long size; >>>> + unsigned long size, efi_map_size; >>>> >>>> if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR) >>>> return; >>>> @@ -49,9 +49,10 @@ void __init efi_memattr_init(void) >>>> * just be ignored altogether. >>>> */ >>>> size = tbl->num_entries * tbl->desc_size; >>>> - if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) { >>>> - pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n", >>>> - tbl->version, tbl->desc_size, tbl->num_entries); >>>> + efi_map_size = efi.memmap.nr_map * efi.memmap.desc_size; >>>> + if (size > 3 * efi_map_size) { >>>> + pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u, efi_map_size == %lu)\n", >>>> + tbl->version, tbl->desc_size, tbl->num_entries, efi_map_size); >>>> goto unmap; >>>> } >>>> >>>> >>> >>> Hello Breno, >>> >>> I don't mind the patch per se, but I don't think it is terribly useful either. >>> >>> Could you explain how this helps? >> >> We are seeing a bunch of `Corrupted EFI Memory Attributes Table >> detected!` in the Meta fleet, and this is something we are >> investigating. >> >> We highly think this is related to some kexec overwrites, and when we >> get here, the EFI table is completely garbage. I haven't seen this >> problem on cold boot. >> > > It likely means the memory is not reserved correctly. > > Could you check whether this > > --- a/drivers/firmware/efi/memattr.c > +++ b/drivers/firmware/efi/memattr.c > @@ -56,7 +56,7 @@ int __init efi_memattr_init(void) > } > > tbl_size = sizeof(*tbl) + size; > - memblock_reserve(efi_mem_attr_table, tbl_size); > + efi_mem_reserve(efi_mem_attr_table, tbl_size); > set_bit(EFI_MEM_ATTR, &efi.flags); > > unmap: > > > makes any difference? > Hi Ard, Thanks for the reply! I have further explained the problems and possible solutions in https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@xxxxxxxxx/. I am assuming the above diff is to solve problem 2 that I have described in the patches. I haven't tested it, because its a bit difficult to reproduce problem 2, but I think the above diff might not make a difference? efi_mem_reserve changes e820_table, while /sys/firmware/memmap uses e820_table_firmware. An alternate solution might be to change /sys/firmware/memmap to e820_table. I didnt go down that route because, you would be changing what the kernel exposes to userspace, which might not be the right thing. Thanks, Usama > >> Here are sof the instances I see: >> >> efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 18058, num_entries == 33554432) >> efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 1, desc_size == 2072184435, num_entries == 3248688968) >> efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 83886080, num_entries == 304) >> efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 2, desc_size == 48, num_entries == 40) >> > > The last one looks like a false positive: each of those values seems > perfectly reasonable. > > Any chance you could dump the memory map and this table (boot using > efi=debug) on this system? > > >> Anyway, back to you question, this patch helped us to narrow down and >> find where the problem was, by printing all variables taken in >> consideration to get the conclusion that the firmware is buggy. >> > > Fair enough. > >> Regarding the problem, Usama and I are suspecting that it might be >> related to some 77d48d39e99170 ("efistub/tpm: Use ACPI reclaim memory for >> event log to avoid corruption"), but at this time with memattr table, where it >> might not preserved during kexec(?). >> > > Please see the suggestion above.