Hi, One comment (logging bug in patch) below: On 12/8/22 20:03, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped > by the OS so they can be accessed by EFI runtime services, but should have > no other significance to the OS (UEFI r2.10, sec 7.2). However, most > bootloaders and EFI stubs convert EfiMemoryMappedIO regions to > E820_TYPE_RESERVED entries, which prevent Linux from allocating space from > them (see remove_e820_regions()). > > Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI > host bridge windows, which means Linux can't allocate BAR space for > hot-added devices. > > Remove large EfiMemoryMappedIO regions from the E820 map to avoid this > problem. > > Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some > platforms, these describe non-window space that's included in host bridge > _CRS. If we assign that space to PCI devices, they don't work. On the > Lenovo X1 Carbon, this leads to suspend/resume failures. > > The previous solution to the problem of allocating BARs in these regions > was to add pci_crs_quirks[] entries to disable E820 checking for these > machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping > via quirks")): > > Acer DMI_PRODUCT_NAME Spin SP513-54N > Clevo DMI_BOARD_NAME X170KM-G > Lenovo DMI_PRODUCT_VERSION *IIL* > > Florent reported the BAR allocation issue on the Clevo NL4XLU. We could > add another quirk for the NL4XLU, but I hope this generic change can solve > it for many machines without having to add quirks. > > This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad > Slim 3 (Matt) and solves the problem even when overriding the existing > quirks by booting with "pci=use_e820". > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565 Clevo NL4XLU > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G > Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Ideapad Slim 3 > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 X1 Carbon > Reported-by: Florent DELAHAYE <kernelorg@xxxxxxxxx> > Tested-by: Konrad J Hambrick <kjhambrick@xxxxxxxxx> > Tested-by: Matt Hansen <2lprbe78@xxxxxxxx> > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Cc: Hans de Goede <hdegoede@xxxxxxxxxx> > --- > arch/x86/platform/efi/efi.c | 46 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 46 insertions(+) > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > index ebc98a68c400..dee1852e95cd 100644 > --- a/arch/x86/platform/efi/efi.c > +++ b/arch/x86/platform/efi/efi.c > @@ -303,6 +303,50 @@ static void __init efi_clean_memmap(void) > } > } > > +/* > + * Firmware can use EfiMemoryMappedIO to request that MMIO regions be > + * mapped by the OS so they can be accessed by EFI runtime services, but > + * should have no other significance to the OS (UEFI r2.10, sec 7.2). > + * However, most bootloaders and EFI stubs convert EfiMemoryMappedIO > + * regions to E820_TYPE_RESERVED entries, which prevent Linux from > + * allocating space from them (see remove_e820_regions()). > + * > + * Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and > + * PCI host bridge windows, which means Linux can't allocate BAR space for > + * hot-added devices. > + * > + * Remove large EfiMemoryMappedIO regions from the E820 map to avoid this > + * problem. > + * > + * Retain small EfiMemoryMappedIO regions because on some platforms, these > + * describe non-window space that's included in host bridge _CRS. If we > + * assign that space to PCI devices, they don't work. > + */ > +static void __init efi_remove_e820_mmio(void) > +{ > + efi_memory_desc_t *md; > + u64 size, start, end; > + int i = 0; > + > + for_each_efi_memory_desc(md) { > + if (md->type == EFI_MEMORY_MAPPED_IO) { > + size = md->num_pages << EFI_PAGE_SHIFT; > + if (size >= 256*1024) { > + start = md->phys_addr; > + end = start + size - 1; > + pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n", > + i, start, end, size >> 20); > + e820__range_remove(start, size, > + E820_TYPE_RESERVED, 1); > + } else { > + pr_info("Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map\n", > + i, start, end, size >> 10); The logging in this else is re-using the start and end from the previous section which was actually removed. E.g. Matt's latest log from: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 has: [ 0.000000] e820: remove [mem 0xfc800000-0xfe7fffff] reserved [ 0.000000] efi: Not removing mem46: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map [ 0.000000] efi: Not removing mem47: MMIO range=[0xfc800000-0xfe7fffff] (32KB) from e820 map [ 0.000000] efi: Not removing mem49: MMIO range=[0xfc800000-0xfe7fffff] (8KB) from e820 map [ 0.000000] efi: Not removing mem50: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map Notice how all the "Not removing ..." lines log the same range as the actually removed map entry above them. Regards, Hans > + } > + } > + i++; > + } > +} > + > void __init efi_print_memmap(void) > { > efi_memory_desc_t *md; > @@ -474,6 +518,8 @@ void __init efi_init(void) > set_bit(EFI_RUNTIME_SERVICES, &efi.flags); > efi_clean_memmap(); > > + efi_remove_e820_mmio(); > + > if (efi_enabled(EFI_DBG)) > efi_print_memmap(); > }