On Sat, Nov 18, 2023 at 03:21:43PM +0100, Tomasz Pala wrote:
> On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:
>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=218050
> >>
> >> I think the problem is that the MMCONFIG region is at
> >> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
> >> host bridge windows reported via _CRS:
> >>
> >>   PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
> >>
> >> I'll try to figure out how to deal with that.  In the meantime, would
> >> you mind attaching the contents of /proc/iomem to the bugzilla?  I
> >
> > I attached a debug patch to both bugzilla entries.  If you could
> > attach the "acpidump" output and (if practical) boot a kernel with the
> > debug patch and attach the dmesg logs, that would be great.
>
> I've posted the files. There are signs of buggy BIOS, but I don't expect
> any firmware update to be released for this hw anymore.

Thank you!

A BIOS update is almost never the answer because even if an update
exists, we have to assume that most users in the field will never
install the update.

I want to look at the BIOS info in case we can learn about something
*Linux* is doing wrong.  This most likely works fine with Windows, so
I assume Linux is doing something wrong or at least differently than
Windows.

> DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019
>
> .text .data .bss are not marked as E820_TYPE_RAM!

Added by 4eea6aa581ab ("x86, mm: if kernel .text .data .bss are not
marked as E820_RAM, complain and fix").  No idea.  A shame we didn't
include the .text/.data values in the message.

> tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED

Added by 316253406959 ("x86, intel_txt: Intel TXT boot support").  No
idea about this either.

> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]

Both related to arch_rmrr_sanity_check(), added by f036c7fa0ab6
("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
and f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails
sanity check").  No idea about this one either.

The VT-d spec (r1.3, sec 8.4) says "BIOS must report the RMRR reported
memory addresses as reserved in the system memory map returned through
methods such as INT15, EFI GetMemoryMap etc."

arch_rmrr_sanity_check() only looks at your e820 map, which only has
this:

  BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
  BIOS-e820: [mem 0x0000000000100000-0x00000000d1f36fff] usable

I think Linux basically converts the info from EFI GetMemoryMap to an
e820 format; I think booting with "efi=debug" would show more details
of this.

Anyway, this is all a tangent.

> BTW is there a reason for this logging discrepancy?
>
> efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
> efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map
>
> efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
> efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
>
> This is arch/x86/platform/efi/efi.c:
> static void __init efi_remove_e820_mmio(void)
>
> Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
> Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10

You mean the MB vs KB difference?  That's my fault.  I guess I used KB
for the "Not removing" message because those are smaller (< 256KB) so
the size in MB wouldn't be useful there.
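
For reference, the two sizes in the quoted log lines are the same byte
count shifted by different amounts.  A minimal user-space sketch of
that arithmetic, assuming inclusive [start, end] ranges as printed in
the log.  range_size() and format_mmio_range() are hypothetical helpers
written for this illustration, not the kernel's
efi_remove_e820_mmio(), and the sketch leaves the remove/keep decision
to the caller:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Size in bytes of an inclusive [start, end] range, as printed in the log. */
uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/*
 * Hypothetical helper (not kernel code): build a message with the two
 * format strings quoted above.  "removed" selects MB (size >> 20) or
 * KB (size >> 10); no size threshold is applied here.
 * Returns the snprintf() result.
 */
int format_mmio_range(char *buf, size_t len, unsigned int idx,
		      uint64_t start, uint64_t end, int removed)
{
	uint64_t size = range_size(start, end);

	if (removed)
		return snprintf(buf, len,
				"Remove mem%02u: MMIO range=[0x%08" PRIx64
				"-0x%08" PRIx64 "] (%" PRIu64 "MB) from e820 map",
				idx, start, end, size >> 20);

	return snprintf(buf, len,
			"Not removing mem%02u: MMIO range=[0x%08" PRIx64
			"-0x%08" PRIx64 "] (%" PRIu64 "KB) from e820 map",
			idx, start, end, size >> 10);
}
```

For [0xe0000000-0xefffffff] the size is 0x10000000 bytes, so size >> 20
is 256 and size >> 10 is 262144; for [0xff000000-0xffffffff] the values
are 16 and 16384.  Those match the MB and KB figures in the log lines
quoted above.
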
We could use KB for both, but I guess I used MB for the "Remove" case
because it's a little easier to read and I expected "Not removing" to
be a relatively unusual case.

Bjorn