On 12/5/2023 12:28, Mario Limonciello wrote:
On 12/5/2023 11:31, Bjorn Helgaas wrote:
On Tue, Dec 05, 2023 at 11:00:31AM -0600, Mario Limonciello wrote:
On 12/5/2023 10:17, Bjorn Helgaas wrote:
On Tue, Dec 05, 2023 at 09:48:45AM -0600, Mario Limonciello wrote:
commit 7752d5cfe3d1 ("x86: validate against acpi motherboard resources")
introduced checks to ensure that the region described by the MCFG table
also has a memory reservation, so that no conflicts are introduced by a
buggy BIOS. Over time this grew to add other types of reservation checks
for ACPI PNP resources and the EFI MMIO memory type. The PCI firmware
spec, however, says that these checks are only required when the
operating system doesn't comprehend the firmware region:
```
If the operating system does not natively comprehend reserving the
MMCFG region, the MMCFG region must be reserved by firmware. The address
range reported in the MCFG table or by _CBA method (see Section 4.1.3)
must be reserved by declaring a motherboard resource. For most systems,
the motherboard resource would appear at the root of the ACPI namespace
(under \_SB) in a node with a _HID of EISAID (PNP0C02), and the
resources in this case should not be claimed in the root PCI bus’s _CRS.
The resources can optionally be returned in Int15 E820h or
EFIGetMemoryMap as reserved memory but must always be reported through
ACPI as a motherboard resource.
```
My understanding is that native comprehension would mean Linux knows
how to discover and/or configure the MMCFG base address and size in
the hardware and that Linux would then reserve that region so it's not
used for anything else.
Linux doesn't have that, at least for x86. It relies on the MCFG
table to discover the MMCFG region, and it relies on PNP0C02 _CRS to
reserve it.
Using MCFG to discover it matches the PCI firmware spec, but as I pointed
out above, the decision to reserve this region doesn't require
PNP0C01/PNP0C02 _CRS.
Can you explain this reasoning a little more? I claim Linux does not
natively comprehend reserving the MMCFG region, but it sounds like you
don't agree? I think "native" comprehension would mean Linux would
not need the MCFG table.
After rereading our thread and the spec, I think you're right that Linux
doesn't natively comprehend (i.e., reserve) this region, particularly
because of your stance on "static tables" vs _CRS.
This is a decision made by Linux historically.
Running this check causes problems accessing extended PCI configuration
space on OEM laptops that don't specify the region in PNP resources or
in the EFI memory map. That later manifests as problems with the dGPU
and with accessing resizable BAR.
Is there a problem report we can reference here?
Nothing public to share. The AMD BIOS team is in discussion with the OEM
to add the reservation in a BIOS upgrade so it works with things like
the LTS kernels.
Is there some reason this can't be made public (it's obviously fine to
redact proprietary details)? It's really hard to make this code work
for all the cases even when we know all the details, and practically
impossible if we don't.
I just don't want to throw the vendor under the bus, since this could
have been caught "sooner" and fixed by the BIOS adding _CRS.
I'll share the full dmesg below, redacting only the DMI information.
Knowing that Windows works without it, I feel this is still something we
should be looking at fixing from an upstream perspective, which is what
prompted my patch and this discussion.
We definitely need to change Linux so it works correctly with firmware
in the field, whether that means fixing a Linux defect or working
around a firmware defect.
Does the problem still occur with this series?
https://lore.kernel.org/r/20231121183643.249006-1-helgaas@xxxxxxxxxx
This appeared in linux-next 20231130.
Thanks for sharing that. If I do respin a variation of this patch I'll
rebase on top of that.
I tried that series on top of 6.7-rc4, but it doesn't fix the issue
(though the patch I sent obviously does).
# journalctl -k | grep ECAM
Dec 05 06:37:46 cl-fw-fedora kernel: PCI: ECAM [mem 0xe0000000-0xefffffff] (base 0xe0000000) for domain 0000 [bus 00-ff]
Dec 05 06:37:46 cl-fw-fedora kernel: PCI: not using ECAM ([mem 0xe0000000-0xefffffff] not reserved)
Dec 05 06:37:46 cl-fw-fedora kernel: PCI: ECAM [mem 0xe0000000-0xefffffff] (base 0xe0000000) for domain 0000 [bus 00-ff]
Dec 05 06:37:46 cl-fw-fedora kernel: PCI: [Firmware Info]: ECAM [mem 0xe0000000-0xefffffff] not reserved in ACPI motherboard resources
Dec 05 06:37:46 cl-fw-fedora kernel: PCI: not using ECAM ([mem 0xe0000000-0xefffffff] not reserved)
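The ECAM range in these messages follows directly from the MCFG entry: each bus number decodes 1 MiB (32 devices x 8 functions x 4 KiB), so buses 00-ff at base 0xe0000000 span 256 MiB and end at 0xefffffff. Below is a minimal sketch of that arithmetic, assuming the MCFG allocation-entry layout from the PCI Firmware spec; `struct mcfg_entry` and the `ecam_*` helpers are hypothetical names, not kernel code. It also shows why rejecting ECAM hurts extended config space: registers at offset 0x100 and above (where extended capabilities such as Resizable BAR live) are only reachable through this window, not through the legacy 0xCF8/0xCFC mechanism.

```c
#include <stdint.h>

/*
 * One MCFG "configuration space base address allocation" entry as laid
 * out in the PCI Firmware spec. Field names are illustrative only.
 */
struct mcfg_entry {
	uint64_t base;      /* ECAM base address, treated as the address of bus 0 */
	uint16_t segment;   /* PCI segment group, i.e. the "domain" */
	uint8_t start_bus;
	uint8_t end_bus;
	uint32_t reserved;
};

/* ECAM address bits: bus[27:20] device[19:15] function[14:12] offset[11:0] */
#define ECAM_BUS_SHIFT 20
#define ECAM_DEV_SHIFT 15
#define ECAM_FN_SHIFT  12

/* First byte of the window covered by this entry. */
static uint64_t ecam_start(const struct mcfg_entry *e)
{
	return e->base + ((uint64_t)e->start_bus << ECAM_BUS_SHIFT);
}

/* Last byte (inclusive), matching the "[mem start-end]" format in dmesg. */
static uint64_t ecam_end(const struct mcfg_entry *e)
{
	return e->base + (((uint64_t)e->end_bus + 1) << ECAM_BUS_SHIFT) - 1;
}

/* Address of one config-space register; off may be 0x000-0xfff. */
static uint64_t ecam_addr(const struct mcfg_entry *e, uint8_t bus,
			  uint8_t dev, uint8_t fn, uint16_t off)
{
	return e->base + ((uint64_t)bus << ECAM_BUS_SHIFT) +
	       ((uint64_t)(dev & 0x1f) << ECAM_DEV_SHIFT) +
	       ((uint64_t)(fn & 0x7) << ECAM_FN_SHIFT) + (off & 0xfff);
}
```

For example, with the base above, `ecam_addr(&e, 1, 0, 0, 0x100)` yields 0xe0100100, the start of extended config space for bus 01, device 00, function 0.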
Can you boot with 'efi=debug dyndbg="file arch/x86/pci +p"' and share
the complete dmesg log (redacted if necessary) somewhere? It's
important to know more about why and how this doesn't work. I added
more debug logging, but possibly it's still not enough.
Here you go (6.7-rc4 + that series you linked):
https://gist.github.com/superm1/eca87ae661793b9ab969829946adb084
Similar problems don't exist in Windows 11 with the exact same
laptop/firmware stack, and according to discussion with AMD's BIOS team,
Windows doesn't have similar checks.
I would love to know AMD BIOS team's take on this. Does the BIOS
reserve the MMCFG space in any way?
On the AMD reference platform this OEM system is based on, it is
reserved in the EFI memory map. So on a 6.7-based kernel on the
reference system you can see this emitted:
PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved as EfiMemoryMappedIO
The EfiMemoryMappedIO entry is not a *reservation* (this was a poor
choice of words in the logging, and my series changes it). This entry
only means the firmware requests that the OS map this region to a
virtual address so it can be used by EFI runtime services (UEFI v2.9,
sec 7.2).
In that sense, the only reason this works on the AMD reference platform
is that the region happens to fall inside another region that is
reserved.
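The "happens to be reserved" case can be pictured as a plain containment test. This is a minimal sketch, assuming inclusive [start, end] ranges and a flat list of reservations (for example PNP0C02 _CRS windows); `struct range` and `range_covered()` are hypothetical, and the real check in arch/x86/pci/mmconfig-shared.c is more involved. The point it illustrates: a larger reservation that contains the ECAM window passes, even though nothing reserved ECAM specifically, while a system with no covering range fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] physical address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: true if @r lies entirely inside one of the @n
 * ranges in @reserved. Any containing reservation counts, whether or
 * not it was declared for the ECAM window itself.
 */
static bool range_covered(const struct range *r,
			  const struct range *reserved, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (reserved[i].start <= r->start && r->end <= reserved[i].end)
			return true;
	}
	return false;
}
```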
Per the "static table" stance, we should advocate for the MCFG region to
be declared in _CRS on the AMD reference platform too, right?
But on the OEM system this is not reserved by the EFI memory map or
_CRS. That's why my assumption, after reading the firmware spec and
seeing the behavior, is that Windows makes the reservation *based on*
what's in MCFG.
Is there some spec language that says MCFG reserves space? I'm not
aware of anything about ACPI static tables reserving MMIO space.
Here's my reasoning around static tables vs _CRS for reservations:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.6#n32
Reading your stance, it makes more sense why we are where we are now.
Let me ask, though: why does the distinction between old and new OSes
matter? If a vendor wants it to work with a kernel that doesn't use MCFG
to make a reservation, _CRS or some other overlapping reservation is
their only option.
But if we changed this behavior in a newer kernel then the stance can be
something like:
"upstream kernel 6.8 or newer will reserve MCFG if not specified by _CRS
or any other overlapping reservation"
and
"upstream kernel 6.7 or older require explicit reservations".
It seems to me that this type of issue would go away in most cases, and
it would satisfy the spec note about 'natively comprehend' reserving the
MMCFG region.
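The proposal above amounts to changing one branch of the decision. A minimal sketch of that decision table, purely illustrative: `enum ecam_action` and `ecam_policy()` are hypothetical, not kernel interfaces, and the `covered` input stands for whatever _CRS, EFI, or other overlapping-reservation check the kernel performs.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one MCFG region. */
enum ecam_action {
	ECAM_USE,             /* firmware already reserved it: use ECAM */
	ECAM_REJECT,          /* current behavior: don't use ECAM (legacy config access on x86) */
	ECAM_USE_AND_RESERVE, /* proposed: trust MCFG and reserve the region ourselves */
};

/*
 * Sketch of the proposed policy, not kernel code.
 * @covered: region lies inside a _CRS/EFI/other reservation
 * @reserve_from_mcfg: true for the proposed "6.8 or newer" behavior
 */
static enum ecam_action ecam_policy(bool covered, bool reserve_from_mcfg)
{
	if (covered)
		return ECAM_USE;
	return reserve_from_mcfg ? ECAM_USE_AND_RESERVE : ECAM_REJECT;
}
```

Only the uncovered case changes. Note that in the current code a single unreserved region makes pci_mmcfg_reject_broken() call free_all_mmcfg(), so the reject branch applies to every MCFG region, not just the unreserved one.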
I don't think this should be any surprise, but this patch on top of your
series fixes the issue on that system.
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 0cc9520666ef..6a77441565e2 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -571,8 +571,6 @@ static void __init pci_mmcfg_reject_broken(int early)
 		if (!pci_mmcfg_reserved(NULL, cfg, early)) {
 			pr_info("not using ECAM (%pR not reserved)\n",
 				&cfg->res);
-			free_all_mmcfg();
-			return;
 		}
 	}
 }
And from what I can tell this *does* make a "reservation", specifically
because pci_mmcfg_late_insert_resources() uses insert_resource() to put
it in place. I would expect that if something else tried to request that
region later, it would get a conflict.
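The expectation that a later request would conflict can be illustrated with a toy model. This is a minimal sketch, assuming a flat table of claimed MMIO ranges where any overlap counts as a conflict; `struct claims` and `claim_range()` are hypothetical. The kernel's resource tree (insert_resource(), request_mem_region()) is hierarchical and its conflict rules depend on resource flags, so this models only the intended outcome, not the kernel's exact semantics.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLAIMS 8

/* One claimed MMIO range (inclusive) and a label for its owner. */
struct claim {
	uint64_t start;
	uint64_t end;
	const char *owner;
};

/* Hypothetical flat stand-in for the kernel's resource tree. */
struct claims {
	struct claim c[MAX_CLAIMS];
	size_t n;
};

static bool overlaps(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && s2 <= e1;
}

/*
 * Record [start, end] for @owner. Returns 0 on success, -EBUSY if it
 * overlaps an existing claim, -ENOSPC if the fixed-size table is full.
 */
static int claim_range(struct claims *t, uint64_t start, uint64_t end,
		       const char *owner)
{
	for (size_t i = 0; i < t->n; i++)
		if (overlaps(start, end, t->c[i].start, t->c[i].end))
			return -EBUSY;
	if (t->n == MAX_CLAIMS)
		return -ENOSPC;
	t->c[t->n++] = (struct claim){ .start = start, .end = end,
				       .owner = owner };
	return 0;
}
```

Claiming the ECAM window first makes a later overlapping claim fail with -EBUSY, while a disjoint range still succeeds.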