Bjorn Helgaas wrote:
> On Thu, Jan 05, 2023 at 01:20:36PM -0800, Dan Williams wrote:
> > Bjorn Helgaas wrote:
> > > On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > > > Bjorn Helgaas wrote:
> > >
> > > > > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > > > > firmware/kernel interface is as an EfiMemoryMappedIO region.
> > > > >
> > > > > I think this is a firmware bug, but obviously we're going to have to
> > > > > figure out a way around it.
> > > >
> > > > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > > > look at it from the perspective of how would an EFI runtime service use
> > > > ECAM/MMCONFIG space?
> > >
> > > I think it's perfectly fine for firmware to advertise ECAM space as an
> > > EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
> > > makes sense that EFI runtime services would use config space.
> > >
> > > My understanding is that the OS should learn about device address
> > > space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
> > > Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
> > > PNP0C02 motherboard device _CRS.
> > >
> > > So what I think *is* a bug is that this firmware doesn't report the
> > > ECAM space via PNP0C02 _CRS.
> > >
> > > If somebody thinks the lack of this reservation is not a bug, I would
> > > love to hear ideas about how Linux *should* be handling this. There
> > > are many variations on how firmware does things like this, and it's
> > > been a nightmare trying to figure out something that works with all of
> > > them.
> >
> > I am trying to get a statement from a BIOS person, but in the meantime I
> > am confused by this lead in sentence of Note 2 in "PCI Firmware Spec
> > v3.2 Table 4-2: MCFG Table to Support Enhanced Configuration Space
> > Access":
> >
> > If the operating system does not natively comprehend reserving the MMCFG
> > region, the MMCFG region must be reserved by firmware. The address range
> > reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
> > reserved by declaring a motherboard resource...
> >
> > Which seems to say it is ok for the OS to treat MMCFG space as reserved
> > by default. It certainly fails the Robustness Principle for the BIOS to
> > *assume* that the OS can natively comprehend that reservation, but it
> > seems Linux is in its rights to make that assumption.
>
> I read "OS natively comprehends MMCFG space" as meaning "the OS has
> device-specific knowledge of the PCI host bridge and the associated
> MMCFG space." But in that case, the OS wouldn't need MCFG at all, so
> maybe I'm not reading it right.
>
> There must have been some reason for that sentence, e.g., some system
> that didn't or couldn't report MMCFG via PNP0C02 _CBA, but it sure
> makes a mess of what could have been a simple "range must be reserved"
> statement.
>
> > > > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > > > Memory Type Usage after ExitBootServices()"?
> > > >
> > > > s/This memory is not used by the OS./This memory is not used by the OS,
> > > > unless ACPI declares it for another purpose./
> > >
> > > I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> > > don't have an explicit citation for it, but I infer at [1] that ACPI
> > > static tables are second-class citizens and not intended as a way of
> > > reserving address space because that would lead to problems booting
> > > old OSes on firmware that provides new tables unknown to the OS.
> >
> > Ah, true, certainly for new stuff, but what about MCFG specifically?
> > What harm is there in assuming that MMCONFIG intersecting with
> > EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage?
>
> Probably none, and I think that's what we'll have to do. Ugh.
> Another random special-case rule.
>
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32

I am still holding out that a BIOS developer can either say "whoops,
populating MMCONFIG in _CRS was overlooked", or point out "if you take
the derivative of the PCI spec, multiply it by the inverse of the EFI
spec and then take the cross-product with the ACPI spec then the memory
type comes out as implicitly reserved".
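
For illustration only, the special-case rule discussed above (an
MMCONFIG range that intersects an EfiMemoryMappedIO region is treated
as reserved for MMCONFIG) could be sketched roughly as below. This is a
standalone sketch, not kernel code: struct mem_region, enum mem_type and
mmconfig_intersects_efi_mmio() are hypothetical names made up for this
example, and a flat array stands in for the real EFI memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's or EFI's types. */
enum mem_type {
	MEM_CONVENTIONAL,
	MEM_MMIO,		/* stands in for EfiMemoryMappedIO */
};

struct mem_region {
	uint64_t start;
	uint64_t end;		/* inclusive */
	enum mem_type type;
};

/*
 * Return true if the MMCONFIG range [start, end] overlaps any region of
 * type MEM_MMIO in the map.  Under the rule sketched here, a caller would
 * then treat the range as reserved for MMCONFIG even without a PNP0C02
 * _CRS reservation.
 */
static bool mmconfig_intersects_efi_mmio(const struct mem_region *map,
					 size_t nr, uint64_t start,
					 uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < nr; i++) {
		if (map[i].type != MEM_MMIO)
			continue;
		/* Two inclusive ranges overlap if each starts before the other ends. */
		if (start <= map[i].end && end >= map[i].start)
			return true;
	}
	return false;
}
```

With a map that lists [mem 0x80000000-0x8fffffff] as MEM_MMIO, the
check returns true for an MCFG entry covering that same range, and false
for a range that only touches conventional memory.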