On Tue, Jul 02, 2019 at 04:39:51PM -0500, Bjorn Helgaas wrote:
> On Sun, Jun 30, 2019 at 02:57:37AM +0000, Nicholas Johnson wrote:
>
> > - Should pci=noacpi imply pci=nocrs? It does not appear to, and I feel
> >   like it should, as CRS is part of ACPI and relates to PCI.
>
> "pci=noacpi" means "Do not use ACPI for IRQ routing or for PCI
> scanning."
>
> "pci=nocrs" means "Ignore PCI host bridge windows from ACPI." If we
> ignore _CRS, we have no idea what the PCI host bridge apertures are,
> so we cannot allocate resources for devices on the root bus.

But I use pci=nocrs (it is non-negotiable for assigning massive MMIO_PREF
with kernel parameters) and it does work. If I use pci=nocrs, then the
whole physical address range of the CPU goes to the root complex (for
example, 39 physical address bits on a quad-core Intel CPU gives 512G).
I am guessing that the OS makes sure that when assigning root port
windows, we do not clobber the physical RAM, so that any RAM addresses
pass straight through the root complex.

I have never had funny crashes that would make me think I have clobbered
the RAM with nocrs. If I push the limits then, as expected, it fails to
assign root port resources. Usually I assign 64G to each Thunderbolt
port, for a total of 256G over four ports. It is total overkill but it
gives me satisfaction to know that the firmware is definitely not in
control and that if it is needed, it can be requested. For a production
system, I would likely tone it down a little.

> The "Do not use ACPI for ... PCI scanning" part indeed does suggest
> that "pci=noacpi" could imply "pci=nocrs", but I don't think there's
> anything to be gained by changing that now.
>
> We probably *should* remove "or for PCI scanning" from the
> documentation, because "pci=noacpi" only affects IRQs.
>
> The only reason these exist at all is as a debugging aid to
> temporarily work around issues in firmware or Linux until we can
> develop a real fix or quirk that works without the user specifying a
> kernel parameter.
>
> > - Does anybody know why with pci=noacpi, you get dmesg warnings about
> >   cannot find PCI int A mapping - but they do not seem to cause the
> >   devices any issues in functioning? Is it because they are using MSI?
>
> I doubt it. I think you're just lucky. In general the information
> from _PRT and _CRS is essential for correct operation.

Strange, because there are dozens of these warnings on multiple
computers and heaps of devices on Thunderbolt. If the BARs are assigned
then they work, every time, no questions asked. Maybe this suggests that
Thunderbolt is somehow exempt. Perhaps the controller has kept
configuration from the firmware setup and everything behind it does not
care.

> > - Does pci=ignorefw sound good for a future proposal?
>
> No, at least not without more description of what this would
> accomplish.

I have not given it much time and thought, but basically it will be
something that can be added to incrementally. I would start with it
implying nocrs and releasing all root complex resources at boot before
the initial scan. That way we can see if the particular platform cares
if we do everything in the kernel.

> It sounds like you would want this to turn off _PRT, _CRS, and other
> information from ACPI. You may not like ACPI, but that information is
> there for good reason, and if we didn't get it from ACPI we would have
> to get it from somewhere else.

The nocrs is vital because the BIOS places pitiful space behind the root
complex and will fail to assign large BARs - hence why Xeon Phi
coprocessors with 8G or 16G BARs to map their whole RAM are only
supported on certain systems. I consider all BIOS / firmware to be
broken at this time, especially with most still catering for 32-bit
OSes that almost nobody uses.
I know not everybody feels that way, but I am an idealist and aim to
move things in the right direction.

I would accept ACPI if it were just a collection of tables, memory
mapped like MMCONFIG. I know there are more complicated things that
require bytecode to run (although I do assert my belief that it should
be avoided if possible) but if the static tables were moved out of ACPI
then in my mind, it would be progress. Is there a reason why PCI SIG
could not add a future extension where all of this information can be
accessed with an extended MMCONFIG address range?

> There is always "acpi=off" if you just don't want ACPI at all.
>
> Bjorn

I am aware, and I will happily use that when there is a way to manually
specify DMAR and MADT information. If you use acpi=off presently, you
lose all but one CPU core and the use of IOMMU. There used to be acpi=ht
to disable ACPI for everything except for HyperThreading, but that was
removed a long time ago - I do not know why.

The reason I often test like this is because it gives me reassurance
that my code is not working by fluke on the particular system because of
a firmware quirk. Also, Thunderbolt was deeply entrenched in ACPI
before, so I am kind of over-compensating to make sure that there is no
longer any unconditional dependency.

Nicholas
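
P.S. For anyone wanting to check the address arithmetic above (39
physical address bits giving 512G, and four 64G Thunderbolt windows
totalling 256G), here is a rough Python sketch. The function names are
made up for illustration, and the "fits" check is deliberately naive: it
ignores RAM and any other MMIO that also has to live in the same
physical address space.

```python
GIB = 1 << 30  # one gibibyte in bytes


def addressable_bytes(phys_addr_bits):
    """Size of the physical address space for a given address width."""
    return 1 << phys_addr_bits


def window_budget(ports, window_gib, phys_addr_bits):
    """Return (requested, total, fits) for equal-sized per-port windows.

    Naive assumption: only the raw address space size is compared, with
    no allowance for RAM or other MMIO regions.
    """
    total = addressable_bytes(phys_addr_bits)
    requested = ports * window_gib * GIB
    return requested, total, requested <= total


requested, total, fits = window_budget(ports=4, window_gib=64,
                                       phys_addr_bits=39)
print(f"physical address space: {total // GIB}G")
print(f"requested windows:      {requested // GIB}G")
print(f"fits (naive check):     {fits}")
```

On a real system the usable space is smaller than the naive figure, which
is consistent with resource assignment failing once the limits are
pushed.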