Re: Multitude of resource assignment functions

On Tue, Jul 02, 2019 at 04:39:51PM -0500, Bjorn Helgaas wrote:
> On Sun, Jun 30, 2019 at 02:57:37AM +0000, Nicholas Johnson wrote:
> 
> > - Should pci=noacpi imply pci=nocrs? It does not appear to, and I feel 
> > like it should, as CRS is part of ACPI and relates to PCI.
> 
> "pci=noacpi" means "Do not use ACPI for IRQ routing or for PCI
> scanning."
> 
> "pci=nocrs" means "Ignore PCI host bridge windows from ACPI."  If we
> ignore _CRS, we have no idea what the PCI host bridge apertures are,
> so we cannot allocate resources for devices on the root bus.
But I use pci=nocrs (it is non-negotiable for assigning massive 
MMIO_PREF with kernel parameters) and it does work. If I use pci=nocrs, 
then the whole physical address range of the CPU goes to the root 
complex (for example, 39 physical address bits on a quad-core Intel CPU 
gives 512G). I am guessing that the OS makes sure that when assigning 
root port windows, we do not clobber the physical RAM, so that any RAM 
addresses pass straight through the root complex. I have never had funny 
crashes that would make me think I have clobbered the RAM with nocrs. If 
I push the limits then it fails to assign root port resources, as 
expected. Usually I assign 64G to each Thunderbolt port, for a total 
of 256G over four ports. It is total overkill but it gives me 
satisfaction to know that the firmware is definitely not in control and 
that if it is needed, it can be requested. For a production system, I 
would likely tone it down a little.
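
As a rough sanity check of the figures above, here is a minimal sketch 
of the arithmetic. The function names are made up for illustration and 
are not kernel code; the sizes are the ones quoted in this mail.

```python
GIB = 1 << 30  # bytes in one GiB


def addressable_gib(phys_addr_bits):
    """Size of the physical address space, in GiB, for a given bit width."""
    return (1 << phys_addr_bits) // GIB


def total_window_gib(per_port_gib, ports):
    """Total MMIO_PREF requested when every port gets the same window size."""
    return per_port_gib * ports


# 39 physical address bits, and 64G for each of four Thunderbolt ports.
space = addressable_gib(39)
requested = total_window_gib(64, 4)
print(space, requested, space - requested)
```

The remainder is what is left for RAM and everything else in the 
address space, which is why pushing the per-port size much further 
makes the root port assignment fail.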

> 
> The "Do not use ACPI for ... PCI scanning" part indeed does suggest
> that "pci=noacpi" could imply "pci=nocrs", but I don't think there's
> anything to be gained by changing that now.
> 
> We probably *should* remove "or for PCI scanning" from the
> documentation, because "pci=noacpi" only affects IRQs.
> 
> The only reason these exist at all is as a debugging aid to
> temporarily work around issues in firmware or Linux until we can
> develop a real fix or quirk that works without the user specifying a
> kernel parameter.
> 
> > - Does anybody know why with pci=noacpi, you get dmesg warnings about 
> > cannot find PCI int A mapping - but they do not seem to cause the 
> > devices any issues in functioning? Is it because they are using MSI?
> 
> I doubt it.  I think you're just lucky.  In general the information
> from _PRT and _CRS is essential for correct operation.
Strange, because there are dozens of these warnings on multiple 
computers and heaps of devices on Thunderbolt. If the BARs are assigned 
then they work, every time, no questions asked. Maybe this suggests that 
Thunderbolt is somehow exempt. Perhaps the controller has kept 
configuration from the firmware setup and everything behind it does not 
care.
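
One way to test the MSI guess would be to look at the MSI / MSI-X 
capability lines that "lspci -vv" prints for each device. The following 
is only a sketch: the helper name is hypothetical, and the sample text 
is made up to resemble lspci output rather than taken from a real 
machine.

```python
import re


def msi_state(lspci_vv):
    """Map each device address to True if MSI or MSI-X shows 'Enable+'."""
    state = {}
    device = None
    for line in lspci_vv.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device, e.g. "04:00.0 ...".
            device = line.split()[0]
            state[device] = False
        elif device is not None:
            match = re.search(r"MSI(?:-X)?: Enable([+-])", line)
            if match and match.group(1) == "+":
                state[device] = True
    return state


# Illustrative input only (assumed shape of "lspci -vv" output).
sample = (
    "00:1c.0 PCI bridge: Example root port\n"
    "\tCapabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-\n"
    "04:00.0 Ethernet controller: Example NIC\n"
    "\tCapabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+\n"
    "\tCapabilities: [70] MSI-X: Enable- Count=10 Masked-\n"
)
print(msi_state(sample))
```

A device that shows Enable- for both capabilities and still works 
would be falling back to legacy INTx, which is where the missing _PRT 
information would matter.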

> 
> > - Does pci=ignorefw sound good for a future proposal?
> 
> No, at least not without more description of what this would
> accomplish.
I have not given it much time and thought, but basically it would be 
something that can be extended incrementally. I would start with it 
implying nocrs and releasing all root complex resources at boot before 
the initial scan. That way we can see whether the particular platform 
cares if we do everything in the kernel.

> 
> It sounds like you would want this to turn off _PRT, _CRS, and other
> information from ACPI.  You may not like ACPI, but that information is
> there for good reason, and if we didn't get it from ACPI we would have
> to get it from somewhere else.
The nocrs is vital because the BIOS places pitiful space behind the root 
complex, so assigning large BARs fails - which is why Xeon Phi 
coprocessors with 8G or 16G BARs to map their whole RAM are only 
supported on certain systems. I consider all BIOS / firmware to be 
broken at this time, especially with most still catering for 32-bit 
OSes that almost nobody uses. I know not everybody feels that way, but 
I am an idealist and aim to move things in the right direction.

I would accept ACPI if it were just a collection of tables, memory 
mapped like MMCONFIG. I know there are more complicated things that 
require bytecode to run (although I do assert my belief that it should 
be avoided if possible) but if the static tables were moved out of ACPI 
then in my mind, it would be progress.

Is there a reason why PCI SIG could not add a future extension where all 
of this information can be accessed with an extended MMCONFIG address 
range?

> 
> There is always "acpi=off" if you just don't want ACPI at all.
> 
> Bjorn
I am aware, and I will happily use that when there is a way to manually 
specify DMAR and MADT information. If you use acpi=off presently, you 
lose all but one CPU core and the use of IOMMU. There used to be acpi=ht 
to disable ACPI for everything except for HyperThreading, but that was 
removed a long time ago - I do not know why.

The reason I often test like this is because it gives me reassurance 
that my code is not working by fluke on the particular system because of 
a firmware quirk. Also, Thunderbolt was deeply entrenched in ACPI 
before, so I am kind of over-compensating to make sure that there is no 
longer any unconditional dependency.
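
When testing like this, it helps to confirm which of these options a 
boot actually used. A minimal sketch, assuming the command line is read 
from /proc/cmdline; the helper names and the sample string are 
hypothetical:

```python
def pci_options(cmdline):
    """Collect the comma-separated options from every pci= parameter."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(o for o in token[len("pci="):].split(",") if o)
    return options


def acpi_off(cmdline):
    """True if acpi=off was passed as its own parameter."""
    return "acpi=off" in cmdline.split()


# On a running system this string would come from /proc/cmdline;
# a made-up example is used here so the sketch runs anywhere.
cmdline = "root=/dev/sda1 ro pci=nocrs,noacpi quiet"
print(pci_options(cmdline), acpi_off(cmdline))
```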

Nicholas


