Re: PCIe hotplug resource issues with PEX switch (NVMe disks) on AMD Epyc system

On Fri, Dec 13, 2019 at 09:35:19AM +0100, Stefan Roese wrote:
> Hi!
Hi,
> 
> I am facing an issue with PCIe-Hotplug on an AMD Epyc based system.
> Our system is equipped with an HBA for NVMe SSDs incl. PCIe switch
> (Supermicro AOC-SLG3-4E2P) [1] and we would like to be able to hotplug
> NVMe disks.
> 
> Currently, I'm testing with v5.5.0-rc1 and series [2] applied. Here are
> a few tests and results that I did so far. All tests were done with
> one Intel NVMe SSD connected to one of the 4 NVMe ports of the HBA
> and the other 3 ports (currently) left unconnected:
> 
> a) Kernel Parameter "pci=pcie_bus_safe"
> The resources of the 3 unused PCIe slots of the PEX switch are not
> assigned in this test.
> 
> b) Kernel Parameter "pci=pcie_bus_safe,hpmemsize=0,hpiosize=0,hpmmiosize=1M,hpmmioprefsize=0"
> With this test I restricted the resources of the HP slots to the
> minimum. Still, this results in unassigned resources for the unused
> PCIe slots of the PEX switch.
> 
> c) Kernel Parameter "pci=realloc,pcie_bus_safe,hpmemsize=0,hpiosize=0,hpmmiosize=1M,hpmmioprefsize=0"
> Again, not all resources are assigned.
> 
> d) Kernel Parameter "pci=nocrs,realloc,pcie_bus_safe,hpmemsize=0,hpiosize=0,hpmmiosize=1M,hpmmioprefsize=0"
> Now all requested resources are available for the HP PCIe slots of the
> PEX switch. But the NVMe driver fails while probing. Debugging has
> shown that reading from the BAR of the NVMe disk returns 0xffffffff.
> Reading from the PLX PEX switch registers also returns 0xffffffff in this
> case (this works of course without nocrs, when the BARs are mapped at
> a different address).
> 
> Does anybody have a clue on why the access to the PEX switch and / or
> the NVMe BAR does not work in the "nocrs" case? The BARs are located in
> the same window that is provided by the BIOS in the ACPI list (but is
> "ignored" in this case) [3].
> 
> Or is it possible to get the HP resource mapping done correctly without
> setting "nocrs" for our setup with the PCIe/NVMe switch?
> 
> I can provide all sorts of logs (dmesg, lspci etc) if needed - just let
> me know.
> 
> Many thanks in advance,
> Stefan
This will be a quick response for now. I will go into more depth tonight 
when I have more time.

What I have taken away from this is:

1. Epyc -> Up to 4x PCIe Root Complexes, but from what I can gather, 
they are probably assigned on the same segment / domain, unfortunately, 
with non-overlapping bus numbers. Either way, multiple RCs may 
complicate using pci=nocrs and others. Unfortunately, I have not had the 
privilege of owning a system with multiple RCs, so I cannot be sure.

2. Not using Thunderbolt - [2] patch series only really makes a 
difference with nested hotplug bridges, such as in Thunderbolt. 
Although, it might help by not using additional resource lists, but I 
still do not think it will matter without nested hotplug bridges.

3. System not reallocating resources despite the override -> is the ACPI 
_DSM method evaluating to zero? I experienced this recently with an Intel 
Ice Lake system. I booted the laptop at the retail store into Linux off a 
USB drive to find out about the Thunderbolt implementation. I dumped "sudo 
lspci -xxxx" and dmesg and analysed the results at home. I noticed it 
did not override the resources, and from examining the source code, it 
likely evaluated _DSM to 0, which may have overridden pci=realloc. Try 
modifying the source code to unconditionally apply realloc in 
drivers/pci/setup-bus.c and see what happens. I have not gone back to 
the store to test this hypothesis myself.

4. It would be helpful if you attached full dmesg and "sudo lspci -xxxx" 
which dumps full PCI config, allowing us to run any lspci query as if we 
were on your system, from the file. I will be able to tell a lot more 
after seeing that. Possibly do one with no kernel parameters, and do 
another set of results with all of the kernel parameters. Use 
hpmmiosize=64M and hpmmioprefsize=1G for it to be noticeable, I reckon. 
But this will answer questions I have about which ports are hotplug 
bridges and other things.

5. It is also worth trying acpi=off, although there is a good chance 
the system will not even boot with it on kernels since around v5.3. Since a 
recent kernel, I have found that acpi=off only removes HyperThreading, 
and not all the physical cores like it used to. So there must have been 
a patch which allowed it to guess the MADT table information. I have not 
investigated. But now, some of my computers crash upon loading the 
kernel with acpi=off. It must get it wrong at times. What about 
pci=noacpi instead?
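
For the captures requested in point 4, a small script along these lines 
might make it easier to keep one set of files per boot configuration. 
This is only an untested sketch: the function names, the label scheme and 
the output file names are my own assumptions, and it has to be run as 
root so that lspci can read the full config space.

```python
import subprocess
from pathlib import Path


def capture_plan(label):
    """Map output file names to the commands whose output they will hold."""
    return {
        f"{label}-dmesg.txt": ["dmesg"],
        f"{label}-lspci.txt": ["lspci", "-xxxx"],
        # Record the kernel parameters that were active for this boot.
        f"{label}-cmdline.txt": ["cat", "/proc/cmdline"],
    }


def capture(label, out_dir="."):
    """Run each planned command and save its stdout under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, argv in capture_plan(label).items():
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        (out / name).write_text(result.stdout)
```

Calling capture("baseline") after a boot with no extra parameters, and 
capture("hp-64M-1G") after a boot with the hpmmiosize / hpmmioprefsize 
values above, would give two comparable sets of files.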

Sorry if I missed something you said.

Best of luck, and I am interested in looking into this further. :)

Kind regards,
Nicholas Johnson

> 
> [1] https://www.supermicro.com/en/products/accessories/addon/AOC-SLG3-4E2P.php
> [2] https://lkml.org/lkml/2019/12/9/388
> [3]
> [    0.701932] acpi PNP0A08:00: host bridge window [io  0x0cf8-0x0cff] (ignored)
> [    0.701934] acpi PNP0A08:00: host bridge window [io  0x0000-0x02ff window] (ignored)
> [    0.701935] acpi PNP0A08:00: host bridge window [io  0x0300-0x03af window] (ignored)
> [    0.701936] acpi PNP0A08:00: host bridge window [io  0x03e0-0x0cf7 window] (ignored)
> [    0.701937] acpi PNP0A08:00: host bridge window [io  0x03b0-0x03df window] (ignored)
> [    0.701938] acpi PNP0A08:00: host bridge window [io  0x0d00-0x3fff window] (ignored)
> [    0.701939] acpi PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff window] (ignored)
> [    0.701939] acpi PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff window] (ignored)
> [    0.701940] acpi PNP0A08:00: host bridge window [mem 0xec000000-0xefffffff window] (ignored)
> [    0.701941] acpi PNP0A08:00: host bridge window [mem 0x182c8000000-0x1ffffffffff window] (ignored)
> ...
> 41:00.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0, IRQ 47, NUMA node 2
>         Memory at ec400000 (32-bit, non-prefetchable) [size=256K]
>         Bus: primary=41, secondary=42, subordinate=47, sec-latency=0
>         I/O behind bridge: None
>         Memory behind bridge: ec000000-ec3fffff [size=4M]
>         Prefetchable memory behind bridge: None
>         Capabilities: <access denied>
>         Kernel driver in use: pcieport
> epyc@epyc-Super-Server:~/stefan$ sudo ./memtool md 0xec400000+0x10
> ec400000: ffffffff ffffffff ffffffff ffffffff                ................
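
As a sanity check on the statement that the BARs sit inside a window from 
the ACPI list in [3], a short Python sketch like the following could parse 
the "host bridge window" lines from dmesg and test whether a BAR range is 
contained in one of them. The regular expression, the function names and 
the SAMPLE text are illustrative assumptions; the addresses and sizes are 
taken from the dmesg and lspci output quoted above.

```python
import re

# Matches lines such as:
#   acpi PNP0A08:00: host bridge window [mem 0xec000000-0xefffffff window] (ignored)
WINDOW_RE = re.compile(
    r"host bridge window \[(io|mem)\s+0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)"
)


def parse_host_bridge_windows(dmesg_text):
    """Return a list of (kind, start, end) tuples; end is inclusive."""
    windows = []
    for match in WINDOW_RE.finditer(dmesg_text):
        kind, start, end = match.groups()
        windows.append((kind, int(start, 16), int(end, 16)))
    return windows


def find_window(windows, kind, start, size):
    """Return the first window of this kind that fully contains the range."""
    end = start + size - 1
    for w_kind, w_start, w_end in windows:
        if w_kind == kind and w_start <= start and end <= w_end:
            return (w_kind, w_start, w_end)
    return None


SAMPLE = """\
acpi PNP0A08:00: host bridge window [io  0x0d00-0x3fff window] (ignored)
acpi PNP0A08:00: host bridge window [mem 0xec000000-0xefffffff window] (ignored)
acpi PNP0A08:00: host bridge window [mem 0x182c8000000-0x1ffffffffff window] (ignored)
"""

windows = parse_host_bridge_windows(SAMPLE)

# BAR of 41:00.0 from the lspci output: 0xec400000, size 256K.
hit = find_window(windows, "mem", 0xEC400000, 256 * 1024)
if hit is not None:
    print("BAR inside window 0x%x-0x%x" % (hit[1], hit[2]))
else:
    print("BAR outside every host bridge window")
```

Containment in a window only shows that the address is one the firmware 
advertised; it does not by itself explain the all-ones reads.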


