Re: PCIe hotplug resource issues with PEX switch (NVMe disks) on AMD Epyc system

Hi,

> >  From the logs, it looks like MMIO_PREF was assigned 1G but not MMIO.
> > 
> > This looks tricky. Please revert my commit:
> > c13704f5685deb7d6eb21e293233e0901ed77377
> > 
> > And see if it is the problem.
> 
> I reverted this patch and ran a few tests (some of my test cases). None
> turned out differently than before. Either the resources are not mapped
> completely or they are mapped (with pci=nocrs) and not accessible.
> 
> > It is entirely possible, but because of
> > the very old code and how there are multiple passes, it might be
> > impossible to use realloc without side effects for somebody. If you fix
> > it for one scenario, it is possible that there is another scenario for
> > which it will break due to the change. The only way to make everything
> > work is a near complete rewrite of drivers/pci/setup-bus.c and
> > potentially others, something I am working on, but is going to take a
> > long time. And unlikely to ever be accepted.
> 
> While working on this issue, I looked (again) at this resource (re-)
> allocation code. This is really confusing (at least to me) and I also think
> that it needs a "near complete rewrite".
> 
> > Otherwise, it will take me a lot of grepping through dmesg to find the
> > cause, which will take more time.
> 
> Sure.
> 
> > FYI, "lspci -vvv" is redundant because it can be produced from "lspci
> > -xxxx" output.
> 
> I know. It's mainly for me to easily see the PCI devices listed quickly.
> 
> > A final note, Epyc CPUs can bifurcate x16 slots into x4/x4/x4/x4 in the
> > BIOS setup, although you will probably not have the hotplug services
> > provided by the PEX switch.
> 
> I think it should not matter for my current test with resource assignment,
> how many PCIe lanes the PEX switch has connected to the PCI root port. It's
> of course important for the bandwidth, but this is a completely different
> issue.

I meant that you can connect 4x NVMe drives to a PCIe x16 slot with a 
cheap passive bifurcation riser. But it sounds like this card is useful 
because of its hotplug support.

I noticed that if you grep some of your dmesg logs for "add_size", you 
have some lines like this:
[    0.767652] pci 0000:42:04.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 44] add_size 200000 add_align 100000

I am not sure if these are the cause or a symptom of the problem, but I 
do not have any when assigning MMIO and MMIO_PREF for Thunderbolt 3.
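
For readers decoding the quoted "add_size" line, here is a minimal userspace 
sketch in C. It assumes the line shape shown above and that add_size and 
add_align are printed in hexadecimal. The struct and function names are 
hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of interest from one "bridge window ... add_size ..." dmesg line.
 * Hypothetical helper type, not a kernel structure. */
struct bridge_window {
	unsigned long long start;     /* first address of the window */
	unsigned long long end;       /* last address of the window (inclusive) */
	unsigned long long add_size;  /* optional extra size requested */
	unsigned long long add_align; /* alignment for the extra size */
};

/* Returns 0 on success, -1 if the line does not have the expected shape. */
static int parse_bridge_window(const char *line, struct bridge_window *w)
{
	const char *mem, *add;

	assert(line != NULL && w != NULL);

	mem = strstr(line, "[mem 0x");
	add = strstr(line, "add_size ");
	if (!mem || !add)
		return -1;
	if (sscanf(mem, "[mem 0x%llx-0x%llx", &w->start, &w->end) != 2)
		return -1;
	/* Assumption: both values are hexadecimal without a 0x prefix. */
	if (sscanf(add, "add_size %llx add_align %llx",
		   &w->add_size, &w->add_align) != 2)
		return -1;
	return 0;
}

/* Resource ranges are inclusive, so size = end - start + 1.
 * A range with end == start - 1 therefore has size 0 (an empty window). */
static unsigned long long window_size(const struct bridge_window *w)
{
	return w->end - w->start + 1;
}
```

Fed the line quoted above, this yields a window size of 0 (the range 
0x00100000-0x000fffff is empty), an add_size of 0x200000 (2 MiB) and an 
add_align of 0x100000 (1 MiB). In other words, the bridge currently has no 
prefetchable window and the extra 2 MiB is only an optional request.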

I noticed you are using pci=hpmemsize in some of the tests. It should 
not be interfering because you put it first (it is overwritten by 
hpmmiosize and hpmmioprefsize). But I should point out that 
pci=hpmemsize=X is equivalent to pci=hpmmiosize=X,hpmmioprefsize=X so it 
is redundant. When I added hpmmiosize and hpmmioprefsize parameters to 
control them independently, I would have liked to have dropped 
hpmemsize, but needed to leave it around to not disrupt people who are 
already using it.
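
The ordering and equivalence described above can be illustrated with a 
simplified userspace model in C. This is a sketch, not the kernel's option 
parser: the struct, the function names and the size values used in the 
example are hypothetical. The only behaviour assumed from the text is that 
hpmemsize sets both sizes and that later options overwrite earlier ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the two hotplug window sizes. */
struct hp_sizes {
	unsigned long long mmio;      /* set by hpmmiosize (and hpmemsize) */
	unsigned long long mmio_pref; /* set by hpmmioprefsize (and hpmemsize) */
};

/* Parse a number with an optional K/M/G suffix (simplified stand-in). */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Apply the comma-separated options of one "pci=" argument, left to right. */
static void apply_pci_options(const char *arg, struct hp_sizes *hp)
{
	char buf[256];
	char *opt;

	assert(arg != NULL && hp != NULL);
	strncpy(buf, arg, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (opt = strtok(buf, ","); opt; opt = strtok(NULL, ",")) {
		if (!strncmp(opt, "hpmmiosize=", 11)) {
			hp->mmio = parse_size(opt + 11);
		} else if (!strncmp(opt, "hpmmioprefsize=", 15)) {
			hp->mmio_pref = parse_size(opt + 15);
		} else if (!strncmp(opt, "hpmemsize=", 10)) {
			/* Legacy option: one value for both windows. */
			hp->mmio = parse_size(opt + 10);
			hp->mmio_pref = hp->mmio;
		}
	}
}
```

In this model, "hpmemsize=64M" and "hpmmiosize=64M,hpmmioprefsize=64M" 
produce the same result, and in "hpmemsize=64M,hpmmiosize=128M,
hpmmioprefsize=1G" the leading hpmemsize has no effect on the final values.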

Please try something like this, which I dug up from a very old attempt 
to overhaul drivers/pci/setup-bus.c that I was working on. It will 
release all the boot resources before the initial allocation, and should 
give the system a chance to cleanly assign all resources on the first 
pass / try. The allocation code works well until you use more than one 
pass; then things get very hairy. I just applied it to mine, and now
everything is assigned on the first pass, with not a single failure to assign.

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 22aed6cdb..befaef6a8 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1822,8 +1822,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 void __init pci_assign_unassigned_resources(void)
 {
 	struct pci_bus *root_bus;
+	struct pci_dev *dev;
 
 	list_for_each_entry(root_bus, &pci_root_buses, node) {
+		for_each_pci_bridge(dev, root_bus) {
+			pci_bridge_release_resources(dev->subordinate, IORESOURCE_IO);
+			pci_bridge_release_resources(dev->subordinate, IORESOURCE_MEM);
+			pci_bridge_release_resources(dev->subordinate, IORESOURCE_MEM_64);
+			pci_bridge_release_resources(dev->subordinate, IORESOURCE_MEM_64 | IORESOURCE_PREFETCH);
+		}
+
 		pci_assign_unassigned_root_bus_resources(root_bus);
 
 		/* Make sure the root bridge has a companion ACPI device */

Kind regards,
Nicholas


