Re: [PATCH V14 4/7] PCI: loongson: Don't access non-existant devices

On 2022/6/29 12:04 AM, Bjorn Helgaas wrote:
On Tue, Jun 28, 2022 at 09:03:02PM +0800, Jianmin Lv wrote:
On 2022/6/28 5:38 AM, Bjorn Helgaas wrote:
On Fri, Jun 17, 2022 at 03:43:27PM +0800, Huacai Chen wrote:
On LS2K/LS7A, some non-existent devices don't return 0xffffffff when
scanning. This is a hardware flaw but we can only avoid it by software
now.

We should say what *does* happen if we do a config read to a device
that doesn't exist.  Machine check, hang, etc?

The device is a hidden device (for debug only) that should not be
scanned. If it is scanned in an abnormal way, the machine hangs (one
case in the LTP PCI test can trigger the issue, as explained
below).

Reading the Vendor ID is the *normal* way to scan for a device.  It
seems that this hardware just hangs in some cases when the device
doesn't exist.

Generally speaking we only probe for functions > 0 if .0 is marked as
multi-function, so I guess this means 00:09.0 is marked as a
multi-function device, but config reads to 00:09.1 would fail?

Yes, definitely. Actually, 00:09.0 is a single-function device, so
fun1 (09.1) is not normally scanned (e.g. fun1 is not scanned during
PCI enumeration at kernel boot).

But there is one situation: when running the LTP PCI test case on LS7A,
00:08.2 is a SATA controller (a valid device), and bus number 0 and
devfn 0x42 are passed to the kernel API pci_scan_slot(), whose
documentation clearly notes that devfn must have function zero. Here the
function number of the devfn passed in is 2, not zero, and then in
pci_scan_slot():

         for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
                 dev = pci_scan_single_device(bus, devfn + fn);
                 ...
         }

08.2, 08.3, ... and 09.1 are scanned one by one, so 09.1 (fun1) is
scanned.
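
To make the arithmetic concrete, here is a minimal user-space sketch of how
devfn + fn walks past slot 8 when the starting devfn is 0x42. The three
macros use the same encoding as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC;
last_devfn_scanned() is a hypothetical helper, and it assumes the loop
visits every fn from 1 to 7, which may not match what next_fn() does on a
given kernel version.

```c
#include <assert.h>
#include <stdio.h>

/* Same bit layout as the kernel macros: 5-bit slot, 3-bit function. */
#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)        ((devfn) & 0x07)

/*
 * Hypothetical helper: mimic "devfn + fn" for fn = 1..7 and return the
 * last devfn reached, printing each address on the way.
 * Assumption: the loop is not cut short by next_fn().
 */
static unsigned int last_devfn_scanned(unsigned int start_devfn)
{
	unsigned int fn, devfn = start_devfn;

	/* A devfn is 8 bits wide; keep start + 7 inside that range. */
	assert(start_devfn <= 0xff - 7);

	for (fn = 1; fn < 8; fn++) {
		devfn = start_devfn + fn;
		printf("fn=%u -> %02x.%u\n", fn,
		       PCI_SLOT(devfn), PCI_FUNC(devfn));
	}
	return devfn;
}
```

Calling last_devfn_scanned(PCI_DEVFN(8, 2)) starts at 0x42 and ends at
0x49, i.e. slot 9, function 1. With a function-zero devfn such as 0x40 the
same loop would stay inside slot 8.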

Does the "((bus == 0) && (device >= 9 && device <= 20) && (function > 0))"
test catch *all* devfns where the hang occurs?  I wouldn't want to
only avoid the ones that LTP happens to use.  If we did that, a future
LTP change could easily break things again.  But I assume you know
exactly what devices are present on the root bus.


Yes, as you said. I'm sure that only these hidden functions (fun1 of
devices 9 to 20) on the root bus can cause the issue, so this fix is
enough to address it.
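
For reference, a minimal sketch of the predicate being discussed, assuming
pdev_is_existant() simply negates the quoted bus/device/function test. The
name and argument order follow the call in the hunk; the body and the
argument types are an assumption, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed body: report "not present" for functions > 0 of devices
 * 9..20 on bus 0, and "present" for everything else.
 */
static bool pdev_is_existant(unsigned int bus, unsigned int device,
			     unsigned int function)
{
	/* PCI device numbers are 5 bits, function numbers 3 bits. */
	assert(device < 32 && function < 8);

	if ((bus == 0) && (device >= 9 && device <= 20) && (function > 0))
		return false;

	return true;
}
```

Note that under this assumption the test covers every function from 1 to 7
of those devices, not only function 1, and leaves 00:08.x untouched.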

-	if (priv->data->flags & FLAG_DEV_FIX &&
-			!pci_is_root_bus(bus) && PCI_SLOT(devfn) > 0)
+	if ((priv->data->flags & FLAG_DEV_FIX) && bus->self) {
+		if (!pci_is_root_bus(bus) && (device > 0))
+			return NULL;
+	}
+
+	/* Don't access non-existant devices */
+	if (!pdev_is_existant(busnum, device, function))
   		return NULL;

Is this a "forever" hardware bug that will never be fixed, or should
there be a flag like FLAG_DEV_FIX so we only do this on the broken
devices?

No, the next version of LS7A will correct it, so maybe we can use a
FLAG_DEV_FIX-like flag to address it.

You should add the flag now instead of waiting for the new hardware.
Otherwise you may not remember or notice the need to make this
conditional on the hardware version, and you'll wonder why the fixed
hardware doesn't enumerate devices correctly.
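
A rough sketch of what gating the workaround on a per-chip flag could look
like. FLAG_DEV_HIDDEN, its bit value, and should_skip_config_access() are
hypothetical names chosen for illustration; the thread only says that a
flag similar to FLAG_DEV_FIX should exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag; the bit position is arbitrary for this sketch. */
#define FLAG_DEV_HIDDEN  (1u << 3)

/*
 * Hypothetical helper: skip the config access only when the chip's
 * flags say it has the hidden-function flaw. Fixed hardware would not
 * set the flag and would be enumerated normally.
 */
static bool should_skip_config_access(unsigned int flags, unsigned int bus,
				      unsigned int device,
				      unsigned int function)
{
	/* PCI device numbers are 5 bits, function numbers 3 bits. */
	assert(device < 32 && function < 8);

	if (!(flags & FLAG_DEV_HIDDEN))
		return false;

	return bus == 0 && device >= 9 && device <= 20 && function > 0;
}
```

With this shape, a newer LS7A revision would only need a data entry
without the flag, and the access path itself would not change.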


Thanks for your suggestion; I agree. Huacai, WDYT?


Bjorn




