Re: [PATCH V14 4/7] PCI: loongson: Don't access non-existant devices

On Tue, Jun 28, 2022 at 09:03:02PM +0800, Jianmin Lv wrote:
> On 2022/6/28 5:38 AM, Bjorn Helgaas wrote:
> > On Fri, Jun 17, 2022 at 03:43:27PM +0800, Huacai Chen wrote:
> > > On LS2K/LS7A, some non-existent devices don't return 0xffffffff when
> > > scanning. This is a hardware flaw but we can only avoid it by software
> > > now.
> > 
> > We should say what *does* happen if we do a config read to a device
> > that doesn't exist.  Machine check, hang, etc?
> 
> The device is a hidden device (for debugging only) that should not be
> scanned. If it is scanned in an abnormal way, the machine hangs (one
> case in the LTP pci test can trigger the issue, as explained
> below).

Reading the Vendor ID is the *normal* way to scan for a device.  It
seems that this hardware just hangs in some cases when the device
doesn't exist.

> > Generally speaking we only probe for functions > 0 if .0 is marked as
> > multi-function, so I guess this means 00:09.0 is marked as a
> > multi-function device, but config reads to 00:09.1 would fail?
> 
> Yes, definitely. Actually, 00:09.0 is a single-function device, so
> fun1 (09.1) will not be scanned (e.g. fun1 is not scanned during PCI
> enumeration at kernel boot).
> 
> But there is one situation: when running the LTP pci test case on LS7A,
> 00:08.2 is a SATA controller (a valid device), and bus number 0
> and devfn 0x42 are passed to the kernel API pci_scan_slot(), which has
> a clear note: devfn must have zero function. The devfn passed in has
> function 2, not zero, and then in pci_scan_slot():
> 
>         for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
>                 dev = pci_scan_single_device(bus, devfn + fn);
>                 ...
>         }
> 
> 08.2, 08.3 ... and 09.1 will be scanned one by one, so 09.1 (fun1) is
> scanned.
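
To make the arithmetic above concrete, here is a small user-space sketch of
how "devfn + fn" walks from 08.2 into slot 9.  The macros use the same
encoding as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC; show_scan_targets() is
a hypothetical helper, and it assumes the loop visits every fn from 1 to 7,
which the real next_fn() may not do.

```c
#include <stdio.h>

/* Same bit layout as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros:
 * slot number in bits 7:3, function number in bits 2:0. */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SLOT(devfn)        (((devfn) >> 3) & 0x1f)
#define FUNC(devfn)        ((devfn) & 0x07)

/* Hypothetical helper: print the slot.function that "devfn + fn" lands on
 * for fn = 1..7, assuming every fn is visited. */
void show_scan_targets(unsigned int devfn)
{
	for (unsigned int fn = 1; fn < 8; fn++) {
		unsigned int target = devfn + fn;

		printf("fn=%u -> %02x.%u\n", fn, SLOT(target), FUNC(target));
	}
}
```

With devfn = 0x42 (08.2), fn = 6 gives 0x48 (09.0) and fn = 7 gives 0x49
(09.1), which is how a nonzero starting function spills into the next slot.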

Does the "((bus == 0) && (device >= 9 && device <= 20) && (function > 0))"
test catch *all* devfns where the hang occurs?  I wouldn't want to
only avoid the ones that LTP happens to use.  If we did that, a future
LTP change could easily break things again.  But I assume you know
exactly what devices are present on the root bus.
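
For reference, a minimal sketch of the check as I understand it, assuming
the helper simply returns false whenever the quoted condition matches.  The
name pdev_is_existant() comes from the patch; the body and the parameter
types here are my assumption, not the posted code.

```c
#include <stdbool.h>

/* Assumed body: report "does not exist" for any nonzero function of
 * devices 9..20 on bus 0, and "may exist" for everything else. */
bool pdev_is_existant(unsigned int bus, unsigned int device,
		      unsigned int function)
{
	if (bus == 0 && device >= 9 && device <= 20 && function > 0)
		return false;

	return true;
}
```

Written this way, the filter depends only on the bus/device/function
triple, not on which caller happens to trigger the scan.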

> > > -	if (priv->data->flags & FLAG_DEV_FIX &&
> > > -			!pci_is_root_bus(bus) && PCI_SLOT(devfn) > 0)
> > > +	if ((priv->data->flags & FLAG_DEV_FIX) && bus->self) {
> > > +		if (!pci_is_root_bus(bus) && (device > 0))
> > > +			return NULL;
> > > +	}
> > > +
> > > +	/* Don't access non-existant devices */
> > > +	if (!pdev_is_existant(busnum, device, function))
> > >   		return NULL;
> > 
> > Is this a "forever" hardware bug that will never be fixed, or should
> > there be a flag like FLAG_DEV_FIX so we only do this on the broken
> > devices?
> 
> No, the next version of LS7A will correct it, so maybe we can use
> a FLAG_DEV_FIX-like flag to address it.

You should add the flag now instead of waiting for the new hardware.
Otherwise you may not remember or notice the need to make this
conditional on the hardware version, and you'll wonder why the fixed
hardware doesn't enumerate devices correctly.
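
Something along these lines is what I have in mind; this is only a sketch.
FLAG_DEV_HIDDEN, its bit value, struct quirk_data, and
should_skip_config_access() are all hypothetical names for illustration,
not anything in the driver today.

```c
#include <stdbool.h>

/* Hypothetical flag: set only for controllers known to hang on these reads. */
#define FLAG_DEV_HIDDEN  (1u << 2)

/* Hypothetical stand-in for the driver's per-controller data. */
struct quirk_data {
	unsigned int flags;
};

/* Skip the config access only when the controller carries the quirk flag
 * and the target is a nonzero function of devices 9..20 on bus 0. */
bool should_skip_config_access(const struct quirk_data *data,
			       unsigned int bus, unsigned int device,
			       unsigned int function)
{
	if (!(data->flags & FLAG_DEV_HIDDEN))
		return false;

	return bus == 0 && device >= 9 && device <= 20 && function > 0;
}
```

Fixed hardware would then simply omit the flag and enumerate normally.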

Bjorn


