Re: [RFC PATCH 1/3] PCI: Wait for device readiness with Configuration RRS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[+cc Stanislav, Rajat]

On Wed, Aug 28, 2024 at 06:17:04AM +0200, Lukas Wunner wrote:
> On Tue, Aug 27, 2024 at 06:48:46PM -0500, Bjorn Helgaas wrote:
> > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> >  			return -ENOTTY;
> >  		}
> >  
> > -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> > -		if (!PCI_POSSIBLE_ERROR(id))
> > -			break;
> > +		if (root && root->config_crs_sv) {
> > +			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > +			if (!pci_bus_crs_vendor_id(id))
> > +				break;
> 
> There was an effort from Amazon back in 2020/2021 to improve CRS support:
> 
> https://lore.kernel.org/linux-pci/20200307172044.29645-1-stanspas@xxxxxxxxxx/

Thanks for reminding me of that, and I'm sorry that series didn't get
applied back then because it's very similar to this one.

> One suggestion you raised in the subsequent discussion was to use a
> 16-bit (word) instead of a 32-bit (dword) read of the Vendor ID
> register to avoid issues with devices that don't implement CRS SV
> correctly:
> 
> https://lore.kernel.org/linux-pci/20210913160745.GA1329939@bjorn-Precision-5520/

Thanks.  Reading that response, I don't understand my point about using
a 16-bit read.  I mentioned 89665a6a7140 ("PCI: Check only the Vendor
ID to identify Configuration Request Retry"), the commit log of which
points to http://lkml.kernel.org/r/4729FC36.3040000@xxxxxxxxx, which
documents several defective devices that have a Vendor ID of 0x0001.

E.g., the Realtek rtl8169 controller mentioned there has Vendor/Device
ID of [0001:8168].  Doing either a 16-bit read or a 32-bit read and
checking the low 16 bits would incorrectly treat that as a Config RRS
completion.

So it *looks* to me like we will time out after 60 seconds and
conclude the device never became ready:

  pci_scan_device(..., timeout=60*1000)
    pci_bus_read_dev_vendor_id
      pci_bus_generic_read_dev_vendor_id
        pci_bus_read_config_dword(PCI_VENDOR_ID, l)  # <--
        # *l == 0x81680001
        if (pci_bus_crs_vendor_id(*l))        # 0x81680001 & 0xffff = 0x0001
          pci_bus_wait_crs(..., timeout=60*1000)
            while (pci_bus_crs_vendor_id(*l)) {
              if (delay > timeout)
                return false;                 # device not ready
              pci_bus_read_config_dword(PCI_VENDOR_ID, l)
            }

That *might* be an argument for doing a 32-bit read and checking for
0xffff0001, since the spec requires all 1's in the extra bytes.  But
I'm not aware of any complaints about these broken devices with a
0x0001 Vendor ID, and the 89665a6a7140 commit log says there are also
defective devices that don't fill the extra bytes with all 1's.

So my inclination is to keep the current code that does a 32-bit read
and checks only the low 16 bits.

> I realize that pci_bus_crs_vendor_id() masks the Device ID bits,
> so probably no biggie.  Just want to make sure all lessons learned
> during previous discussions on this topic are considered. :)





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux