On 08.02.2021 20:47, Rahul Lakkireddy wrote: > On Friday, February 02/05/21, 2021 at 23:31:24 +0100, Heiner Kallweit wrote: >> On 05.02.2021 22:46, Bjorn Helgaas wrote: >>> [+cc Casey, Rahul] >>> >>> On Fri, Feb 05, 2021 at 08:29:45PM +0100, Heiner Kallweit wrote: >>>> cxgb4 uses the full VPD address space for accessing its EEPROM (with some >>>> mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the >>>> VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE). >>>> Having official (structured) and inofficial (unstructured) VPD data >>>> violates the PCI spec, let's set VPD len according to all data that can be >>>> accessed via PCI VPD access, no matter of its structure. >>> >>> s/inofficial/unofficial/ >>> >>>> Signed-off-by: Heiner Kallweit <hkallweit1@xxxxxxxxx> >>>> --- >>>> drivers/pci/vpd.c | 7 +++---- >>>> 1 file changed, 3 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c >>>> index 7915d10f9..06a7954d0 100644 >>>> --- a/drivers/pci/vpd.c >>>> +++ b/drivers/pci/vpd.c >>>> @@ -633,9 +633,8 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev) >>>> /* >>>> * If this is a T3-based adapter, there's a 1KB VPD area at offset >>>> * 0xc00 which contains the preferred VPD values. If this is a T4 or >>>> - * later based adapter, the special VPD is at offset 0x400 for the >>>> - * Physical Functions (the SR-IOV Virtual Functions have no VPD >>>> - * Capabilities). The PCI VPD Access core routines will normally >>>> + * later based adapter, provide access to the full virtual EEPROM >>>> + * address space. The PCI VPD Access core routines will normally >>>> * compute the size of the VPD by parsing the VPD Data Structure at >>>> * offset 0x000. This will result in silent failures when attempting >>>> * to accesses these other VPD areas which are beyond those computed >>>> @@ -644,7 +643,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev) >>>> if (chip == 0x0 && prod >= 0x20) >>>> pci_set_vpd_size(dev, 8192); >>>> else if (chip >= 0x4 && func < 0x8) >>>> - pci_set_vpd_size(dev, 2048); >>>> + pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE); >>> >>> This code was added by 7dcf688d4c78 ("PCI/cxgb4: Extend T3 PCI quirk >>> to T4+ devices") [1]. Unfortunately that commit doesn't really have >>> the details about what it fixes, other than the silent failures it >>> mentions in the comment. >>> >>> Some devices hang if we try to read at the wrong VPD address, and this >>> can be done via the sysfs "vpd" file. Can you expand the commit log >>> with an argument for why it is always safe to set the size to >>> PCI_VPD_MAX_SIZE for these devices? >>> >> >> Seeing t4_eeprom_ptov() there is data at the end of the VPD address >> space, but there may be gaps in between. I don't have test hw, >> therefore it would be good if Chelsio could confirm that accessing >> any address in the VPD address space (32K) is ok. If a VPD address >> isn't backed by EEPROM, it should return 0x00 or 0xff, and not hang >> the device. >> > > We've tested the patches on T5 adapter. Although there are no crashes > seen, the 32K VPD read from sysfs at certain chunks are getting wrapped > around and overwritten. We're still analyzing this. > >>> The fact that cudbg_collect_vpd_data() fiddles around with >>> pci_set_vpd_size() suggests to me that there is *some* problem with >>> reading parts of the VPD. Otherwise, why would they bother? >>> >>> 940c9c458866 ("cxgb4: collect vpd info directly from hardware") [2] >>> added the pci_set_vpd_size() usage, but doesn't say why it's needed. >>> Maybe Rahul will remember? >>> > > If firmware has crashed, then it's not possible to collect the VPD info > from firmware. So, the VPD info is fetched from EEPROM instead, which > is unfortunately outside the VPD size of the PF. > >> >> In addition we have cb92148b58a4 ("PCI: Add pci_set_vpd_size() to set >> VPD size"). To me it seems the VPD size quirks and this commit >> try to achieve the same: allow to override the autodetected VPD len >> >> The quirk mechanism is well established, and if possible I'd like >> to get rid of pci_set_vpd_size(). I don't like the idea that the >> PCI core exposes API calls for accessing a proprietary VPD data >> format of one specific vendor (cxgb4 is the only user of >> pci_set_vpd_size()). >> > > There seems to be a way to get the Serial Configuration Version from > some internal registers. I will send the patch soon. It should remove > the call to pci_set_vpd_size() from cudbg_lib.c. > Great, this would be the best solution. Then we can get rid of pci_set_vpd_size() w/o impact on what is exposed via sysfs. So for now I'll drop patches 2 and 3 and will resend patch 1 as a single patch. > Thanks, > Rahul > Heiner >>> Bjorn >>> >>> [1] https://git.kernel.org/linus/7dcf688d4c78 >>> [2] https://git.kernel.org/linus/940c9c458866 >>> >>>> } >>>> >>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, >>>> -- >>>> 2.30.0 >>>> >>>> >>>> >>