RE: [PATCH v1] pci: Limit VPD length of Emulex adapters to the actual length supported.

Venkat Duvvuru <VenkatKumar.Duvvuru@xxxxxxxxxx> · Mon, 3 Nov 2014 12:18:13 +0000

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
> Sent: Thursday, October 30, 2014 9:03 PM
> To: Venkat Duvvuru
> Cc: linux-pci@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v1] pci: Limit VPD length of Emulex adapters to the actual
> length supported.
> 
> On Thu, Oct 30, 2014 at 7:38 AM, Venkat Duvvuru
> <VenkatKumar.Duvvuru@xxxxxxxxxx> wrote:
> > Hi Bjorn,
> > Please find my comments inline.
> >
> >> -----Original Message-----
> >> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
> >> Sent: Thursday, October 23, 2014 9:11 PM
> >> To: Venkat Duvvuru
> >> Cc: linux-pci@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH v1] pci: Limit VPD length of Emulex adapters to the
> actual
> >> length supported.
> >>
> >> On Thu, Oct 16, 2014 at 02:16:42PM +0530, Venkat Duvvuru wrote:
> >> > By default pci utilities/subsystem tries to read 32k bytes of vpd data no
> >> matter
> >> > what the device supports. This can lead to unexpected behavior depending
> >> > on how each of the devices handle this condition. This patch fixes the
> >> > problem for Emulex adapter family.
> >> >
> >> > v1:
> >> > Addressed Bjorn's comments
> >> > 1. Removed Vendor id and Device id macros from pci_ids.h and
> >> >    using the Vendor and Device id values directly in
> >> DECLARE_PCI_FIXUP_FINAL() lines.
> >> >
> >> > Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@xxxxxxxxxx>
> >>
> >> Hi Venkat,
> >>
> >> I'll merge this (in some form), but I'd like the changelog to include more
> >> details about what unexpected behavior occurs when reading too much
> data.
> >> This is to help people who trip over this problem find this patch as the
> >> solution.
> > [Venkat] "Timeout" happens on excessive VPD reads and  Kernel keeps
> logging the following message
> > "vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card
> vendor for a firmware update"
> >>
> >> In my opinion, this is a hardware defect, and I'd like to know what your
> >> hardware folks think, because I don't want to have to merge similar quirks
> >> for future devices.  Here's my reasoning:
> >>
> >> If a device doesn't implement the entire 32K of possible VPD space, I would
> >> expect the device to just return zeros or 0xff, or maybe alias the space by
> >> dropping the high-order unused address bits.
> > [Venkat] We do return 0xffs beyond the supported size but excessive VPD
> reads are causing timeouts when the adapter is handling some high priority
> work.
> 
> That makes it sounds like this is really an issue with how the adapter
> firmware manages the workload, not something strictly related to the
> size of implemented VPD space. In other words, it sounds like it's
> possible for the timeout to occur even when reading the space that
> *is* implemented.
In this case, when the host reads the full 32K space, the adapter receives around 8K interrupts and is sometimes overwhelmed by the interrupt storm. This can cause the adapter to stop functioning properly.
Limiting the VPD read to 1K generates only 256 interrupts on the adapter, and we have not seen the problem occur with that limit.
This was the main motivation behind my patch.
I agree that the timeout could still occur when reading only the 1K implemented space, but I think it is highly improbable.
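
For reference, the interrupt counts above are consistent with one adapter interrupt per 4-byte VPD data-register access. That per-dword relationship is an assumption inferred from the numbers, and the helper name below is hypothetical. A minimal sketch of the arithmetic:

```c
#include <stddef.h>

/* Assumption: each 4-byte VPD data-register read raises one interrupt
 * on the adapter. The VPD data register is 32 bits wide. */
#define VPD_BYTES_PER_ACCESS 4

/* Hypothetical helper: adapter interrupts needed to read 'bytes' of VPD. */
size_t vpd_read_interrupts(size_t bytes)
{
    /* Round up so a partial trailing dword still counts as one access. */
    return (bytes + VPD_BYTES_PER_ACCESS - 1) / VPD_BYTES_PER_ACCESS;
}
```

Under that assumption, a 32K read costs 8192 interrupts and a 1K read costs 256, which matches what we observe.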

As an alternative, would you be open to a fix in the PCI core that stops reading once the End tag is detected? (The pci utility, ls-vpd.c, uses this logic when reading VPD data.)
I now feel that this, rather than my pci-quirks patch, is the *right* solution.
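
To illustrate the End-tag idea, here is a rough userspace sketch, not a proposed kernel patch. It walks the resource tags in an already-read buffer and returns the number of bytes up to and including the End tag. The function name and the buffer-based interface are hypothetical. A real PCI core change would read the VPD incrementally and stop at the End tag instead of scanning a full buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_FLAG 0x80 /* bit 7 set: large resource tag */
#define VPD_SMALL_END  0x0f /* small resource item name of the End tag */

/*
 * Hypothetical helper: return the number of VPD bytes up to and including
 * the End tag, or 0 if no well-formed End tag is found within 'max' bytes.
 */
size_t vpd_used_size(const uint8_t *buf, size_t max)
{
    size_t off = 0;

    while (off < max) {
        uint8_t tag = buf[off];
        size_t len;

        if (tag & VPD_LARGE_FLAG) {
            /* Large tag: tag byte, then a 16-bit little-endian length. */
            if (max - off < 3)
                return 0;
            len = (size_t)buf[off + 1] | ((size_t)buf[off + 2] << 8);
            off += 3;
        } else {
            /* Small tag: bits 6:3 are the item name, bits 2:0 the length. */
            len = tag & 0x07;
            off += 1;
        }

        /* Reject a tag whose data would run past the buffer. */
        if (len > max - off)
            return 0;
        off += len;

        if (!(tag & VPD_LARGE_FLAG) && ((tag >> 3) & 0x0f) == VPD_SMALL_END)
            return off;
    }

    return 0;
}
```

With something like this, a device that returns 0xff beyond its implemented VPD would never be read past its End tag, and a buffer of all 0xff yields 0 because the first "tag" claims a length larger than the buffer.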

> You say the kernel "keeps logging" the message.  From the code, it
> looks like it should only log it once per attempt to read or write the
> VPD.  Is that what you observe, or is there a problem where we don't
> abort the read/write after the first timeout, and we emit many
> messages?
Yes, the kernel logs the message only once per attempt, but there are configurations with many VFs per PF, and we see this message for every VF and PF.

Thanks,
Venkat.