Re: 4.6-rc2 regression with commit 104daa71b396: check VPD access offset against length

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/12/2016 07:37 AM, Bjorn Helgaas wrote:
> On Mon, Apr 11, 2016 at 01:16:17PM -0500, Steve Wise wrote:
>>> kernel.org commit 104daa71b396 added a check to make sure that efforts to
>>> read/write the VPD wouldn't extend past the computed length of the VPD.
>>> Later, kernel.org commit 408641e93aa5 folded the pci_vpd_pci22 into
>>> struct pci_vpd so things moved around a bit after that and an error return
>>> got changed into a silent failure instead of -EINVAL.
>>>
>>>   The problem is that the previous pci_vpd_pci22_read() didn't check for a
>> read with
>>> a VPD Offset > VPD Length and the new pci_vpd_read() is checking that.  Worse
>>> yet, when a VPD Offset is greater than the recorded VPD Length, it simply
>>> returns 0 rather than -EINVAL.
>>>
>>>   The problem is stemming from the fact that the Chelsio adapters actually
>> have
>>> two VPD structures stored in the VPD.  An abbreviated on at Offset 0x0 and the
>>> complete VPD at Offset 0x400.  The abbreviated one only contains the PN, SN
>> and
>>> EC Keywords, while the complete VPD contains those plus various adapter
>>> constants contained in V0, V1, etc.  And it also contains the Base Ethernet
>> MAC
>>> Address in the "NA" Keyword which the cxgb4 driver needs when it can't contact
>>> the adapter firmware.  (We don't have the "NA" Keywork in the VPD Structure at
>>> Offset 0x0 because that's not an allowed VPD Keyword in the PCI-E 3.0
>>> specification.)
>>>
>>>   With the new code, the computed size of the VPD is 0x200 and so our efforts
>>> to read the VPD at Offset 0x400 silently fails.  We check the result of the
>>> read looking for a signature 0x82 byte but we're checking against random stack
>>> garbage.
>>>
>>>   The end result is that the cxgb4 driver now fails the PCI-E Probe.
>>>
>>
>> Silently failing is wrong, in my opinion.  And I even question truncating which
>> is also done in pci_vpd_read().  To the PCI maintainers:  Should the length
>> checks just be removed?    If not, what is the correct solution?  Adding a
>> different "expert" API that ignores the length checks, or somehow allowing the
>> device driver to set the actual VPD size? 
> 
> I think everybody would prefer if it the kernel could just read
> whatever VPD region the user requested, without parsing the data or
> checking for length (as long as we're within the 32K space allowed by
> the spec).
> 
> The problem is that some cards crash if you read too much:
> 
>   commit 104daa71b396
>   Author: Hannes Reinecke <hare@xxxxxxx>
>   Date:   Mon Feb 15 09:42:01 2016 +0100
> 
>     PCI: Determine actual VPD size on first access
>     
>     PCI-2.2 VPD entries have a maximum size of 32k, but might actually be
>     smaller than that.  To figure out the actual size one has to read the VPD
>     area until the 'end marker' is reached.
>     
>     Per spec, reading outside of the VPD space is "not allowed."  In practice,
>     it may cause simple read errors or even crash the card.  To make matters
>     worse not every PCI card implements this properly, leaving us with no 'end'
>     marker or even completely invalid data.
>     
>     Try to determine the size of the VPD data when it's first accessed.  If no
>     valid data can be read an I/O error will be returned when reading or
>     writing the sysfs attribute.
> 
> So if you want to get rid of the length checks, you have to propose
> some other mechanism to avoid these issues.
> 
> The only ideas I have are to (1) parse the data as we do in
> 104daa71b396, (2) add quirks to prevent VPD access (as in
> 7c20078a8197 ("PCI: Prevent VPD access for buggy devices"), and/or (3)
> add quirks to allow access to more VPD than parsing says we can
> access.  These aren't mutually exclusive -- we already have (1) and
> (2), and I think we could easily add (3) into the mix.
> 
> (3) seems like a possible solution for Chelsio.  In that case, it's
> the driver that needs the data, so the driver could maintain a quirk.
> 
That's my suggestion, too.
The generic code should be handling things according to the standard.
If other drivers require a different handling we should be adding a
quirk for them.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@xxxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux