[[ N.B. I'm trying to get subscribed to linux-pci@xxxxxxxxxxxxxxx but it's not going well. I started a conversation with Bjorn in order to give him a Heads Up and get the ball rolling on a regression in the Linux 4.6 kernel. Bjorn informed me that I can actually send email to the linix-pci email list without being subscribed, so here we go ... -- Casey]] ---------- From: Casey Leedom Sent: Friday, April 8, 2016 11:40 AM To: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Cc: Hariprasad S <hariprasad@xxxxxxxxxxx>; Subject: kernel.org merge of struct pci_vpd_pci22 into struct pci_vpd breaking cxgb4 Hi Bjorn, I'm trying to get myself subscribed to the linux-pci list in order to do this in the accepted fashion, but that seems to be stalling out for some reason, so I wanted to give you a Heads Up prior to me being able to submit this to that list. Your kernel.org commit 408641e93aa5283e586fefd4dc72e67c92aae075 to fold struct pci_vpd_pci22 into struct pci_vpd is breaking the cxgb4 driver. The problem is that the previous pci_vpd_pci22_read() didn't check for a read with a VPD Offset > VPD Length and the new pci_vpd_read() is checking that. Worse yet, when a VPD Offset is greater than the recorded VPD Length, it simply returns 0 rather than -EINVAL. The problem is stemming from the fact that the Chelsio adapters actually have two VPD structures stored in the VPD. An abbreviated on at Offset 0x0 and the complete VPD at Offset 0x400. The abbreviated one only contains the PN, SN and EC Keywords, while the complete VPD contains those plus various adapter constants contained in V0, V1, etc. And it also contains the Base Ethernet MAC Address in the "NA" Keyword which the cxgb4 driver needs when it can't contact the adapter firmware. (We don't have the "NA" Keywork in the VPD Structure at Offset 0x0 because that's not an allowed VPD Keyword in the PCI-E 3.0 specification.) With the new code, the computed size of the VPD is 0x200 and so our efforts to read the VPD at Offset 0x400 silently fails. We check the result of the read looking for a signature 0x82 byte but we're checking against random stack garbage. The end result is that the cxgb4 driver now fails the PCI-E Probe. Casey ---------- From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Sent: Friday, April 8, 2016 12:33 PDT To: Casey Leedom <leedom@xxxxxxxxxxx> Cc: Hariprasad S <hariprasad@xxxxxxxxxxx>; Hannes Reinecke <hare@xxxxxxxx>; [+cc Hannes] Hi Casey, First of all, I'm really sorry about this problem. I know it's a major hassle to track down breakages like this. My intent was that 408641e93aa5 ("PCI: Fold struct pci_vpd_pci22 into struct pci_vpd") would be pure reorganization, with no functional change. Did you actually bisect it to that and confirm that f1cd93f9aabe ("PCI: Rename VPD symbols to remove unnecessary "pci22"") works and 408641e93aa5 fails? >From your description, I would have thought the breakage would be caused by 104daa71b396 ("PCI: Determine actual VPD size on first access"), not by 408641e93aa5. The reason for 104daa71b396 is that some cards don't work well when reading outside the VPD area. I'm not sure how to fix that problem while keeping the Chelsio adapters working. Maybe we'll need some sort of quirk. Bjorn ---------- From: Casey Leedom <leedom@xxxxxxxxxxx> Sent: Friday, April 8, 2016 13:08 PDT To: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Cc: Hariprasad S <hariprasad@xxxxxxxxxxx>; Hannes Reinecke <hare@xxxxxxxx>; I'll post my note again and mention your hypothesis that it's really the check for the VPD size in commit 408641e93aa5. I hadn't seen that the old pci_vpd_pci22_read() checked the length ... Casey-- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html