>On 02/19/2016 03:07 PM, Jordan Hargrave wrote:
>> On Fri, Feb 19, 2016 at 4:00 AM, Hannes Reinecke <hare@xxxxxxx> wrote:
>>>
>>> On 02/18/2016 09:04 PM, Jordan Hargrave wrote:
>>>> The VPD-R is a read-only area of the PCI Vital Product Data region.
>>>> There are some standard keywords for serial number, manufacturer,
>>>> and vendor-specific values. Dell Servers use a vendor-specific
>>>> tag to store number of ports and port mapping of partitioned NICs.
>>>>
>>>>    info = VPD-Info string
>>>>    PN   = Part Number
>>>>    SN   = Serial Number
>>>>    MN   = Manufacturer ID
>>>>    Vx   = Vendor-specific (x=0..9 A..Z)
>>>>
>>>> This creates a sysfs subdirectory in the pci device: vpdattr with
>>>> 'info', 'EC', 'SN', 'V0', etc. files containing the tag values.
>>>>
>>>> Signed-off-by: Jordan Hargrave <Jordan_Hargrave@xxxxxxxx>
>>> Hmm. Can we first get an agreement on the PCI VPD parsing patches
>>> I've posted earlier?
>>> VPD parsing is really tricky, and we should aim at making the
>>> read_vpd function robust enough before we begin putting things into
>>> sysfs.
>>>
>>> Also, I'm not utterly keen on this patchset.
>>> The sysfs space is blown up with tiny pieces of information, which
>>> can easily be gotten via lspci, too.
>>>
>>> Also, to my knowledge it's perfectly valid to _write_ to the VPD, in
>>> which case the entire sysfs attribute setup would be invalidated.
>>> How do you propose to handle that?
>>>
>>
>> This patch only reads the attributes from VPD-I and VPD-R areas, not
>> the VPD-W (read write) area.
>> The VPD-W data is located after the VPD-I and VPD-R area, so nothing
>> in these attributes should change.
>>
>Ah. Ok.
>
>> The main reason I want this is for replacing biosdevname (ethernet
>> naming) functionality and getting the same functionality into the
>> kernel and systemd. Systemd doesn't want to do vpd parsing, and
>> reading the vpd can take a very long time on some devices, causing
>> systemd to timeout.
>> Another disadvantage of it being in userspace
>> is for devices using SR-IOV. In those devices the vpd only
>> exists for the physfn devices but not the virtual devices. A
>> userspace program will have to read the entire VPD for
>> each physical and virtual PCI device.
>>
>> Logic is something like this:
>>   fd = open("/sys/bus/pci/devices/X/physfn/vpd", O_RDONLY);
>>   if (fd < 0) {
>>     fd = open("/sys/bus/pci/devices/X/vpd", O_RDONLY);
>>     if (fd < 0)
>>       return;
>>   }
>>   parsevpd(fd);
>>
>> Specifically it is parsing one of the Vx attributes for a 'DCM' or
>> 'DC2' string that contains a mapping from
>> NIC ports and partitions to PCI device
>>
>Well, unfortunately you just gave a very good reason to _not_
>include this into the kernel:

The delay isn't a huge amount on any of the devices I've seen. The
Mellanox cards I have are the slowest. Here are some timing tests I've
done, using this patch vs a readvpd utility. I also compared it with
lspci.

@@@ Read individual Broadcom
(time ./readvpd 0000:01:00.0 > /dev/null) &>>log

real    0m0.003s
user    0m0.002s
sys     0m0.002s

@@@ Read individual Mellanox
(time ./readvpd 0000:04:00.0 > /dev/null) &>>log

real    0m0.071s
user    0m0.001s
sys     0m0.070s

@@@ Read individual Broadcom using lspci
(time lspci -vvv -s 0000:01:00.0 > /dev/null) &>>log

real    0m0.036s
user    0m0.017s
sys     0m0.019s

@@@ Read individual Mellanox using lspci
(time lspci -vvv -s 0000:04:00.0 > /dev/null) &>>log

real    0m1.213s   <--- SLOW!!!!
user    0m0.012s
sys     0m1.201s

@@@ Read each network device with 'real' VPD.
This should be equivalent to the boot time delay, at least for network
devices with VPD
(time for X in /sys/class/net/*/device ; do PF=$(readlink -f $X); SBDF=$(basename $PF) ; if [ -e $PF/vpdattr ] ; then echo ==== $X ; ./readvpd $SBDF > /dev/null; fi ; done) &>> log
==== /sys/class/net/eno1/device
==== /sys/class/net/eno2/device
==== /sys/class/net/eno3/device
==== /sys/class/net/eno4/device
==== /sys/class/net/eno5/device
==== /sys/class/net/eno6/device
==== /sys/class/net/enp4s0d1/device
==== /sys/class/net/enp4s0/device

real    0m0.319s
user    0m0.033s
sys     0m0.295s

@@@ Read each network device, including SR-IOV
(time for X in /sys/class/net/*/device ; do PF=$(readlink -f $X); if [ -e $PF/physfn ] ; then PF=$(readlink -f $PF/physfn) ; fi ; SBDF=$(basename $PF) ; if [ -e $PF/vpdattr ] ; then echo ==== $X ; ./readvpd $SBDF > /dev/null; fi ; done) &>> log
==== /sys/class/net/eno1/device
==== /sys/class/net/eno2/device
==== /sys/class/net/eno3/device
==== /sys/class/net/eno4/device
==== /sys/class/net/eno5/device
==== /sys/class/net/eno6/device
==== /sys/class/net/enp4s0d1/device
==== /sys/class/net/enp4s0/device
==== /sys/class/net/enp4s0f1d1/device (SR-IOV)
==== /sys/class/net/enp4s0f1/device (SR-IOV)
==== /sys/class/net/enp4s0f2d1/device (SR-IOV)
==== /sys/class/net/enp4s0f2/device (SR-IOV)
==== /sys/class/net/enp4s0f3d1/device (SR-IOV)
==== /sys/class/net/enp4s0f3/device (SR-IOV)
==== /sys/class/net/enp4s0f4d1/device (SR-IOV)
==== /sys/class/net/enp4s0f4/device (SR-IOV)
==== /sys/class/net/enp4s0f5d1/device (SR-IOV)
==== /sys/class/net/enp4s0f5/device (SR-IOV)
==== /sys/class/net/enp4s0f6d1/device (SR-IOV)
==== /sys/class/net/enp4s0f6/device (SR-IOV)
==== /sys/class/net/enp4s0f7d1/device (SR-IOV)
==== /sys/class/net/enp4s0f7/device (SR-IOV)
==== /sys/class/net/enp4s1d1/device (SR-IOV)
==== /sys/class/net/enp4s1/device (SR-IOV)

real    0m1.449s
user    0m0.047s
sys     0m1.412s

This is much slower as it has to re-read/parse the VPD data for each
SR-IOV device.

By contrast, here
is the timing using cached kernel entries (including virtual devices)
(time for X in /sys/class/net/*/device ; do PF=$(readlink -f $X); if [ -e $PF/physfn ] ; then PF=$(readlink -f $PF/physfn) ; fi ; if [ -e $PF/vpdattr ] ; then echo ==== $X ; cat $PF/vpdattr/* > /dev/null; fi ; done) &> log
==== /sys/class/net/eno1/device
==== /sys/class/net/eno2/device
==== /sys/class/net/eno3/device
==== /sys/class/net/eno4/device
==== /sys/class/net/eno5/device
==== /sys/class/net/eno6/device
==== /sys/class/net/enp4s0d1/device
==== /sys/class/net/enp4s0/device
==== /sys/class/net/enp4s0f1d1/device
==== /sys/class/net/enp4s0f1/device
==== /sys/class/net/enp4s0f2d1/device
==== /sys/class/net/enp4s0f2/device
==== /sys/class/net/enp4s0f3d1/device
==== /sys/class/net/enp4s0f3/device
==== /sys/class/net/enp4s0f4d1/device
==== /sys/class/net/enp4s0f4/device
==== /sys/class/net/enp4s0f5d1/device
==== /sys/class/net/enp4s0f5/device
==== /sys/class/net/enp4s0f6d1/device
==== /sys/class/net/enp4s0f6/device
==== /sys/class/net/enp4s0f7d1/device
==== /sys/class/net/enp4s0f7/device
==== /sys/class/net/enp4s1d1/device
==== /sys/class/net/enp4s1/device

real    0m0.212s
user    0m0.050s
sys     0m0.175s

>> reading the vpd can take a very long time on some devices, causing
>
>If we were to put your patch in, we would need to read the VPD
>_during each boot_, thereby slowing down the booting process noticeably.
>Plus the additional risk of locking up during boot for misbehaving
>PCI devices. Probably not something we should be doing.
>
>I would rather have it delegated to some helper function/program
>invoked from udev; with my latest patchset we always will have
>well-behaved VPD information so it's easy to just read the vpd
>attribute from sysfs.
>There still might be a lag, but surely not so long as to time out
>udev. And if we still encounter these devices I would mark them as
>broken via the blacklist and skip VPD reading for those.
>
>Cheers,
>
>Hannes
>--
>Dr. Hannes Reinecke		   Teamlead Storage & Networking
>hare@xxxxxxx			               +49 911 74053 688
>SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
>HRB 21284 (AG Nürnberg)