RE: [EXT] Re: [PATCH v2 net-next 1/2] bnx2x: Utilize firmware 7.13.21.0

Hello Paul,

> -----Original Message-----
> From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> Sent: Monday, March 14, 2022 8:37 PM
> To: Manish Chopra <manishc@xxxxxxxxxxx>
> Cc: Donald Buczek <buczek@xxxxxxxxxxxxx>; Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>;
> netdev@xxxxxxxxxxxxxxx; Ariel Elior <aelior@xxxxxxxxxxx>; Alok Prasad
> <palok@xxxxxxxxxxx>; Prabhakar Kushwaha <pkushwaha@xxxxxxxxxxx>;
> David S. Miller <davem@xxxxxxxxxxxxx>; Greg KH
> <gregkh@xxxxxxxxxxxxxxxxxxx>; stable@xxxxxxxxxxxxxxx;
> it+netdev@xxxxxxxxxxxxx; regressions@xxxxxxxxxxxxxxx
> Subject: Re: [EXT] Re: [PATCH v2 net-next 1/2] bnx2x: Utilize firmware
> 7.13.21.0
> 
> [Use Jakub’s current address]
> 
> Dear Manish,
> 
> 
> Am 14.03.22 um 15:36 schrieb Donald Buczek:
> 
> > On 3/11/22 1:11 PM, Manish Chopra wrote:
> >>> -----Original Message-----
> >>> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> >>> Sent: Thursday, March 10, 2022 3:48 AM
> 
> […]
> 
> >>> On Wed, Mar 9, 2022 at 11:46 AM Manish Chopra wrote:
> >>>>
> >>>> This has not changed anything functionally from driver/device
> >>>> perspective,
> >>> FW is still being loaded only when device is opened.
> >>>> bnx2x_init_firmware() [I guess, perhaps the name is misleading]
> >>>> just
> >>> request_firmware() to prepare the metadata to be used when device
> >>> will be opened.
> >>>
> >>> So how do you explain the report by Paul Menzel that things used to
> >>> work and no longer work now?
> >>>
> >>
> >> The issue which Paul mentioned had to do with "/lib/firmware/bnx2x/*
> >> file not found" when driver probes, which was introduced by the patch
> >> in subject, And the commit e13ad1443684 ("bnx2x: fix driver load from
> >> initrd") fixes this issue. So things should work as it is with the
> >> mentioned fixed commit.
> >> The only discussion led by this problem now is why the
> >> request_firmware() was moved early on [from open() to probe()] by the
> >> patch in subject.
> >> I explained the intention to do this in my earlier emails and let me
> >> add more details below -
> >>
> >> Note that we have just moved request_firmware() logic, *not*
> >> something significant which has to do with actual FW loading or
> >> device initialization from the FW file data which could cause
> >> significant functional change for this device/driver, FW load/init
> >> part still stays in open flow.
> >>
> >> Before the patch in subject, driver used to only work with
> >> fixed/specific FW version file whose version was statically known to
> >> the driver function at probe() time to take some decision to fail the
> >> function probe early in the system if the function is supposed to run
> >> with a FW version which is not the same version loaded on the device
> >> by another PF (different ENV).
> >> Now when we sent this new FW patch (in subject) then we got feedback
> >> from community to maintain backward compatibility with older FW
> >> versions as well and we did it in same v2 patch legitimately, just
> >> that now we can work with both older or newer FW file so we need this
> >> run time FW version information to cache (based on
> >> request_firmware() return success value for an old FW file or new FW
> >> file)
> >> which will be used in follow up probe() flows to decide the function
> >> probe failure early If there could be FW version mismatches against
> >> the loaded FW on the device by other PFs already
> >
> > There might be something more wrong with the patch in the subject: The
> > usability of the ports from a single card (with older firmware?) now
> > depends on the order the ports are enabled (first port enabled is
> > working, second port enabled is not working, driver complaining about
> > a firmware mismatch).
> >
> > In the following examples, the driver was not built-in to the kernel
> > but loaded from the root filesystem instead, so there is no initramfs
> > related problem here.
> >
> > For the records:
> >
> > root@ira:~# dmesg|grep bnx2x
> > [   18.749871] bnx2x 0000:45:00.0: msix capability found
> > [   18.766534] bnx2x 0000:45:00.0: part number 394D4342-31373735-31314131-473331
> > [   18.799198] bnx2x 0000:45:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
> > [   18.807638] bnx2x 0000:45:00.1: msix capability found
> > [   18.824509] bnx2x 0000:45:00.1: part number 394D4342-31373735-31314131-473331
> > [   18.857171] bnx2x 0000:45:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
> > [   18.865619] bnx2x 0000:46:00.0: msix capability found
> > [   18.882636] bnx2x 0000:46:00.0: part number 394D4342-31373735-31314131-473331
> > [   18.915196] bnx2x 0000:46:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
> > [   18.923636] bnx2x 0000:46:00.1: msix capability found
> > [   18.940505] bnx2x 0000:46:00.1: part number 394D4342-31373735-31314131-473331
> > [   18.973167] bnx2x 0000:46:00.1: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
> > [   46.480660] bnx2x 0000:45:00.0 net04: renamed from eth4
> > [   46.494677] bnx2x 0000:45:00.1 net05: renamed from eth5
> > [   46.508544] bnx2x 0000:46:00.0 net06: renamed from eth6
> > [   46.524641] bnx2x 0000:46:00.1 net07: renamed from eth7
> > root@ira:~# ls /lib/firmware/bnx2x/
> > bnx2x-e1-6.0.34.0.fw   bnx2x-e1-7.13.1.0.fw   bnx2x-e1-7.8.2.0.fw
> > bnx2x-e1h-7.12.30.0.fw  bnx2x-e1h-7.8.19.0.fw  bnx2x-e2-7.10.51.0.fw  bnx2x-e2-7.8.17.0.fw
> > bnx2x-e1-6.2.5.0.fw    bnx2x-e1-7.13.11.0.fw  bnx2x-e1h-6.0.34.0.fw
> > bnx2x-e1h-7.13.1.0.fw   bnx2x-e1h-7.8.2.0.fw   bnx2x-e2-7.12.30.0.fw  bnx2x-e2-7.8.19.0.fw
> > bnx2x-e1-6.2.9.0.fw    bnx2x-e1-7.13.15.0.fw  bnx2x-e1h-6.2.5.0.fw
> > bnx2x-e1h-7.13.11.0.fw  bnx2x-e2-6.0.34.0.fw   bnx2x-e2-7.13.1.0.fw   bnx2x-e2-7.8.2.0.fw
> > bnx2x-e1-7.0.20.0.fw   bnx2x-e1-7.13.21.0.fw  bnx2x-e1h-6.2.9.0.fw
> > bnx2x-e1h-7.13.15.0.fw  bnx2x-e2-6.2.5.0.fw    bnx2x-e2-7.13.11.0.fw
> > bnx2x-e1-7.0.23.0.fw   bnx2x-e1-7.2.16.0.fw   bnx2x-e1h-7.0.20.0.fw
> > bnx2x-e1h-7.13.21.0.fw  bnx2x-e2-6.2.9.0.fw    bnx2x-e2-7.13.15.0.fw
> > bnx2x-e1-7.0.29.0.fw   bnx2x-e1-7.2.51.0.fw   bnx2x-e1h-7.0.23.0.fw
> > bnx2x-e1h-7.2.16.0.fw   bnx2x-e2-7.0.20.0.fw   bnx2x-e2-7.13.21.0.fw
> > bnx2x-e1-7.10.51.0.fw  bnx2x-e1-7.8.17.0.fw   bnx2x-e1h-7.0.29.0.fw
> > bnx2x-e1h-7.2.51.0.fw   bnx2x-e2-7.0.23.0.fw   bnx2x-e2-7.2.16.0.fw
> > bnx2x-e1-7.12.30.0.fw  bnx2x-e1-7.8.19.0.fw   bnx2x-e1h-7.10.51.0.fw
> > bnx2x-e1h-7.8.17.0.fw   bnx2x-e2-7.0.29.0.fw   bnx2x-e2-7.2.51.0.fw
> >
> > Now with v5.10.95, the first kernel of the series which includes
> > fdcfabd0952d ("bnx2x: Utilize firmware 7.13.21.0") and later:
> >
> > root@ira:~# dmesg -w &
> > [...]
> > root@ira:~# ip link set net04 up
> > [   88.504536] bnx2x 0000:45:00.0 net04: using MSI-X  IRQs: sp 47  fp[0] 49 ... fp[7] 56
> > root@ira:~# ip link set net05 up
> > [   90.825820] bnx2x: [bnx2x_compare_fw_ver:2380(net05)]bnx2x with FW 120d07 was already loaded which mismatches my 150d07 FW. Aborting
> > RTNETLINK answers: Device or resource busy
> > root@ira:~# ip link set net04 down
> > root@ira:~# ip link set net05 down
> > root@ira:~# ip link set net05 up
> > [  114.462448] bnx2x 0000:45:00.1 net05: using MSI-X  IRQs: sp 58  fp[0] 60 ... fp[7] 67
> > root@ira:~# ip link set net04 up
> > [  117.247763] bnx2x: [bnx2x_compare_fw_ver:2380(net04)]bnx2x with FW 120d07 was already loaded which mismatches my 150d07 FW. Aborting
> > RTNETLINK answers: Device or resource busy
> >
> > With v5.10.94, both ports work fine:
> >
> > root@ira:~# dmesg -w &
> > [...]
> > root@ira:~# ip link set net04 up
> > [  133.126647] bnx2x 0000:45:00.0 net04: using MSI-X  IRQs: sp 47  fp[0] 49 ... fp[7] 56
> > root@ira:~# ip link set net05 up
> > [  136.215169] bnx2x 0000:45:00.1 net05: using MSI-X  IRQs: sp 58  fp[0] 60 ... fp[7] 67
> 
> One additional note, that it’s totally unclear to us, where FW version
> 120d07 in the error message comes from. It maps to 7.13.18.0, which is
> nowhere to be found and too new to be on the cards EEPROM, which should
> be from 2013 or so.
> 

I could reproduce the earlier issue (the FW file load failure you reported) on my 5.14.x-based kernel with the driver built into the kernel (CONFIG_BNX2X=y).
It was caused by commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0"):

# dmesg -T | grep "Direct firmware"
[Wed Mar 16 14:11:25 2022] bnx2x 0000:13:00.0: Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2
[Wed Mar 16 14:11:25 2022] bnx2x 0000:13:00.0: Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
[Wed Mar 16 14:11:25 2022] bnx2x 0000:13:00.1: Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2
[Wed Mar 16 14:11:25 2022] bnx2x 0000:13:00.1: Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
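
Error -2 in these messages is -ENOENT, i.e. the requested file was not found at that point. As a rough user-space aid (not driver code), the sketch below checks which of the firmware files named in the log is present, trying the newer version first as the log does. The helper name find_firmware and its parameters are made up for illustration.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_firmware(fw_dir: Path, chip: str, versions: Sequence[str]) -> Optional[Path]:
    """Return the first existing bnx2x-<chip>-<version>.fw file, or None.

    Versions are tried in the order given, so list the preferred
    (newer) version first and the fallback (older) version after it.
    """
    for version in versions:
        candidate = fw_dir / f"bnx2x-{chip}-{version}.fw"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Same two files and the same order as in the log above.
    found = find_firmware(Path("/lib/firmware/bnx2x"), "e2",
                          ["7.13.21.0", "7.13.15.0"])
    print(found if found else "no matching firmware file found")
```

Note that a file being present on the root filesystem does not by itself mean a built-in driver can see it at probe time, which is the situation this failure is about.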

After I re-installed the kernel with the fix applied (which I sent to you and the list as an RFC today) and performed a cold boot/power cycle
of the server, I ran the following:

# dmesg -T | grep "Direct firmware"

# ethtool -i ens3f0
driver: bnx2x
version: 5.14.0+
firmware-version: mbi 7.19.2 bc 7.16.5 phy 1.34
expansion-rom-version:
bus-info: 0000:13:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

(I have configured ethtool with "--disable-netlink" to allow setting these message levels; otherwise you will run into the other issue you reported.)
# ./ethtool -s ens3f0 msglvl 0x0100000
# ./ethtool -s ens3f1 msglvl 0x0100000

# ifconfig ens3f0 up
# ifconfig ens3f1 up
# ifconfig ens3f0 down
# ifconfig ens3f1 down

# dmesg -T | grep fw
# ifconfig ens3f0 up
# ifconfig ens3f1 up

# dmesg -T | grep fw
[Wed Mar 16 15:19:13 2022] bnx2x: [bnx2x_alloc_fw_stats_mem:2270(ens3f0)]stats fw_stats_num 10, vf headroom 0, num_groups 1
[Wed Mar 16 15:19:13 2022] bnx2x: [bnx2x_alloc_fw_stats_mem:2302(ens3f0)]statistics request base address set to 1 a078e000
[Wed Mar 16 15:19:13 2022] bnx2x: [bnx2x_alloc_fw_stats_mem:2305(ens3f0)]statistics data base address set to 1 a078e110
[Wed Mar 16 15:19:16 2022] bnx2x: [bnx2x_alloc_fw_stats_mem:2270(ens3f1)]stats fw_stats_num 12, vf headroom 0, num_groups 1
[Wed Mar 16 15:19:16 2022] bnx2x: [bnx2x_alloc_fw_stats_mem:2302(ens3f1)]statistics request base address set to 1 d1b6000
[Wed Mar 16 15:19:16 2022] bnx2x: [bnx2x_alloc_fw_stats_mem:2305(ens3f1)]statistics data base address set to 1 d1b6110

I suggest you re-install the kernel with that fix applied and test in your environment after a cold boot/power cycle of the server, so that the devices start in a clean state.
---------------------------------------------------

Regarding the odd/mismatched FW version you reported recently, we believe it has nothing to do with the patch in subject.
Perhaps it is some residue from earlier OOB modules or PMDs? Note that they don't load the firmware from /lib/firmware/bnx2x;
they have the whole firmware built into the module, and the version you mentioned looks like something from an OOB component.
That is why you cannot find it in /lib/firmware/bnx2x/.
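
For reference, a minimal sketch of how the hex values in that error message can be read as dotted versions. It assumes the value packs major, minor, revision and engineering numbers one per byte, lowest byte first, which is consistent with 120d07 mapping to 7.13.18.0 as noted in the quoted mail. The helper name decode_fw_version is made up for illustration.

```python
def decode_fw_version(packed: int) -> str:
    """Unpack a 32-bit value into 'major.minor.revision.engineering'.

    Assumed layout, lowest byte first: major, minor, revision, engineering.
    """
    major = packed & 0xFF
    minor = (packed >> 8) & 0xFF
    revision = (packed >> 16) & 0xFF
    engineering = (packed >> 24) & 0xFF
    return f"{major}.{minor}.{revision}.{engineering}"


if __name__ == "__main__":
    # The two values from the "mismatches my ... FW" message.
    for raw in ("120d07", "150d07"):
        print(raw, "->", decode_fw_version(int(raw, 16)))
```

Under that assumption, 120d07 reads as 7.13.18.0 (the already loaded FW) and 150d07 as 7.13.21.0 (the version the driver expects).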

Please check whether any OOB modules/PMDs are installed or running on the same adapter.
Re-installing the kernel with the fix and doing a cold boot/power cycle of the server may make this issue go away too.

If it still reports the odd loaded firmware even after all this, the message should be merely informational, because the driver fix now also relaxes
the strict FW version comparison (against an already loaded FW) so that these close OOB firmware versions are treated as backward
compatible instead of failing the device load abruptly.

BTW,

1. Can you please provide the complete system logs (/var/log/messages or dmesg -T) and other relevant info (lspci, ip link show, etc.) for any issues?
2. Does the system have only these two NIC controllers, with 4 PCI functions in total (two each)? Are no other PCI functions on these controllers used in a different environment?

Thanks !









