Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

FYI: looks like this isn't a regression, as Stephane mentioned in a
comment to the bko report:

https://bugzilla.kernel.org/show_bug.cgi?id=215660#c13
> So today I could reproduce on 5.10.46 after a few days of testing. It
> does not look like a regression.

Thus removing it from the regression tracking:

#regzbot invalid: After further testing it does not look like a regression anymore

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

On 14.03.22 12:44, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
>
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> acted upon since it was reported more than ten days ago (it afaics only
> later became clear this is a regression), that's why I decided to
> forward it to the lists and a few relevant people to the CC.
> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=215660:
>
>> Stephane Poignant 2022-03-04 17:24:49 UTC
>>
>> Created attachment 300529
>> lspci and ethtool outputs on reproducing systems
>>
>> Context:
>> - dense enterprise deployment, 10 lightweight APs (Aruba) on one office floor, up to 125 concurrent users total, up to 25 users per AP
>> - the wireless network supports 802.11n, 802.11ac and 802.11ax in the 5 GHz band
>> - authentication is wpa2-psk
>> - client devices consist of a variety of endpoints (laptops, cell phones, tablets, smart devices), running various versions of Mac OSX, Linux, Windows, Android or iOS.
>> - certain clients support only 20 MHz; HT protection kicks in and turns off on APs as those clients move around. Consequently ht_operation_mode fluctuates between 4 and 6 even when staying on the same AP.
>> - the issue affects various laptops with Intel AX200 or AX201 chipsets, running Debian or Ubuntu with a recent kernel >= 5.10
>> - see attached file devices.txt for detailed information on the different laptops we have reproduced the issue on
>>
>>
>> Steps to reproduce:
>> - appears sometimes, but not always, after the iwlwifi STA roams from one AP to another
>> - seen more often when ht_operation_mode changes between 4 and 6 (but that alone is not sufficient to trigger the issue)
>> - STA deassociates from current AP and associates to the new one successfully
>> - connectivity works on the new AP for a short period of time, usually between 30s and 1 minute
>> - then suddenly, the Rx path breaks. No more received frames are visible on the STA wireless interface. AP reports that frames are retransmitted and not acknowledged by STA.
>> - the Tx path keeps working.
>>   Frames sent by STA to AP are received and visible on the network
>> - in this state each inbound frame appears to trigger iwl_pcie_rx_handle_rb with cmd BAR_FRAME_RELEASE (seqnum is always the same):
>>
>> Mar 4 12:44:32 debian kernel: [15884.715812] iwlwifi 0000:00:14.3: iwl_pcie_rx_handle Q 0: HW = 338, SW = 337
>> Mar 4 12:44:32 debian kernel: [15884.715819] iwlwifi 0000:00:14.3: iwl_pcie_get_rxb Got virtual RB ID 1348
>> Mar 4 12:44:32 debian kernel: [15884.715831] iwlwifi 0000:00:14.3: iwl_pcie_rx_handle_rb Q 0: cmd at offset 0: BAR_FRAME_RELEASE (00.c2, seq 0xbfff)
>> Mar 4 12:44:32 debian kernel: [15884.715838] iwlwifi 0000:00:14.3: iwl_mvm_release_frames_from_notif Frame release notification for BAID 14, NSSN 169
>> Mar 4 12:44:32 debian kernel: [15884.715843] iwlwifi 0000:00:14.3: iwl_pcie_rx_handle_rb Q 0: RB end marker at offset 64
>> Mar 4 12:44:32 debian kernel: [15884.715852] iwlwifi 0000:00:14.3: iwl_pcie_restock_bd Assigned virtual RB ID 1348 to queue 0 index 334
>>
>> - those events do not appear during normal operation (or very rarely)
>>
>>
>> Temporary resolution:
>> - in most cases, the STA remains in this state until Wifi is restarted or until it roams to another AP
>> - while in that state, it may happen (rarely) that a few frames are received with very high latency, then the next ones are lost, for instance:
>>
>> [1646398334.114200] From 10.200.2.67 icmp_seq=148 Destination Host Unreachable
>> [1646398334.114242] From 10.200.2.67 icmp_seq=149 Destination Host Unreachable
>> [1646398334.114251] From 10.200.2.67 icmp_seq=150 Destination Host Unreachable
>> [1646398336.365181] 64 bytes from 10.200.2.1: icmp_seq=151 ttl=64 time=2251 ms
>> [1646398336.365237] 64 bytes from 10.200.2.1: icmp_seq=152 ttl=64 time=1227 ms
>> [1646398336.365250] 64 bytes from 10.200.2.1: icmp_seq=153 ttl=64 time=203 ms
>> [1646398375.042236] From 10.200.2.67 icmp_seq=188 Destination Host Unreachable
>> [1646398375.042291] From 10.200.2.67 icmp_seq=189 Destination Host Unreachable
>> [1646398375.042303] From 10.200.2.67 icmp_seq=190 Destination Host Unreachable
>>
>>
>> Workaround:
>> - disable_11ax=1 prevents the problem from happening
>> [...]
>
>> Stephane Poignant 2022-03-10 14:48:39 UTC
>>
>> Did some further testing with vanilla kernel.
>> 5.10.66 and older DO NOT reproduce the issue.
>> 5.10.67 and newer DO reproduce.
>>
>> I see the following changes according to changelog:
>> iwlwifi: mvm: Fix scan channel flags settings
>> iwlwifi: fw: correctly limit to monitor dump
>> iwlwifi: mvm: fix access to BSS elements
>> iwlwifi: mvm: avoid static queue number aliasing
>> iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed
>> iwlwifi: pcie: free RBs during configure
>>
>> Suspecting the one related to queues but no strong opinion atm.
>>
>> Comment 6 Stephane Poignant 2022-03-11 10:18:29 UTC
>>
>> Ok so after some further testing, it turned out that after commenting out the following lines in file drivers/net/wireless/intel/iwlwifi/pcie/trans.c:
>>
>> /* free all first - we might be reconfigured for a different size */
>> iwl_pcie_free_rbs_pool(trans);
>>
>> Which were introduced by the following commit:
>> iwlwifi: pcie: free RBs during configure
>> https://lore.kernel.org/all/iwlwifi.20210802170640.42d7c93279c4.I07f74e65aab0e3d965a81206fcb289dc92d74878@changeid/
>>
>> Then I'm no longer able to reproduce. Tested in vanilla 5.10.67, vanilla 5.10.88 and 5.10.92 with Debian patches.
>>
>
> Could somebody take a look into this? Or was this discussed somewhere
> else already? Or even fixed?
>
> Anyway, to get this tracked:
>
> #regzbot introduced: 608c8359c567b4a04dedbe
> #regzbot from: Stephane Poignant <stephane.poignant@xxxxxxxxx>
> #regzbot title: wireless: iwlwifi: regression in 5.10.67 due to
> "iwlwifi: pcie: free RBs during configure"
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215660
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
>
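
In case it helps anyone retesting: the ping excerpt quoted in the
report looks like timestamped ping output (probably "ping -D", but that
is an assumption). A small script can sort such lines into lost, slow
and normal replies, which makes Rx stalls easier to spot in a long
capture. This is only a rough sketch; the function name summarize and
the 1000 ms threshold are made up for illustration and do not come from
the report:

```python
import re

# Matches one timestamped ping line in the shape quoted in the report:
#   [1646398336.365181] 64 bytes from 10.200.2.1: icmp_seq=151 ttl=64 time=2251 ms
#   [1646398334.114200] From 10.200.2.67 icmp_seq=148 Destination Host Unreachable
# search() is used below, so leading quote markers such as ">> " are ignored.
LINE_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\]\s+.*?icmp_seq=(?P<seq>\d+)\s+"
    r"(?:ttl=\d+\s+time=(?P<rtt>[\d.]+)\s*ms"
    r"|(?P<err>Destination Host Unreachable))"
)


def summarize(lines, slow_ms=1000.0):
    """Group icmp_seq numbers into lost, slow and ok replies.

    slow_ms is an arbitrary illustrative threshold, not a value from the
    bug report.
    """
    result = {"lost": [], "slow": [], "ok": []}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a ping reply or error line; skip it
        seq = int(match.group("seq"))
        if match.group("err"):
            result["lost"].append(seq)
        elif float(match.group("rtt")) >= slow_ms:
            result["slow"].append(seq)
        else:
            result["ok"].append(seq)
    return result
```

Feeding it a saved capture, e.g. summarize(open("ping.log")) with
ping.log being whatever file the output was written to, returns the
sequence numbers per group.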