On 2023-09-21 21:29:27, Johannes Berg wrote:
> On Thu, 2023-09-21 at 13:24 -0400, Antoine Beaupré wrote:
>> Hi,
>>
>> I've found what I feel might be a regression between Linux 6.1 and
>> 6.5. For other reasons, I upgraded the kernel on my Debian 12
>> ("bookworm", stable) laptop from the distribution 6.1.52 to the unstable
>> ("sid") version, 6.5.3.
>>
>> After the upgrade, I started to notice stuttering in my audio player, I
>> tracked it down and managed to correlate it with some kernel errors
>> related to the iwlwifi driver.
>>
>> What's interesting is that this happens regardless of whether or not the
>> NIC is connected to a network. In at least one of the traces, the
>> computer was connected over a wire and wireless was not associated in
>> Network Manager.
>
> This happens when scanning.

Ah, that makes sense!

>> Here's an example of the problem:
>>
>> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
>
> Can you give a few wpa_supplicant lines (there were some below) above
> this? Just want to make sure it really is scanning on wlan0, not
> something with P2P device.

Interestingly, for the above fault, there's no wpa_supplicant line just
*before*. There's this *after*:

sep 21 09:33:14 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_AUTH_KEY_0: 0x80
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
sep 21 09:33:15 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=WORLD

But an earlier one is preceded by:

sep 21 09:32:45 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
sep 21 09:32:45 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.

[...]

>> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x20103600 | ADVANCED_SYSASSERT
>
>> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000FF | umac data1
>
> This means that somehow scan_start_mac_or_link_id in the driver ended up
> 0xff which is invalid, but I'm not sure I see immediately how that
> happened, since it looks like in 6.5.3 we do assign it reasonably. I
> guess somehow in the code link_info->fw_link_id must be 0xff (invalid
> ID), but I'm not sure I see how that could happen.
>
> *thinks*
>
> Oh.. This is an older firmware, so it doesn't have
> IWL_UCODE_TLV_CAPA_MLD_API_SUPPORT! Hah. I feel like I had some concerns
> in this area before ... but maybe the other way around.
>
> I think something like this, perhaps:
>
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> @@ -2342,7 +2342,7 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
>  	if (gen_flags & IWL_UMAC_SCAN_GEN_FLAGS_V2_FRAGMENTED_LMAC2)
>  		gp->num_of_fragments[SCAN_HB_LMAC_IDX] = IWL_SCAN_NUM_OF_FRAGS;
>  
> -	if (version < 12) {
> +	if (version < 12 || !iwl_mvm_has_mld_api(mvm->fw)) {
>  		gp->scan_start_mac_or_link_id = scan_vif->id;
>  	} else {
>  		struct iwl_mvm_vif_link_info *link_info;

Interesting!

In any case, the firmware is certainly out of date in Debian stable, and
I guess it's to be expected that having it out of sync with the running
kernel is a Bad Idea. It's just not something I've thought of before. :)

Thanks for the debugging, I'll make sure to keep the firmware and kernel
in better lockstep in the future!

a.

-- 
When one puts objects away in drawers, and there are more objects than
drawers, then at least one drawer contains two objects.
                        - Lejeune-Dirichlet, Peter Gustav