Hi Mathew,

Forty Five <mathewegeorge@xxxxxxxxx> wrote:
>
> Ping-Ke Shih <pkshih@xxxxxxxxxxx> writes:
>
> > Since I saw 'NetworkManager' and 'hostapd' in code trace, I would like to know
> > if you have two virtual interfaces, which for STA and AP modes? (Please check
> > this by 'iw dev') If so, is it possible to remove hostapd (AP mode) to see if
> > this is a factor causing crash.
>
> I use hostapd as part of a Wi-Fi hotspot setup for this laptop. I REALLY
> wish I'd connected the dots earlier and realised that it could be
> related to this issue. While running gbcbefbd032 (first bad commit), I
> disabled all the components of my setup and the issue went away; then I
> enabled them one by one until the issue emerged. I'll walk you through
> the relevant details, and my observations during this process.
>
> I create a virtual interface for hostapd using this systemd unit:
>
> ```
> [Unit]
> Requires=sys-subsystem-net-devices-wlo1.device
> After=network.target
> After=sys-subsystem-net-devices-wlo1.device
> [Service]
> Type=oneshot
> ExecStart=/usr/bin/iw dev wlo1 interface add wlo1_ap type __ap addr "xx:xx:xx:xx:xx:xx"
> ExecStart=/usr/bin/ip addr add 192.168.30.1/24 dev wlo1_ap
> [Install]
> WantedBy=multi-user.target
> ```
>
> I need the '__ap' type because my card doesn't support two interfaces in
> managed mode; see [1] for details.
>
> [1] https://wiki.archlinux.org/title/Talk:Software_access_point#Two_interfaces_on_same_card
>
> Then I configure NetworkManager to ignore this interface.
>
> ```
> ;; in /etc/NetworkManager/conf.d/unmanaged.conf
> [keyfile]
> unmanaged-devices=interface-name:wlo1_ap
> ```
>
> Coming to hostapd - this is where it gets rather complicated. First off,
> let me mention that when I enabled hostapd.service again, I started
> seeing the 'phy0: resume with hardware scan still in progress' warnings,
> which had gone away upto this point.
>
> Next - once I enabled hostapd.service, I was able to reproduce the
> crashes.
> However, the dmesg in the crash log was different from what I
> see when I have the rest of my setup enabled (I hadn't applied either
> patch when this crash happened, and it's on b54846da4 because that's the
> earliest bad commit in which I'm able to produce crash logs at all, as I
> described in my original message):

Your setup is quite complicated, so I can't easily reproduce it on my side,
and I don't have time to dig deeper right now. I suspect there is more than
one problem, so please help by running a few experiments to narrow down the
scope.

The first problem is the culprit commit [1], which makes the system freeze. I
still believe that patch [2], which you have already tried, can fix it. Please
use [1] as the code base and apply patch [2] on top to see the result (#exp 1).
The difference between "without [1]" and "with [1] + [2]" is the timing at
which the driver reports scan abort completion to mac80211. Also, the last few
logs you collected show that the crash happens a long time after the scan
abort.

The second problem is that the WiFi firmware gets into an abnormal state
during resume. The log looks like this (partially):

  [ T562] rtw89_8852be 0000:02:00.0: R_AX_RPQ_RXBD_IDX =0x00000000
  [ T562] rtw89_8852be 0000:02:00.0: R_AX_DBG_ERR_FLAG=0x00000000
  [ T562] rtw89_8852be 0000:02:00.0: R_AX_LBC_WATCHDOG=0x00000081
  [ T562] rtw89_8852be 0000:02:00.0: <---
  [ T562] rtw89_8852be 0000:02:00.0: SER catches error: 0x5000

On my side this is rare, and it does not seem to appear in your last few logs.
I'm not sure whether that is because the many added log messages changed the
timing. I would defer this problem for now.

The third (uncertain) problem could have been introduced by commits between
[1] and [3]. If the first problem is addressed by #exp 1, it should become
possible to bisect between [1] and [3]. To check whether [1] is the only
problem, please also revert that commit on top of [3] and see whether things
become good (#exp 2).

Summary:

  o 5bbd9b249880 [3] (v6.10-rc5)
  |    #exp 2: 5bbd9b249880 + [4] (revert [1]; I feel this would be bad).
  :
  :
  :
  o bcbefbd032df [1] ("wifi: rtw89: add wait/completion for abort scan")
  |    #exp 1: bcbefbd032df + [2] (I think this will be good.)
  o 7e11a2966f51 (this commit is good)

[1] bcbefbd032df ("wifi: rtw89: add wait/completion for abort scan")
[2] fix scan abort
    https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@xxxxxxxxxxx/
[3] 5bbd9b249880 (v6.10-rc5; the top of tree you are trying)
[4] attached revert patch of [1]

Ping-Ke
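
For reference, the two experiments could be set up roughly as follows. This is only a sketch, assuming you run it from the top of a kernel git tree. FIX_PATCH is a hypothetical local file name for wherever you saved patch [2], the branch names exp1/exp2 and the DRY_RUN switch are made up for illustration, and REVERT_PATCH is the attachment [4].

```shell
#!/bin/sh
# Sketch only. FIX_PATCH is a hypothetical path to a saved copy of patch [2];
# REVERT_PATCH is the revert patch [4] attached to this mail.
FIX_PATCH=${FIX_PATCH:-fix-scan-abort.patch}
REVERT_PATCH=${REVERT_PATCH:-0001-Revert-wifi-rtw89-add-wait-completion-for-abort-scan.patch}

# Print each command; skip executing it when DRY_RUN is set (illustrative helper).
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# #exp 1: culprit commit [1] plus the scan abort fix [2]
exp1() {
    run git checkout -b exp1 bcbefbd032df &&
    run git am "$FIX_PATCH"
}

# #exp 2: v6.10-rc5 [3] plus the revert [4] of [1]
exp2() {
    run git checkout -b exp2 5bbd9b249880 &&
    run git am "$REVERT_PATCH"
}
```

After sourcing this, call exp1 or exp2, then build and boot the resulting tree as you normally do. With DRY_RUN=1 the commands are only printed, which is a cheap way to check them first.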
Attachment: 0001-Revert-wifi-rtw89-add-wait-completion-for-abort-scan.patch