Ping-Ke Shih <pkshih@xxxxxxxxxxx> writes:

> Since I saw 'NetworkManager' and 'hostapd' in code trace, I would like to know
> if you have two virtual interfaces, which for STA and AP modes? (Please check
> this by 'iw dev') If so, is it possible to remove hostapd (AP mode) to see if
> this is a factor causing crash.

I use hostapd as part of a Wi-Fi hotspot setup for this laptop. I REALLY
wish I'd connected the dots earlier and realised that it could be related
to this issue.

While running gbcbefbd032 (first bad commit), I disabled all the
components of my setup and the issue went away; then I enabled them one
by one until the issue emerged. I'll walk you through the relevant
details, and my observations during this process.

I create a virtual interface for hostapd using this systemd unit:

```
[Unit]
Requires=sys-subsystem-net-devices-wlo1.device
After=network.target
After=sys-subsystem-net-devices-wlo1.device

[Service]
Type=oneshot
ExecStart=/usr/bin/iw dev wlo1 interface add wlo1_ap type __ap addr "xx:xx:xx:xx:xx:xx"
ExecStart=/usr/bin/ip addr add 192.168.30.1/24 dev wlo1_ap

[Install]
WantedBy=multi-user.target
```

I need the '__ap' type because my card doesn't support two interfaces in
managed mode; see [1] for details.

[1] https://wiki.archlinux.org/title/Talk:Software_access_point#Two_interfaces_on_same_card

Then I configure NetworkManager to ignore this interface.

```
;; in /etc/NetworkManager/conf.d/unmanaged.conf
[keyfile]
unmanaged-devices=interface-name:wlo1_ap
```

Coming to hostapd - this is where it gets rather complicated.

First off, let me mention that when I enabled hostapd.service again, I
started seeing the 'phy0: resume with hardware scan still in progress'
warnings, which had gone away up to this point.

Next - once I enabled hostapd.service, I was able to reproduce the crashes.
However, the dmesg in the crash log was different from what I see when I have the rest of my setup enabled (I hadn't applied either patch when this crash happened, and it's on b54846da4 because that's the earliest bad commit in which I'm able to produce crash logs at all, as I described in my original message):
Attachment:
kdumpst-202406301627.zip
Description: Zip archive
Here are two more logs on 5bbd9b249880, again without either patch:
Attachment:
kdumpst-202406301810.zip
Description: Zip archive
Attachment:
kdumpst-202406301814.zip
Description: Zip archive
For completeness, here's a description of the remaining elements of my
setup, but keep in mind that it's not necessary to reproduce the issue -
only to explain how the logs have looked so far.

hostapd cannot switch to a different channel while running; it has to be
restarted on the new channel. I'm usually connected to Wi-Fi, and
constantly switching between stations depending on connectivity
(NetworkManager does this automatically), which means that I'm constantly
changing channels. So I have a script that runs `iw dev wlo1 info` every
2 seconds and greps the current channel number from its output (yes, I
know that `iw` has a warning not to scrape its output because it isn't
considered stable; I don't know any other way to do this), and compares
it to the channel number from its previous run of `iw`. It then restarts
hostapd.service if they don't match, or stops/starts it if one of them is
the empty string (meaning the interface had no channel number when `iw`
was run).

Finally - hostapd does not handle suspend+resume well. It stops working
and spams 'handle_probe_req: send failed' into the logs, and it needs to
be restarted. So I have a systemd service to automatically restart it on
resume -

```
[Unit]
After=suspend.target
After=hibernate.target
Description=restart hostapd after resume from suspend
# ...because it stops working and spams the journal with
# 'handle_probe_req: send failed' error

[Service]
Type=simple
ExecStart=/usr/bin/systemctl restart hostapd.service

[Install]
WantedBy=suspend.target
WantedBy=hibernate.target
```

I'll leave out the dnsmasq and iptables configuration I had to do, since
I can't see how it could be related.

> Attachment is a debug patch that add more messages and code trace, please help
> to reproduce problem with patches of [2] and attachment. If your kernel enables
> dynamic debug, need additional commands to have debug message:
>   sudo bash -c 'echo -n "module rtw89_core +p" > /sys/kernel/debug/dynamic_debug/control'
>   sudo bash -c 'echo -n "module rtw89_pci +p" > /sys/kernel/debug/dynamic_debug/control'
>
> Since there are more than one symptoms causing system freeze, please collect
> four logs as before. Also please give me two logs that system can normally
> suspend/resume, so I can compare their difference.

I applied both patches on the latest master; here are the crash logs.

With the patches, I am no longer able to trigger the crash merely by
suspending and resuming. I have to run `sudo systemctl restart
hostapd.service` after hostapd emits the 'handle_probe_req: send failed'
errors (which, as described above, happen after suspend+resume). So maybe
[2] is making a difference here. I wanted to test with your debug patch
and without [2], but the patch application failed unless I applied both
together.

[2] https://lore.kernel.org/linux-wireless/20240517013350.11278-1-pkshih@xxxxxxxxxxx/
Attachment:
kdumpst-202406301756.zip
Description: Zip archive
Attachment:
kdumpst-202406301800.zip
Description: Zip archive
Attachment:
kdumpst-202406301828.zip
Description: Zip archive
Attachment:
kdumpst-202406301830.zip
Description: Zip archive
Finally, here are two crash logs generated with Alt+SysRq+c, without hostapd enabled, and with both patches applied -
Attachment:
kdumpst-202406301855.zip
Description: Zip archive
Attachment:
kdumpst-202406301857.zip
Description: Zip archive
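
In case the logic of the channel-watching script described earlier is
useful for reproducing the hostapd restarts, here is a minimal sketch of
it in Python. This is not the actual script (which greps the output of
`iw`); the function names, the regular expression, and the sample `iw`
output format it expects are assumptions made for illustration. In
particular, mapping "channel disappeared" to a stop and "channel
appeared" to a start is one reading of the stop/start behaviour described
above.

```
import re
import subprocess
import time
from typing import Optional

IFACE = "wlo1"               # station interface named in this message
SERVICE = "hostapd.service"
POLL_SECONDS = 2             # polling interval described in this message


def parse_channel(iw_output: str) -> str:
    """Return the channel number found in `iw dev <iface> info` output.

    Assumes a line of the form "channel 36 (5180 MHz), ...". Returns the
    empty string when no such line exists (interface has no channel).
    """
    match = re.search(r"^\s*channel\s+(\d+)", iw_output, re.MULTILINE)
    return match.group(1) if match else ""


def decide_action(previous: str, current: str) -> Optional[str]:
    """Map a channel transition to a systemctl verb, or None for no-op."""
    if previous == current:
        return None
    if current == "":
        return "stop"        # assumption: channel vanished, stop the AP
    if previous == "":
        return "start"       # assumption: channel appeared, start the AP
    return "restart"         # channel changed, hostapd must be restarted


def current_channel(iface: str) -> str:
    """Run `iw dev <iface> info` and extract the channel, if any."""
    result = subprocess.run(
        ["iw", "dev", iface, "info"],
        capture_output=True, text=True, check=False,
    )
    return parse_channel(result.stdout)


def main() -> None:
    """Poll the channel forever and act on hostapd when it changes."""
    previous = current_channel(IFACE)
    while True:
        time.sleep(POLL_SECONDS)
        current = current_channel(IFACE)
        action = decide_action(previous, current)
        if action is not None:
            subprocess.run(["systemctl", action, SERVICE], check=False)
        previous = current
```

Calling `main()` as root starts the polling loop. The parsing and the
decision are kept in separate functions so that they can be checked
without a wireless interface present.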