On Wed, 2023-08-16 at 23:04 +0200, Johannes Berg wrote:
> Then I had to leave for a while so I only got to try it now, and indeed
> I can reproduce it with a kernel built/booted that way, so I can make
> changes for debugging.

But wow, this is complicated, even creating the same interface names in
different network namespaces ...

This is what I got from debug:

[ 49.036423][ C0] [ffff88801a9c1d00] wlan1: __ieee80211_beacon_update_cntdwn:5010: counter in ffff88801cd6d800 now 1
[ 49.040155][ C0] [ffff88801a9c1d00] wlan1: ieee80211_csa_finish:3589: queue csa_finalize_work

That's fine, what it should be - although I don't see why there are 4ms
between those two lines.

[ 49.042665][ C1] [ffff88804ad0ba00] wlan1: __ieee80211_beacon_get:5415

unrelated wlan1 in a different network namespace (the [pointer] is the
network namespace pointer). I'll skip these for the rest of the log.

[ 49.082269][ T11] [ffff88801a9c1d00] wlan1: ieee80211_csa_finalize_work:3732
[ 49.084809][ T11] [ffff88801a9c1d00] wlan1: ieee80211_csa_finalize_work:3740
[ 49.086646][ T11] [ffff88801a9c1d00] wlan1: ieee80211_csa_finalize_work:3744
[ 49.088336][ T11] [ffff88801a9c1d00] wlan1: ieee80211_csa_finalize:3717
[ 49.089932][ T11] [ffff88801a9c1d00] wlan1: __ieee80211_csa_finalize:3651
[ 49.091642][ T11] [ffff88801a9c1d00] wlan1: __ieee80211_csa_finalize:3670
[ 49.093661][ T11] [ffff88801a9c1d00] wlan1: __ieee80211_csa_finalize:3679
[ 49.097030][ T11] [ffff88801a9c1d00] wlan1: ieee80211_link_chanctx_reservation_complete:1211: queue csa_finalize_work

That continues running as it should, but ... it took forever! By now,
just to go through a few function calls, it took 57ms?
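
The gaps quoted so far can be double-checked straight from the printk
timestamps. This is only a rough sketch: it assumes every log line starts
with a "[ seconds.microseconds]" timestamp as in the excerpts above, and
the helper names (timestamp_us, gap_ms) are made up for illustration.

```python
import re

# Leading printk timestamp, e.g. "[ 49.036423]" (seconds.microseconds).
TS_RE = re.compile(r"^\[\s*(\d+)\.(\d{6})\]")


def timestamp_us(line):
    """Return the printk timestamp of a log line in microseconds."""
    m = TS_RE.match(line.strip())
    if m is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return int(m.group(1)) * 1_000_000 + int(m.group(2))


def gap_ms(earlier, later):
    """Milliseconds elapsed between two log lines."""
    return (timestamp_us(later) - timestamp_us(earlier)) / 1000.0


# Lines copied from the debug output above.
counter_now_1 = ("[ 49.036423][ C0] [ffff88801a9c1d00] wlan1: "
                 "__ieee80211_beacon_update_cntdwn:5010: "
                 "counter in ffff88801cd6d800 now 1")
first_queue = ("[ 49.040155][ C0] [ffff88801a9c1d00] wlan1: "
               "ieee80211_csa_finish:3589: queue csa_finalize_work")
requeue = ("[ 49.097030][ T11] [ffff88801a9c1d00] wlan1: "
           "ieee80211_link_chanctx_reservation_complete:1211: "
           "queue csa_finalize_work")

# Roughly 4ms between the counter update and queueing the work.
print(f"counter=1 -> queue work: {gap_ms(counter_now_1, first_queue):.1f} ms")
# Roughly 57ms from first queueing the work to queueing it again.
print(f"queue work -> requeue:   {gap_ms(first_queue, requeue):.1f} ms")
```

Working in integer microseconds avoids float rounding when subtracting
timestamps that differ only in the last few digits.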

[ 49.130990][ T11] [ffff88801a9c1d00] wlan1: ieee80211_csa_finalize_work:3732
[ 49.132892][ T11] [ffff88801a9c1d00] wlan1: ieee80211_csa_finalize_work:3740

and another 33ms to actually start the worker again

[ 49.137404][ C0] [ffff88801a9c1d00] wlan1: __ieee80211_beacon_get:5415
[ 49.137462][ C1] [ffff88804ad09d00] wlan1: ieee80211_beacon_cntdwn_is_complete:5111
[ 49.138897][ C0] [ffff88801a9c1d00] wlan1: __ieee80211_beacon_update_cntdwn:5010: counter in ffff88801cd6d800 now 0
[ 49.139480][ C0] ------------[ cut here ]------------
[ 49.142567][ C0] WARNING: CPU: 0 PID: 5215 at net/mac80211/tx.c:5013 __ieee80211_beacon_get+0x1604/0x1a10

And the worker doesn't finish fast enough, so we get to 0, warn and
crash.

So what I said before about scheduling still seems like it could be the
case.

I'm not sure what we could do here - we can't delay the beacon, so if
the update work didn't run ... and in general, I think we _do_ want this
reported to see that something is broken, just that maybe with the
single-core setup (two threads) and so much happening in the system, we
don't get this running fast enough?

Maybe we need to make our workqueue high priority? Testing ... no,
doesn't even help.

So not sure. Seems in reality this won't really happen since you have
usually 100ms or so to execute the thing, and only a single (or handful
maybe) interface(s).

johannes