On Wed, Sep 11, 2019 at 11:26 AM Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote: > > Hi, > > > So I'm at LCA > > When did LCA move to Portugal? ;-)) Heh. I may be jetlagged and not thinking straight. LCA/LPC ;) > > Previous resume looks normal: > > [snip] > > wlp2s0: Limiting TX power to 23 (23 - 0) dBm as advertised by > > 54:ec:2f:05:70:2c > > Is that the message you meant? > > > I say _almost_, because I don't see the "No TX power limit" for the > > country lookup (yes, Portugal) this time? > > because here you had it too, just a bit earlier. It usually comes when a > beacon is received the first time, which depends on the AP timing. Duh. I'm blind and didn't notice, because I was expecting it in the same order. And I didn't think about it or double-check, because the errors that then followed later _looked_ like that TX power failing that I thought hadn't happened. > I don't _think_ any of the above would be a reason to disconnect, but it > clearly looks like the device got stuck at this point, since everything > just fails afterwards. Yeah, maybe the power stuff was just another effect of things getting stuck, rather than the reason for it getting stuck. So I probably mis-attributed the cause. > > WARNING: CPU: 4 PID: 1246 at net/mac80211/sta_info.c:1057 > > __sta_info_destroy_part2+0x147/0x150 [mac80211] > > Not really a surprise. Perhaps we shouldn't even WARN_ON() this, if the > driver is stuck completely and returning errors to everything that > doesn't help so much. > > Then again, the stack trace was helpful this time: > > > ieee80211_set_disassoc+0xc2/0x590 [mac80211] > > ieee80211_mgd_deauth.cold+0x4a/0x1b8 [mac80211] > > cfg80211_mlme_deauth+0xb3/0x1d0 [cfg80211] > > cfg80211_mlme_down+0x66/0x90 [cfg80211] > > cfg80211_disconnect+0x129/0x1e0 [cfg80211] > > cfg80211_leave+0x27/0x40 [cfg80211] > > cfg80211_netdev_notifier_call+0x1a7/0x4e0 [cfg80211] > > notifier_call_chain+0x4c/0x70 > > __dev_close_many+0x57/0x100 > > dev_close_many+0x8d/0x140 > > dev_close.part.0+0x44/0x70 > > cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211] > > cfg80211_rfkill_set_block+0x22/0x30 [cfg80211] > > rfkill_set_block+0x92/0x140 [rfkill] > > rfkill_fop_write+0x11f/0x1c0 [rfkill] > > vfs_write+0xb6/0x1a0 > > > Since we see that something actually did an rfkill operation. Did you > push a button there? No, I tried to turn off and turn on Wifi manually (no button, just the settings panel). I didn't notice the WARN_ON(), I just noticed that there was no networking, and "turn it off and on again" is obviously the first thing to try ;) > You don't happen to have timing information on these logs, perhaps > recorded in the logfile/journal? Sure. I cleaned up the logs to not spam people with lots of illegible data, but it's all in the journal log. Rough timeline: Sep 11 03:40:00 xps13 kernel: PM: suspend entry (s2idle) Sep 11 03:40:00 xps13 kernel: Filesystems sync: 0.028 seconds ... Sep 11 10:13:14 xps13 kernel: Restarting tasks ... done. Sep 11 10:13:14 xps13 kernel: PM: suspend exit Sep 11 10:13:14 xps13 kernel: ath10k_pci 0000:02:00.0: UART prints enabled Sep 11 10:13:14 xps13 kernel: ath10k_pci 0000:02:00.0: unsupported HTC service id: 1536 Sep 11 10:13:23 xps13 kernel: wlp2s0: authenticate with 54:ec:2f:05:70:2c Sep 11 10:13:23 xps13 kernel: wlp2s0: send auth to 54:ec:2f:05:70:2c (try 1/3) Sep 11 10:13:23 xps13 kernel: wlp2s0: authenticated Sep 11 10:13:23 xps13 kernel: wlp2s0: Limiting TX power to 23 (23 - 0) dBm as advertised by 54:ec:2f:05:70:2c ... Sep 11 10:13:23 xps13 kernel: ath: regdomain 0x826c dynamically updated by country element Sep 11 10:13:24 xps13 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: wmi command 16387 timeout, restarting hardware ... Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: failed to read hi_board_data address: -16 Sep 11 10:27:10 xps13 kernel: ath10k_pci 0000:02:00.0: failed to receive initialized event from target: 00000000 Sep 11 10:27:13 xps13 kernel: ath10k_pci 0000:02:00.0: failed to receive initialized event from target: 00000000 ... Sep 11 10:27:13 xps13 kernel: WARNING: CPU: 4 PID: 1246 at net/mac80211/sta_info.c:1057 __sta_info_destroy_part2+0x147/0x150 [mac80211] but if you want full logs I can send them in private to you. I do suspect it's atheros and suspend/resume or something. The wireless clearly worked for a while after the resume, but then at some point it stopped. > It seems odd to me, since the RTNL is acquired by > cfg80211_rfkill_set_block() and that doesn't even have an error path, it > just does > rtnl_lock(); > cfg80211_shutdown_all_interfaces(&rdev->wiphy); > rtnl_unlock(); > > The only explanation I therefore have is that something is just taking > *forever* in that code path, hence my question about timing information > on the logs. Yeah, maybe it would time out everything eventually. But not for a long time. It hadn't cleared up by Sep 11 10:36:21 xps13 gnome-session-f[6837]: gnome-session-failed: Fatal IO error 0 (Success) on X server :0. which is when I shut down the machine (and had to then force a hard power-off because the shutdown wanted things that needed the rtnl_lock to go away) Linus