On Wed, 2023-01-18 at 10:39 -0800, James Prestwood wrote:
> > 
> > iwlwifi 0000:00:14.3: Not associated and the session protection is over already...
> 
> So _something_ happens in the driver/firmware causing this event to be
> generated, that's all I'm saying.
> 

Yes. But that's also irrelevant.

> But yes, it really doesn't matter why
> but rather how mac80211 handles it.

No, it doesn't matter how mac80211 handles it. The fact that the driver
even tells mac80211 about it by calling the disconnect API is already a
bug - just one that doesn't matter now because mac80211 ignores it.

> > So I think you're just looking in the wrong place - the real question
> > is why the association sequence in net/wireless/sme.c doesn't
> > continue (or abort) at this point?
> 
> I think I narrowed down this "why" pretty well.

No, you haven't.

> A connection loss event
> happens between a successful authentication but before association.

But it doesn't! There's no "connection loss" event; it's just the
driver being confused.

The flow is something like this:

 * authenticate
   -> driver prepares for authentication
   -> sends auth frame
   -> gets auth response
 * firmware waits for association, but that never happens
 * driver prints a message and calls the disconnect API, but since there
   was never even an *attempt* to associate, the whole end of the "time
   event" and the message are irrelevant

If there was an attempt to associate, it wouldn't even matter whether it
was before or after the end of the time event, because if the time
event had already ended, the driver would just create a new one. In
fact, the only reason we don't abort the time event after successful
authentication and schedule a new one on association is an
optimisation: it's nicer to the firmware to have just one, and normally
we finish auth + assoc within a few hundred ms.

> Since __ieee80211_disconnect does not take this state into account the
> kernel halts the reassociation and never informs userspace.
No no - there's no "the kernel halts the reassociation" in this case.
You're confusing cause and effect. The *reason* we get this message and
the pointless/buggy/... call to __ieee80211_disconnect() is that there's
never an attempt to associate. It's not *causing* the lack thereof.

> Maybe my solution/fix is incorrect, but it's at least a starting point.

I don't see it that way - something is clearly broken in that there's
no association attempt, but I still don't know what. All you've done is
create a special iwlwifi fallback path that lets userspace recover from
it; you haven't actually addressed the bug.

> > No, you can't expect that, you could be authenticated with the AP for
> > an indefinite amount of time, or never hear the deauth frame (if it
> > ever sends one).
> 
> Ok, so at least a CMD_CONNECT event, right?
> 
> Maybe I'm giving nl80211 too much credit, but the
> CMD_AUTH/ASSOC/CONNECT APIs have always seemed symmetric in terms of
> events, in that if you issue one of these commands you will get an
> event in return. So I would expect CMD_CONNECT to generate a
> CMD_CONNECT event.
> 

Yes, I would also expect a CMD_CONNECT event, via
nl80211_send_connect_result() / __cfg80211_connect_result().

But something is going wrong. I think we probably need to look into
cfg80211_sme_rx_auth(): why is that not continuing the state machine?
Surely wdev->conn didn't go away, so maybe for some reason in this case
we already have wdev->conn->state == CFG80211_CONN_CONNECTED? But even
in the reassoc case, wdev->conn is freshly allocated and should be
zeroed, at least initially. But maybe some of the events mac80211
generates during the disconnect mess with the state? The other cases in
cfg80211_sme_rx_auth() seem to generate an event already, one way or
the other.

johannes
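For illustration only, here is a hypothetical, simplified user-space model of the cfg80211_sme_rx_auth() branches questioned in this thread. It is NOT the kernel source. It assumes, as the questions above suggest, that the function returns silently when wdev->conn is NULL or already in CFG80211_CONN_CONNECTED, reports a connect failure for a failed auth, and only advances to association from an authenticating state. The names conn_model, conn_state_model and sme_rx_auth_model are made up for this sketch, and the retry-with-another-auth-algorithm path is omitted. The point is that the suspected case produces neither an association attempt nor a CMD_CONNECT event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of connection states, loosely modeled on
 * cfg80211's CFG80211_CONN_* values (assumption, not kernel code). */
enum conn_state_model {
	CONN_AUTHENTICATING,
	CONN_ASSOCIATE_NEXT,
	CONN_CONNECTED,
};

/* Hypothetical stand-in for wdev->conn. */
struct conn_model {
	enum conn_state_model state;
};

enum rx_auth_outcome {
	RX_AUTH_IGNORED,         /* no association attempt, no event */
	RX_AUTH_CONNECT_FAILED,  /* failure connect result sent to userspace */
	RX_AUTH_ASSOC_SCHEDULED, /* state machine moves on to association */
};

/* Simplified model of handling a received auth frame in the SME. */
static enum rx_auth_outcome sme_rx_auth_model(struct conn_model *conn,
					      bool auth_success)
{
	/* Assumed early return: no SME connection in progress, or it
	 * already believes it is connected. This is the suspected case:
	 * nothing continues the state machine and nothing is reported. */
	if (conn == NULL || conn->state == CONN_CONNECTED)
		return RX_AUTH_IGNORED;

	/* A failed authentication is reported as a connect result. */
	if (!auth_success)
		return RX_AUTH_CONNECT_FAILED;

	/* Normal path: auth succeeded while authenticating, so associate. */
	if (conn->state == CONN_AUTHENTICATING) {
		conn->state = CONN_ASSOCIATE_NEXT;
		return RX_AUTH_ASSOC_SCHEDULED;
	}

	/* Any other state: modeled as ignored (assumption). */
	return RX_AUTH_IGNORED;
}
```

In this model, a stale CONN_CONNECTED state after the disconnect would look exactly like the reported symptom: a successful auth response is silently dropped, so userspace waits for a CMD_CONNECT event that never arrives.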