On Sun, Feb 22, 2015 at 05:41:25PM -0800, Linus Torvalds wrote:
Ok. Attached is what seems to be the relevant part of the
wpa_supplicant.log file.
The datestamp has been changed so that it can be matched up with the
dmesg, and I added empty lines for pauses when I was trying to figure
out what the heck it was doing, but other than that it's the raw log.
14:07:16.059389: nl80211: Authenticate (ifindex=3)
14:07:16.390659: nl80211: MLME event 37; timeout with 20:9f:db:e7:80:80
I'm assuming this is unrelated to the issue of getting disconnected
after 4-way handshake, but anyway, something here prevent the simple
Authentication frame exchange from completing (i.e., either the AP did
not reply to these these frames or the response was lost for some
reason). The following attempt did you go through without issues.
14:07:19.478850: nl80211: Authenticate (ifindex=3)
14:07:19.493911: nl80211: Authenticate event
14:07:19.494223: nl80211: Associate (ifindex=3)
14:07:19.503237: nl80211: Associate event
14:07:19.504682: wlp1s0: WPA: RX message 1 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2)
14:07:19.505850: wlp1s0: WPA: Sending EAPOL-Key 2/4
14:07:19.510108: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2)
14:07:19.510233: wlp1s0: WPA: Sending EAPOL-Key 4/4
So it looks like both the AP and the station are able to send and
receive EAPOL frames. EAPOL-Key message 1/4..3/4 worked fine. However,
I'm assuming msg 4/4 did not make it through for some reason.
14:07:19.510338: wlp1s0: WPA: Installing PTK to the driver
14:07:19.510405: wpa_driver_nl80211_set_key: ifindex=3 alg=3 addr=0x7f0ab41cad68 key_idx=0 set_tx=1 seq_len=6 key_len=16
14:07:19.510435: addr=20:9f:db:e7:80:80
And this is where IEEE 802.11 RSN gets a bit messy.. The station
installs a key for encrypting all unicast frames to the AP now that it
believes 4-way handshake has been completed successfully.
14:07:19.511009: wlp1s0: WPA: Installing GTK to the driver (keyidx=1 tx=0 len=16)
And same for a key to decrypt received broadcast/multicast frames.
14:07:19.511205: wlp1s0: WPA: Key negotiation completed with 20:9f:db:e7:80:80 [PTK=CCMP GTK=CCMP]
14:07:19.511223: wlp1s0: Cancelling authentication timeout
14:07:19.511238: wlp1s0: State: GROUP_HANDSHAKE -> COMPLETED
14:07:19.511253: wlp1s0: CTRL-EVENT-CONNECTED - Connection to 20:9f:db:e7:80:80 completed [id=0 id_str=]
As far as the station is concerned, everything looked fine until here
and connection was established successfully.
14:07:20.512476: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2)
However, the AP disagrees.. It looks like this specific AP uses a one
second timeout on EAPOL-Key timeouts. This EAPOL-Key frame was likely
unencrypted since the AP did not receive (or accept, but more likely not
receive) msg 4/4. We are currently allowing such unencrypted EAPOL
frames (but not other ethertypes) to get processed even when the
pairwise key has been configured.
14:07:20.512674: wlp1s0: WPA: Sending EAPOL-Key 4/4
And here's an attempt to reply.. Alas, this will likely go out encrypted
since we have the pairwise key configured and the AP will be dropping it
most likely since it would configure the pairwise key for this station
only after having receive valid 4/4.
14:07:20.513170: wpa_driver_nl80211_set_key: ifindex=3 alg=3 addr=0x7f0ab41cad68 key_idx=0 set_tx=1 seq_len=6 key_len=16
And station reconfigured the key..
14:07:21.513331: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2)
14:07:22.514320: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2)
These are the following two retries from AP, one second apart. Trying
EAPOL-Key frames four times in normal behavior, so based on timing here,
the issue is in AP not receiving (or accepting) msg 4/4.
14:07:23.542903: nl80211: Deauthenticate event
And this is the expected deauthentication by the AP after the last
EAPOL-Key frame retransmission failed to get a valid response.
So yes, this is a race condition of sorts. It is interesting if verbose
debugging in wpa_supplicant is enough to make this less likely to fail.
That will give some more time between the EAPOL-Key msg 4/4 transmission
(packet socket send() on the netdev) and pairwise key configuration
(netlink send). In theory, it could be possible to get even the first
EAPOL-Key msg 4/4 encrypted if the kernel operations for Data frame
transmission and nl80211 key configuration get reordered. Though, if
that were to be happening commonly, we would see significantly more
issues with this, so I'd assume this is not really the main issue here.
I'm not sure why the AP would not accept EAPOL-Key msg 4/4, so it would
be interesting to see a wireless capture log to confirm that the first
attempt of sending that out did indeed result in the frame going out and
doing so without encryption.
As far as making this issue less likely to cause connectivity issues, I
guess we could make mac80211 do a similar hack that some drivers use
here and force the EAPOL-Key msg 4/4 retransmissions to go out without
encryption even though the pairwise key was configured. I think it would
be nicer to control this from user space (i.e., wpa_supplicant could
indicate with the send command that the frame needs to go out without
encryption), but that is unfortunately not something that the current
mechanism used for sending out EAPOL frames in station mode supports.
mac80211 should be able to track this state if it were to look at which
EAPOL-Key frames has been exchanged and start encrypting them only after
the AP has transmitted an encrypted frame with the new pairwise key.
This is somewhat ugly, but doable. (Though, this would be yet another
justification for finally trying to come up with an alternative
mechanism for handling EAPOL frame TX/RX that is really a special case
for IEEE 802.1X.)