On 27/08/2019 16:10, Johannes Berg wrote:
> On Tue, 2019-08-27 at 09:07 +0000, Brendan Jackman wrote:
>>
>> My understanding is that the only reason to explicitly disable encryption for
>> EAPoL is to workaround the race conditions.
>
> Oh. Yes, maybe we wanted to be able to do something like
>
> 1) install key
> 2) send EAPOL frame(s)
>
> in that order, but still send the frames unencrypted even though the key
> is already available.
>
> But if we actually were to do that, would it break your scheme?
>
> johannes
>

I'm a little confused now, so let me lay my scheme out really verbosely and
maybe I will answer your question...

The correct sequence for the initial key setup is:

1. Complete 4-way handshake
2. Install PTK

The problem is that one of those steps takes place in the data plane and
the other in the control plane, so we don't really know that part 1 is
complete before we do part 2. That means 4/4 can get encrypted with a key
that the peer has not installed yet.

As you mentioned, basically all drivers (including ours) work around this by
explicitly marking the packets in part 1 as not encrypted even if there is a
key (I'll call this NOCRYPT). So even if part 2 interferes and the key gets
installed before 4/4 goes out, 4/4 won't get encrypted. So far so good.

However, now we get to rekeying. The ideal sequence would be:

1. Complete 4-way handshake, encrypting EAPoL frames with key 0,
   establishing key 1
2. Install key 1

Because we specify that key 0 is used to encrypt the handshake, even if
installing key 1 races against the sending of those frames, the peer can
still decapsulate them using key 0.

But in reality we only get to use a single key (for <5.3), so the sequence
is:

1. Complete 4-way handshake to establish key 0'
2. Install key 0' over the top of key 0

My problem boils down to the fact that if, in part 1, we encrypt the
handshake frames using key 0 (this is what the spec tells us to do), and we
then hit the installation race, 4/4 might get encrypted with key 0'.

I think up to this point we are already on the same page...

So, what if we just don't encrypt the rekeying handshake frames either (i.e.
keep the NOCRYPT workaround going)? The problem is that the peer should
reject them according to the spec.

So let's say we just have the peer allow unencrypted frames during an RSNA
as long as they are EAPoL. This is easier said than done (at least in my
case - perhaps I am a special case here?) because the MAC layer whose job is
to discard unencrypted frames has no business peering at the ethertype or
whatever. The same goes for the part where we discard GCMP replays
(unencrypted frames don't have a packet number at all!).

So in our driver we use the NOCRYPT workaround for the initial handshake,
but then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt
EAPoL frames during rekeying). But that means we are vulnerable to the race
condition, and indeed we see rekeying fail if there is heavy traffic on the
link.

So the idea is that by moving part 1 into the control plane to some degree
(i.e. into nl80211) we can fully complete establishment of key 0', using
handshake frames encrypted with key 0, before we install key 0' (by flushing
the TX ring in part 2).

So now the NOCRYPT workaround is not necessary to avoid lost handshake
packets, neither for the initial handshake nor for rekeying handshakes.

_______________________________________________
Hostap mailing list
Hostap@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/hostap
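
To make the ordering argument concrete, here is a toy Python model of the
rekeying race. It is only a sketch, not driver or nl80211 code: every name in
it (ToyLink, install_key, flush_first, rekey) is made up for illustration, and
it assumes the key used to encrypt a frame is picked when the frame leaves the
TX ring rather than when it is queued.

```python
from collections import deque

OLD_KEY, NEW_KEY = "key 0", "key 0'"


class ToyLink:
    """Hypothetical model of one TX ring and one installed pairwise key."""

    def __init__(self):
        self.installed_key = OLD_KEY
        self.tx_ring = deque()
        self.sent = []  # (frame, key it was encrypted with)

    def queue(self, frame):
        # Data plane: the frame sits in the ring until it is drained.
        self.tx_ring.append(frame)

    def drain(self):
        # Assumption: the key is chosen at transmit time, not at queue time.
        while self.tx_ring:
            self.sent.append((self.tx_ring.popleft(), self.installed_key))

    def install_key(self, key, flush_first):
        # Control plane: optionally flush the ring before replacing the key.
        if flush_first:
            self.drain()
        self.installed_key = key


def rekey(flush_first):
    """Return the key that EAPOL 4/4 ends up encrypted with."""
    link = ToyLink()
    link.queue("EAPOL 4/4")
    # Worst case of the race: the install happens before the ring drains.
    link.install_key(NEW_KEY, flush_first)
    link.drain()
    return dict(link.sent)["EAPOL 4/4"]
```

In this model rekey(flush_first=False) returns key 0', i.e. 4/4 goes out under
a key the peer has not installed yet, which is the failure we see under heavy
traffic. rekey(flush_first=True) returns key 0, which the peer can still
decapsulate.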