Re: [PATCH] wpa_supplicant: Send EAPoL-Key frames over NL80211 where available

Brendan Jackman <brendan.jackman@xxxxxxxxxxxxxxx> · Wed, 28 Aug 2019 05:22:18 +0000

On 28/08/2019 01:50, Alexander Wetzel wrote:
 > On 8/27/19 2:49 PM, Brendan Jackman wrote:
 >>
 >> On 27/08/2019 19:11, Johannes Berg wrote:
 >>
 >>   >> So, what if we just don't encrypt the rekeying handshake frames either
 >>   >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
 >>   >> reject them according to the spec. So let's say we just have the peer allow
 >>   >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
 >>   >> than done (at least in my case - perhaps I am a special case here?) because the
 >>   >> MAC layer whose job is to discard unencrypted frames has no business peering at
 >>   >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
 >>   >> (unencrypted frames don't have a packet number at all!).
 >>   >
 >>   > In mac80211, we specifically allow unencrypted EAPOL frames to be
 >>   > received, because also some WPA1 APs allow them, and WAPI requires this
 >>   > (IIRC), and for various other reasons. We have to look at the ethertype
 >>   > anyway at some point, so not sure I agree with that part of your
 >>   > statement.
 >>
 >
 > I've never seen unencrypted eapol frames with mac80211 or any other card. And
 > I don't think disabling encryption for eapol during rekey can work. It would
 > probably just cause even more issues when (trying) to rekey a connection.  I
 > see why accepting unencrypted eapol frames at the initial connect makes
 > sense. But for rekeying - regardless of the technical challenges - we would
 > need some new RSN feature flag and only do that when both support it.
 >
 >> When I say "has no business peering" I'm not just talking about a philosophical
 >> distaste for it - I actually mean in our current architecure the components that
 >> discard frames for RSNA reasons don't have physical access (or it would be
 >> costly to performance) to the ethernet header. So we'd basically need to change
 >> those components so that where currently they cause the frame to be discarded,
 >> they instead cause it to be passed on up and just set a flag somewhere to say
 >> "discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
 >> "discard all the MSDUs in that MPDU that don't turn out to be EAPoL".
 >>
 >> I must admit: perhaps we'll need to do this in the end anyway for interop
 >> reasons. And perhaps this is a totally unique issue for our architecture which
 >> doesn't really justify changing generic code. But anyway perhaps you can see why
 >> I've been hesitant to do it.
 >>
 >>   > In general though, yes, I think you're right. We just allow them
 >>   > unencrypted for interoperability, but I'm not sure we actually transmit
 >>   > them unencrypted.
 >>   >
 >>   >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
 >>   >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
 >>   >> frames during rekeying). But that means we are vulnerable to the race condition
 >>   >> and indeed we see rekeying fail if there is heavy traffic on the link.
 >>   >>
 >
 > All this nowwcrypto talk may miss the point: This is a workaround working at
 > the initial connect but that was it. (Looks like 90% of the cards/drivers are
 > not looking behind the initial connect and mess up the rekey. So far I found
 > only two devices which can rekey correctly...) Assuming the driver is
 > supporting Extended Key ID but we can't use it we could achieve the same by
 > installing the key prior to sending out eapol #4 RX_ONLY and wait till the
 > frame is acknowledged to allow the key to be used for TX. But it has the same
 > issue as the no-encrypt idea: When we rekey it's not working.
 >
 > The PTK0 patch in mac80211 attempts to solve the issue in the following way:
 >
 > Normally there are next no buffers in the tx path. When you hand over a frame
 > to the kernel it's - according what I remember when investigating the issue -
 > an atomic operation till the skb reaches mac80211 or whatever other driver we
 > use. And the few "buffers" we have are fully driver controlled and we
 > basically can just make sure those are flushed when we replace the key. The
 > only potential issue I see are traffic shapers but then I think I eapol frames
 > bypass them already.  Of course using the control port for the frames would be
 > nicer. It should not be needed, through.
 >
 > So mac80211 just makes sure that once the key replacement code is running any
 > new frames handed over to it are dropped. The drivers then have to make sure
 > they flush all queues, making sure the eapol #4 frame is also send out prior
 > when it's instructed to delete an unicast key.
 >
 > The key deletion therefore serves as sync. When mac80211 set_key call returns
 > we know all frames queued to us prior to the call from user space to replace
 > the key have been send.
 >
 > The price of that are some dropped frames (which also could be queued) which
 > would likely have been caught by the key mismatch between the STAs anyhow.
 >
 > That same procedure should also work for you, or I must have a serious flaw in
 > my understanding. And has the additional benefit to also work correctly when
 > mac80211 generates the PN (IV) and not the card. Down sides are the dropped
 > frames and the needed queue flush.

This is interesting. To make sure I understand you: you are saying that my fear
of the .add_key call coming before the .ndo_start_xmit for frame 4/4 is
unfounded, 4/4 will always reach the driver before the .add_key call? So just
ensuring all frames queued before .add_key are sent before key
installation/replacement should fix the problem, without any need to change
userspace.

(I actually tried this a few months ago, and indeed I found that just flushing
  the TX rings in .add_key made rekeying work. But seeing it happen in my testing
  did not convince me that it would work in every situation)

Do you consider this is Linux net behaviour that can be relied upon or just
happenstance of today's implementation (maybe because of all the bufferbloat
work)? Perhaps this hostap patch is still not a bad idea, even if the current
kernel behaviour means that it is not strictly necessary?

 >>   >> So the idea is that by moving part 1. into the control plane to some degree
 >>   >> (i.e. into nl80211) we can fully complete establishment of key 0', using
 >>   >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
 >>   >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
 >>   >> handshake packets, neither for the initial nor for rekeying handshakes.
 >>   >
 >>   > Right, OK. I forgot (again) you were worried about the rekeying mostly.
 >>   >
 >>   > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
 >>   > flag set to keep the *initial* handshake working, no? Or actually fix
 >>   > the race conditions you pointed out above for it. While for the
 >>   > rekeying, well, it basically cannot work and you just move it to the
 >>   > control plane to have more control over it, which also makes sense.
 >>
 >> .tx_control_port + a TX ring flush in .add_key should fix both the initial and
 >> rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
 >> always go out before the key (re)installation takes place, which means:
 >>
 >> - The handshake frames for the initial key setup will all go out unencrypted
 >>     (because the MAC has no keys).
 >>
 >> - The handshake frames for rekeying will all go out encrypted with the old key,
 >>     before the new key gets installed. (If there is data flowing, some will
 >>     probably be lost at this point, but the EAPoL frames will be fine which is the
 >>     most important thing).
 >> > I should also say that with Alexander Wetzels' work on Extended Key
 > ID support
 >> going on as it is, my patch here would only end up really mattering for users of
 >> kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
 >> justifications are strong enough, I would understand.
 >
 > I think that's way too optimistic. Extended Key ID will not play any
 > meaningful role for probably decades, maybe even never. Extended Key ID
 > should have been the only way from the beginning, it's now quite late to force
 > everyone to switch.  At the moment this is only something which can work in a
 > careful controlled environment, regardless which kernel you are using. This
 > may be something to make mandatory in WPA4 or so. But since it will force many
 > vendors to redesign their chips I don't hold my breath...

Hmm, that's a good point, I hadn't thought of it this way because our device
supports multiple key IDs, and current use cases tend to be our devices talking
to each other.

Thanks for the input,
Brendan

_______________________________________________
Hostap mailing list
Hostap@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/hostap