Re: [PATCH] wpa_supplicant: Send EAPoL-Key frames over NL80211 where available

Alexander Wetzel <alexander@xxxxxxxxxxxxxx> · Fri, 30 Aug 2019 23:42:36 +0200

Am 28.08.19 um 07:22 schrieb Brendan Jackman:

On 28/08/2019 01:50, Alexander Wetzel wrote:
  > On 8/27/19 2:49 PM, Brendan Jackman wrote:
  >>
  >> On 27/08/2019 19:11, Johannes Berg wrote:
  >>
  >>   >> So, what if we just don't encrypt the rekeying handshake frames either
  >>   >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
  >>   >> reject them according to the spec. So let's say we just have the peer allow
  >>   >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
  >>   >> than done (at least in my case - perhaps I am a special case here?) because the
  >>   >> MAC layer whose job is to discard unencrypted frames has no business peering at
  >>   >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
  >>   >> (unencrypted frames don't have a packet number at all!).
  >>   >
  >>   > In mac80211, we specifically allow unencrypted EAPOL frames to be
  >>   > received, because also some WPA1 APs allow them, and WAPI requires this
  >>   > (IIRC), and for various other reasons. We have to look at the ethertype
  >>   > anyway at some point, so not sure I agree with that part of your
  >>   > statement.
  >>
  >
  > I've never seen unencrypted eapol frames with mac80211 or any other card. And
  > I don't think disabling encryption for eapol during rekey can work. It would
  > probably just cause even more issues when (trying) to rekey a connection.  I
  > see why accepting unencrypted eapol frames at the initial connect makes
  > sense. But for rekeying - regardless of the technical challenges - we would
  > need some new RSN feature flag and only do that when both support it.
  >
  >> When I say "has no business peering" I'm not just talking about a philosophical
  >> distaste for it - I actually mean in our current architecure the components that
  >> discard frames for RSNA reasons don't have physical access (or it would be
  >> costly to performance) to the ethernet header. So we'd basically need to change
  >> those components so that where currently they cause the frame to be discarded,
  >> they instead cause it to be passed on up and just set a flag somewhere to say
  >> "discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
  >> "discard all the MSDUs in that MPDU that don't turn out to be EAPoL".
  >>
  >> I must admit: perhaps we'll need to do this in the end anyway for interop
  >> reasons. And perhaps this is a totally unique issue for our architecture which
  >> doesn't really justify changing generic code. But anyway perhaps you can see why
  >> I've been hesitant to do it.
  >>
  >>   > In general though, yes, I think you're right. We just allow them
  >>   > unencrypted for interoperability, but I'm not sure we actually transmit
  >>   > them unencrypted.
  >>   >
  >>   >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
  >>   >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
  >>   >> frames during rekeying). But that means we are vulnerable to the race condition
  >>   >> and indeed we see rekeying fail if there is heavy traffic on the link.
  >>   >>
  >
  > All this nowwcrypto talk may miss the point: This is a workaround working at
  > the initial connect but that was it. (Looks like 90% of the cards/drivers are
  > not looking behind the initial connect and mess up the rekey. So far I found
  > only two devices which can rekey correctly...) Assuming the driver is
  > supporting Extended Key ID but we can't use it we could achieve the same by
  > installing the key prior to sending out eapol #4 RX_ONLY and wait till the
  > frame is acknowledged to allow the key to be used for TX. But it has the same
  > issue as the no-encrypt idea: When we rekey it's not working.
  >
  > The PTK0 patch in mac80211 attempts to solve the issue in the following way:
  >
  > Normally there are next no buffers in the tx path. When you hand over a frame
  > to the kernel it's - according what I remember when investigating the issue -
  > an atomic operation till the skb reaches mac80211 or whatever other driver we
  > use. And the few "buffers" we have are fully driver controlled and we
  > basically can just make sure those are flushed when we replace the key. The
  > only potential issue I see are traffic shapers but then I think I eapol frames
  > bypass them already.  Of course using the control port for the frames would be
  > nicer. It should not be needed, through.
  >
  > So mac80211 just makes sure that once the key replacement code is running any
  > new frames handed over to it are dropped. The drivers then have to make sure
  > they flush all queues, making sure the eapol #4 frame is also send out prior
  > when it's instructed to delete an unicast key.
  >
  > The key deletion therefore serves as sync. When mac80211 set_key call returns
  > we know all frames queued to us prior to the call from user space to replace
  > the key have been send.
  >
  > The price of that are some dropped frames (which also could be queued) which
  > would likely have been caught by the key mismatch between the STAs anyhow.
  >
  > That same procedure should also work for you, or I must have a serious flaw in
  > my understanding. And has the additional benefit to also work correctly when
  > mac80211 generates the PN (IV) and not the card. Down sides are the dropped
  > frames and the needed queue flush.

This is interesting. To make sure I understand you: you are saying that my fear
of the .add_key call coming before the .ndo_start_xmit for frame 4/4 is
unfounded, 4/4 will always reach the driver before the .add_key call? So just
ensuring all frames queued before .add_key are sent before key
installation/replacement should fix the problem, without any need to change
userspace.

(I actually tried this a few months ago, and indeed I found that just flushing
   the TX rings in .add_key made rekeying work. But seeing it happen in my testing
   did not convince me that it would work in every situation)

Do you consider this is Linux net behaviour that can be relied upon or just
happenstance of today's implementation (maybe because of all the bufferbloat
work)? Perhaps this hostap patch is still not a bad idea, even if the current
kernel behaviour means that it is not strictly necessary?

I had a look at that when working on the mac80211 PTK0 rekey patch.
Now that was just a side quest besides many and I did not spend the time 
needed to work out the exact path(s)...

I'll trying to figure it out now but I'm next to sure that the eapol 
frames go through the qdisc. Bypassing that via the control path makes 
definitely sense for me.
As of today I expect to find out that the qdisc code is the only 
dangerous point which could queue (or drop!) the eapol frames prior to 
handing it over to the driver.

  >>   >> So the idea is that by moving part 1. into the control plane to some degree
  >>   >> (i.e. into nl80211) we can fully complete establishment of key 0', using
  >>   >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
  >>   >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
  >>   >> handshake packets, neither for the initial nor for rekeying handshakes.
  >>   >
  >>   > Right, OK. I forgot (again) you were worried about the rekeying mostly.
  >>   >
  >>   > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
  >>   > flag set to keep the *initial* handshake working, no? Or actually fix
  >>   > the race conditions you pointed out above for it. While for the
  >>   > rekeying, well, it basically cannot work and you just move it to the
  >>   > control plane to have more control over it, which also makes sense.
  >>
  >> .tx_control_port + a TX ring flush in .add_key should fix both the initial and
  >> rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
  >> always go out before the key (re)installation takes place, which means:
  >>
  >> - The handshake frames for the initial key setup will all go out unencrypted
  >>     (because the MAC has no keys).
  >>
  >> - The handshake frames for rekeying will all go out encrypted with the old key,
  >>     before the new key gets installed. (If there is data flowing, some will
  >>     probably be lost at this point, but the EAPoL frames will be fine which is the
  >>     most important thing).
  >> > I should also say that with Alexander Wetzels' work on Extended Key
  > ID support
  >> going on as it is, my patch here would only end up really mattering for users of
  >> kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
  >> justifications are strong enough, I would understand.
  >
  > I think that's way too optimistic. Extended Key ID will not play any
  > meaningful role for probably decades, maybe even never. Extended Key ID
  > should have been the only way from the beginning, it's now quite late to force
  > everyone to switch.  At the moment this is only something which can work in a
  > careful controlled environment, regardless which kernel you are using. This
  > may be something to make mandatory in WPA4 or so. But since it will force many
  > vendors to redesign their chips I don't hold my breath...

Hmm, that's a good point, I hadn't thought of it this way because our device
supports multiple key IDs, and current use cases tend to be our devices talking
to each other.

To real "fix" - or as close as we can get to it - I have in mind has the 
following steps:

1) Extended Key ID as ultimate solution
2) PTK0 rekey flag in drivers able to rekey correctly, empowering
3) hostapd/wpa_supplicant to make a fast disconnect/reconnect instead of 
rekeying the connection when PTK0 rekey is not supported.

Of course hostapd/wpa_supplicant will also get a config setting to force 
reconnects for any rekeys, regardless if they are initiated locally or 
by the remote STA. Not sure what we want to use as the default setting 
for that, but first we have to get there:-)

I have most of the pieces in place now and even a POC patch doing 3)
It's just not fast the rest is working. (Probably I just have to prevent 
the BSS table to get flushed when reconnecting to a AP due to a rekey 
request.)

So in the end we have three group of cards. All able to rekey somehow. 
Some just better with no impact (Extended Key ID) or quite fast (PTK0 
rekey) or at least stable if with a comparable huge dropout (reconnect 
instead re-key).

Now there is not much we can do with cards/drivers not able to rekey 
PTK0 but still (trying) to do it. This would require a update to 802.11, 
forcing all drivers able to rekey PTK0 to also set a (RSN?) flag.
With such a flag either end of the connection could "override" a rekey 
attempt with a disconnect. Hopefully followed by a more or less fast 
reconnect. (And this may well be > 10s, if my tests with wpa_supplicant).

Alexander

_______________________________________________
Hostap mailing list
Hostap@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/hostap