Search Linux Wireless

Re: [PATCH v2] mac80211: Fix wlan freezes under load at rekey

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alexander,

On 07/11/2018 12:08 PM, Alexander Wetzel wrote:
Hi Denis,

Hi Alexander,

hostapd or wpa_supplicant are "ordering" mac80211 to install a new key
and are implementing the state machine and are in a good position to
handle the fallout... at least theoretically.

Ideally it would even know beforehand that we don't want to handle the
PTK rekeying, and then could reconnect instead of going through the
handshake.


Don't see how we could do that in the kernel, all the relevant
information is handled in the state machine. I guess an API extension
telling hostap/supplicant if we can handle rekeys or not would tbe he
only way to avoid that.


Can the kernel / driver provide some sort of hint to user space that PTK
rekey isn’t supported?  We could then have user space deauthenticate
with a big warning about what/why this is happening and try to
re-connect to the last used BSS.


Sure. In fact the latest patch is already doing that by returning an
error when set_key is called for PTK and it's not an initial call.
Tests with wpa_supplicant shows that this is is then handled like the
initial key set is failing. Networkmanager prompts for the password and
wpa_supplicant running without seems to blacklist a reconnect for 15s.

Ideally we shouldn't even get this far. We really need some kind of capability bit on the phy telling userspace whether PTK rekey is supported or not. Then userspace can take proper action based on this information. E.g. if PTK rekey isn't safe, then we can simply issue a CMD_DISCONNECT and re-connect to the last BSS. The kernel doesn't need to play any 'tricks'.

The fact that current userspace implementations are broken is regrettable and needs to be fixed.


I kind of liked that solution, but with existing implematations out this
is indeed awkward to find a "correct" solution.

The main problem for me currently is to find a correct and still
acceptable solution. This turned from "let's fix this nasty wlan
connection freezes" to a projet spanning the complete wlan stack: From
hardware up to and including the userspace...

Right. The problem is that this PTK rekey likely 'works' (for some definition thereof) in a vast majority of cases, e.g. the link isn't broken, so the user doesn't notice. So, if the kernel starts to unilaterally issue disconnects, you will have a lot of grumbling users.

Just to clarify, I'm not arguing against this necessarily. I can see why issuing a disconnect is a good idea for many reasons (e.g. security, etc.) But, I would expect a lot of user backlash if this is done, and given that this has been an issue for many years, I wonder if its the right way of handling this?


It's fun to learn how that interacts (if not very fast), I'm stuggling
finding the best way forward here. Whatever we do has undesired
consequences.
Maybe I'm missing something, but here the high level options we have in
my opinion:

1) Keep it as it is and solve that in a indefinite future when we and
the world implement the ieee802.11 2012 addition, to use key 0+1 for PTK
and 2+3 for GTK
- rekey has a extrem high probability of freezing connections and
leaking a few clear text packets for years (decades?) to come
+ The issue is fixed at the core

It would seem to me that 0+1 rekey is a separate issue that needs to be supported in both kernel and userspace anyway.


2) Make it worse, like some (most) Windows systems/cards seem to handle
it by encrypting EAPOL #4 with the NEW key, breaking the handshake and
forcing a reconnect.
- break something more to fix a problem sounds like a insane approach
+ This seems to be quite common and therefore well "tested" (based on my
very limited data on that)

This seems awful. And then if you're unlucky someone will come in and tell you that the kernel has to maintain this 'legacy' behavior forever. So things can't ever be fixed.

Plus, as I already mentioned above, some users 'think' that PTK rekey already works just fine.


3) Fix what we can in mac80211 but keep the API stable
- Without driver actions still many drivers will be "undefined" and even
if they are not freezing leak packets
+ This will reduce the problems to a fraction of what is is today with
only a mac80211 update

4) Redesign the mac80211 rekey handling and interaction with drivers to
only rekey if it is save and decline when not.
+ We only have to touch the kernel
- any supplicant (whatever runs the wpa state machine) may get errors
where the programmes did not expect them, leading to unexpected side
effects.

5) The full-stack solution: Update the API for the userpace
+ We do not have to "trick" the wpa state machine to disconnect, the
programmers of it have to code it.
- Well, it must be suppurted from the wpa state machine. If not we still
have to handle the rekey somehow or we accept freezes/cleartext leaks...

The last two solutions will also need some "fallback" when a secure
rekey is not possible and/or the user is runing an old state machine not
knowing about the new way...


My vote is for something along the lines of 5. I realize there are no good solutions here, but this really should be fixed by a combination of userspace and kernel working properly together. If this isn't possible, then perhaps the whole rekey should be done in the kernel and userspace should be given no chance to make a bad decision.


So I think we're probably better off accepting the set_key but not
actually using it, and instead disconnecting... even if that's awkward
and should come with a big comment :-)

Instead of returning an error I'll change the code to accept the rekey
but do nothing with it. (Basically delete the new key and keep the old
active).
And of course calling ieee80211_set_disassoc() prior to return "success".

Let's see how the supplicant will react on a disassoc while doing a
rekey...


This sounds pretty awful actually.  Now that wpa_s is not the only game
in town, can we stop resorting to these tactics?

Nothing of it is wpa_supplicat/hostap specifiy. Only my current "test"
environment is using it, simply due to the fact that I tracked the issue
down in that environment.
Everything besides ath9k as an AP running hostapd and a iwldvm card
running wpa_supplicant is mostly untested. And even there I have some
areas marked for follow up after we find a solution acceptable for the
kernel...

Do you have any other software you think I should add to my prelimitary
tests? If possible I'll happy to extend the test of the patches with those.


You can try:
https://git.kernel.org/pub/scm/network/wireless/iwd.git/

We would be happy to implement whatever 'proper' userspace behavior / kernel api that this discussion leads to.

Regards,
-Denis



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux