Search Linux Wireless

Re: [PATCH v2] mac80211: Fix wlan freezes under load at rekey

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Tue, 2018-05-15 at 12:22 +0200, Alexander Wetzel wrote:
>> Rekeying a pairwise key with encryption offload and only keyid 0 has two
>> potential races which can freeze the wlan conection till rekeyed again:
>>
>>  1) For incomming packets:
>>     If the local STA installs the key prior to the remote STA we still
>>     have the old key active in the hardware for a short time after
>>     mac80211 switched to the new key.
>>     The card can still hand over packets decoded with the old key to
>>     mac80211, bumping the new PN (IV) value to an incorrect high number and
>>     tricking the local replay detection to drop all packets really sent
>>     with the new key.
>>
>>  2) For outgoing packets:
>>     If mac80211 is providing the PN (IV) and hands over the cleartext
>>     packets for encryption to the hardware immediately prior to a key
>>     change the driver/card may process the queued packets after
>>     switching to the new key.
>>     This will immediatelly bump the PN (IV) value on the remote STA to
>>     an incorrect high number, also freezing the connection.
>>
>> Both issues can be prevented by first replacing the key in the HW and
>> makeing sure no aggregation sessions are running during the rekey.
> 
> Getting back to this, am I understanding correctly that in the latter
> (outgoing) case this would cause 

I don't get the point here...Shall I flesh out the description of the
exiting races in the code a bit more?

> 
> Also, I think you should probably describe better why the aggregation
> session stuff is needed. I'm already thinking there times about it again
> ...
I'll rework the description and will go into more details.
Here a first quick draft what I plan amend to the commit message. But
this got kind of long:
...
Both issues can be prevented by first replacing the key in the HW and
making sure no aggregation sessions are running during the rekey.

The "core" of the fix is to change the old unicast key install sequence
 - atomic switch over to the new key in mac80211
 - remove the old key in the HW and mac80211
 - add new key in the hw
to
 - mark the old key as tainted to drop TX packets trying to use it
 - remove the old key in the HW and mac80211
 - add new key to the HW
 - atomic switch over to the new key in mac80211

Since the new key is not marked as tainted the last step will also
enable TX again.

With the new procedure the HW will be unable to decode packets still
encrypted with the old key prior to switching to the new key in
mac80211. Locking down TX during the rekey makes sure that there are no
outgoing packets mac80211 prepared for the old key the hw can encrypt
with the new key.
We can now decrypt and get incoming packets while mac80211 is still on
the old key, but the worst (and most likely) case is now, that the
replay detection will drop those packets till mac80211 also switch over.

Instead of checking PN from old key packets against the new key (and
bump the PN for the new key, killing our connection) we are now checking
the PN from new key packets against the old key (and drop a few packets
during rekey only).

When using TX aggregation we get an additional race which can poison the
remote station:
When a TX aggregation session is running during the key replacement it's
possible that one of the frames send out prior of the TX lock down (old
key) got lost and will be repeated after the TX lock down is lifted,
using the old skb prepared for the old key with the new installed key in
the hardware. This would bump the PN counter of our peer to an incorrect
value and breaking the RX for the remote station.
Stopping all running aggregation sessions and declining any new ones
till we have switched over to the new key side steps that problem.

/end commit draft

It would be possible to keep the aggregation sessions running, but that
seems to need some quite invasive changes and checks.

> 
>> +               ieee80211_sta_tear_down_BA_sessions(
>> +                   sta, AGG_STOP_LOCAL_REQUEST);
> 
> minor indentation issue here
I'll go for a 81 char long single line here.

> 
>> +       ieee80211_flush_queues(key->local, key->sdata, false);
>> +       ieee80211_key_disable_hw_accel(key);
> 
> I'm not sure all drivers implement drv_flush() [correctly], what happens
> if they don't? I guess some packets end up being transmitted in clear
> text or a dummy key, unless the hardware/firmware knows about this and
> drops them?
Flushing the queues is not needed for to the logic and only a workaround
for drivers like ath9k which can still send out packets for a key which
was already deleted. When the driver guarantees that after the key
deletion function returns to mac80211 that there will be no more packets
send out prepared for the old key it can be removed.
The driver could simply wait for whatever the may queue time may be or
detect and drop these packets. (In fact it can even send out the packet
again, it just must make sure to still use the old and official deleted
key. May be handy for drivers like iwlwifi.)

> 
> Perhaps that means we should make this whole thing opt-in, and leave it
> up to driver authors to first validate that they handle the flushing
> correctly?

I tried to minimize the changes, but if we can consider an API change I
would ask the drivers to provide a new function to switch directly from
one key to another, without the need to delete it first. And to
guarantee there, that after the function returns no packets with
prepared for the old key can be sent out. Including retransmissions.

That way drivers like ath9k should be able to directly program the new
key and drivers for cards where this is not saving time can simply call
delete and add internally.

As fallback I would still want to use the current code with a flush,
well knowing that not all drivers implement it and only remove it once
all drivers have switched to the new API for rekeys.


> 
> 
> Regarding the code: the whole dance you do with ieee80211_key_link() and
> ieee80211_key_replace() seems to be a little pointless because you still
> add the key to debugfs and then free it, on errors that is?

I hope the more verbose description above explains that part.
The intend of the dance is to finalize the switch to the new key in the
hardware prior to doing it in mac80211. It's probably possible to bypass
more of the code, but I did not touch the logic besides what I needed,
understand or verified to be harmless.



Alexander



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux