Re: [PATCH] ath10k: prevent sta pointer rcu violation

Michal Kazior <michal.kazior@xxxxxxxxx> · Thu, 12 Jan 2017 17:18:57 +0100

On 12 January 2017 at 16:46, Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
> On Thu, 2017-01-12 at 16:14 +0100, Michal Kazior wrote:
>> Station pointers are RCU protected so driver must
>> be extra careful if it tries to store them
>> internally for later use outside of the RCU
>> section it obtained it in.
>>
>> It was possible for station teardown to race with
>> some htt events. The possible outcome could be a
>> use-after-free and a crash.
>>
>> Only peer-flow-control capable firmware was
>> affected (so hardware-wise qca99x0 and qca4019).
>>
>> This could be done in sta_state() itself via
>> explicit synchronize_net() call but there's
>> already a convenient sta_pre_rcu_remove() op that
>> can be hooked up to avoid extra rcu stall.
>
> I don't think this makes sense. You're not using RCU-protected pointers
> to the stations yourself, all accesses to them are locked under the
> &ar->data_lock. As a consequence, you can't have any need for waiting
> for a grace period. Since you also remove the pointer (under lock) when
> the station gets removed, I don't think RCU can be the problem?

Unless you then continue to use that sta pointer after you release
data_lock. Consider this:

>          CPU0             CPU1
> 1   synchronize_net()
> 2    drv_sta_state()
> 3                      htt_fetch_ind(pid,tid) called
> 4                      rcu_read_lock()
> 5                      get(data_lock)
> 6                      txq=peers[pid]->sta->txq[tid]
> 7                      put(data_lock)
> 8    get(data_lock)
> 9     peer->sta=0
> 10   put(data_lock)
> 11     kfree(sta)
> 12                     ieee80211_tx_dequeue(txq)

Even though there's no code like (9) per se you can think of it as
anything that tries to "remove" the peer--sta association (ath10k_peer
is removed implicitly via wmi peer delete command and waiting for htt
event completion).

Holding data_lock for the entire duration of handling fetch indication
isn't really good for performance so it's better to fix RCU handling.

Michał