On 10/29/2024 8:29 AM, Kalle Valo wrote: > Jeff Johnson <quic_jjohnson@xxxxxxxxxxx> writes: > >> On 10/23/2024 6:29 AM, Kalle Valo wrote: >> >>> +static void ath12k_mac_station_post_remove(struct ath12k *ar, >>> + struct ath12k_link_vif *arvif, >>> + struct ath12k_link_sta *arsta) >>> +{ >>> + struct ieee80211_vif *vif = ath12k_ahvif_to_vif(arvif->ahvif); >>> + struct ieee80211_sta *sta = ath12k_ahsta_to_sta(arsta->ahsta); >>> + struct ath12k_sta *ahsta = arsta->ahsta; >>> + struct ath12k_peer *peer; >>> + >>> + lockdep_assert_wiphy(ath12k_ar_to_hw(ar)->wiphy); >>> + >>> + ath12k_mac_dec_num_stations(arvif, arsta); >>> + >>> + spin_lock_bh(&ar->ab->base_lock); >>> + >>> + peer = ath12k_peer_find(ar->ab, arvif->vdev_id, sta->addr); >>> + if (peer && peer->sta == sta) { >>> + ath12k_warn(ar->ab, "Found peer entry %pM n vdev %i after it was supposedly removed\n", >>> + vif->addr, arvif->vdev_id); >>> + peer->sta = NULL; >>> + list_del(&peer->list); >>> + kfree(peer); >>> + ar->num_peers--; >>> + } >>> + >>> + spin_unlock_bh(&ar->ab->base_lock); >>> + >>> + kfree(arsta->rx_stats); >>> + arsta->rx_stats = NULL; >>> + >>> + if (arsta->link_id < IEEE80211_MLD_MAX_NUM_LINKS) { >>> + rcu_assign_pointer(ahsta->link[arsta->link_id], NULL); >>> + synchronize_rcu(); >> >> I've mentioned this in the past in some internal discussion and seems now is a >> good time to bring this to light. >> >> It concerns me that this happens so late in the process. In theory another >> thread could already have a valid arsta pointer and could be trying to >> dereference that pointer while the code above is destroying underlying data >> (i.e. arsta->rx_stats). >> >> Should we set this to NULL and synchronize RCU at the beginning of the process >> so that we know all access to the struct has finished before we start >> destroying the data? >> >> Or can this not actually happen in practice due to other synchronization >> mechansims? And if so, should we document that somewhere? > > I think you are correct, AFAICS the kfree(arsta->rx_stats) should be > after synchronize_rcu(). But this race was already in the code before > this patch so we need to fix in a separate patch. I have added this to > my todo list. > Sounds reasonable to me