Re: [PATCH v2] ath11k: free peer for station when disconnect from AP for QCA6390/WCN6855

Wen Gong <quic_wgong@xxxxxxxxxxx> writes:

> Commit b4a0f54156ac ("ath11k: move peer delete after vdev stop of station
> for QCA6390 and WCN6855") fixed a firmware crash by changing the WMI
> command sequence, but it actually skips the whole peer delete operation.
> As a result, commit 58595c9874c6 ("ath11k: Fixing dangling pointer issue
> upon peer delete failure") no longer takes effect, and KASAN reports a
> use-after-free because peer->sta is not set to NULL and is used later.
>
> Change this to skip only WMI_PEER_DELETE_CMDID for QCA6390/WCN6855.
>
> Log of the use-after-free:
>
> [  534.888665] BUG: KASAN: use-after-free in ath11k_dp_rx_update_peer_stats+0x912/0xc10 [ath11k]
> [  534.888696] Read of size 8 at addr ffff8881396bb1b8 by task rtcwake/2860
>
> [  534.888705] CPU: 4 PID: 2860 Comm: rtcwake Kdump: loaded Tainted: G        W         5.15.0-wt-ath+ #523
> [  534.888712] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
> [  534.888716] Call Trace:
> [  534.888720]  <IRQ>
> [  534.888726]  dump_stack_lvl+0x57/0x7d
> [  534.888736]  print_address_description.constprop.0+0x1f/0x170
> [  534.888745]  ? ath11k_dp_rx_update_peer_stats+0x912/0xc10 [ath11k]
> [  534.888771]  kasan_report.cold+0x83/0xdf
> [  534.888783]  ? ath11k_dp_rx_update_peer_stats+0x912/0xc10 [ath11k]
> [  534.888810]  ath11k_dp_rx_update_peer_stats+0x912/0xc10 [ath11k]
> [  534.888840]  ath11k_dp_rx_process_mon_status+0x529/0xa70 [ath11k]
> [  534.888874]  ? ath11k_dp_rx_mon_status_bufs_replenish+0x3f0/0x3f0 [ath11k]
> [  534.888897]  ? check_prev_add+0x20f0/0x20f0
> [  534.888922]  ? __lock_acquire+0xb72/0x1870
> [  534.888937]  ? find_held_lock+0x33/0x110
> [  534.888954]  ath11k_dp_rx_process_mon_rings+0x297/0x520 [ath11k]
> [  534.888981]  ? rcu_read_unlock+0x40/0x40
> [  534.888990]  ? ath11k_dp_rx_pdev_alloc+0xd90/0xd90 [ath11k]
> [  534.889026]  ath11k_dp_service_mon_ring+0x67/0xe0 [ath11k]
> [  534.889053]  ? ath11k_dp_rx_process_mon_rings+0x520/0x520 [ath11k]
> [  534.889075]  call_timer_fn+0x167/0x4a0
> [  534.889084]  ? add_timer_on+0x3b0/0x3b0
> [  534.889103]  ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370
> [  534.889117]  __run_timers.part.0+0x539/0x8b0
> [  534.889123]  ? ath11k_dp_rx_process_mon_rings+0x520/0x520 [ath11k]
> [  534.889157]  ? call_timer_fn+0x4a0/0x4a0
> [  534.889164]  ? mark_lock_irq+0x1c30/0x1c30
> [  534.889173]  ? clockevents_program_event+0xdd/0x280
> [  534.889189]  ? mark_held_locks+0xa5/0xe0
> [  534.889203]  run_timer_softirq+0x97/0x180
> [  534.889213]  __do_softirq+0x276/0x86a
> [  534.889230]  __irq_exit_rcu+0x11c/0x180
> [  534.889238]  irq_exit_rcu+0x5/0x20
> [  534.889244]  sysvec_apic_timer_interrupt+0x8e/0xc0
> [  534.889251]  </IRQ>
> [  534.889254]  <TASK>
> [  534.889259]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  534.889265] RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70
> [  534.889271] Code: 74 24 10 e8 ea c2 bf fd 48 89 ef e8 12 53 c0 fd 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> 13 a7 b5 fd 65 8b 05 cc d9 9c 5e 85 c0 74 0a 5b 5d c3 e8 a0 ee
> [  534.889276] RSP: 0018:ffffc90002e5f880 EFLAGS: 00000206
> [  534.889284] RAX: 0000000000000006 RBX: 0000000000000200 RCX: ffffffff9f256f10
> [  534.889289] RDX: 0000000000000000 RSI: ffffffffa1c6e420 RDI: 0000000000000001
> [  534.889293] RBP: ffff8881095e6200 R08: 0000000000000001 R09: ffffffffa40d2b8f
> [  534.889298] R10: fffffbfff481a571 R11: 0000000000000001 R12: ffff8881095e6e68
> [  534.889302] R13: ffffc90002e5f908 R14: 0000000000000246 R15: 0000000000000000
> [  534.889316]  ? mark_lock+0xd0/0x14a0
> [  534.889332]  klist_next+0x1d4/0x450
> [  534.889340]  ? dpm_wait_for_subordinate+0x2d0/0x2d0
> [  534.889350]  device_for_each_child+0xa8/0x140
> [  534.889360]  ? device_remove_class_symlinks+0x1b0/0x1b0
> [  534.889370]  ? __lock_release+0x4bd/0x9f0
> [  534.889378]  ? dpm_suspend+0x26b/0x3f0
> [  534.889390]  dpm_wait_for_subordinate+0x82/0x2d0
> [  534.889400]  ? dpm_for_each_dev+0xa0/0xa0
> [  534.889410]  ? dpm_suspend+0x233/0x3f0
> [  534.889427]  __device_suspend+0xd4/0x10c0
> [  534.889440]  ? wait_for_completion_io+0x270/0x270
> [  534.889456]  ? async_suspend_late+0xe0/0xe0
> [  534.889463]  ? async_schedule_node_domain+0x468/0x640
> [  534.889482]  dpm_suspend+0x25a/0x3f0
> [  534.889491]  ? dpm_suspend_end+0x1a0/0x1a0
> [  534.889497]  ? ktime_get+0x214/0x2f0
> [  534.889502]  ? lockdep_hardirqs_on+0x79/0x100
> [  534.889509]  ? recalibrate_cpu_khz+0x10/0x10
> [  534.889516]  ? ktime_get+0x119/0x2f0
> [  534.889528]  dpm_suspend_start+0xab/0xc0
> [  534.889538]  suspend_devices_and_enter+0x1ca/0x350
> [  534.889546]  ? suspend_enter+0x850/0x850
> [  534.889566]  enter_state+0x27c/0x3d7
> [  534.889575]  pm_suspend.cold+0x42/0x189
> [  534.889583]  state_store+0xab/0x160
> [  534.889595]  ? sysfs_file_ops+0x160/0x160
> [  534.889601]  kernfs_fop_write_iter+0x2b5/0x450
> [  534.889615]  new_sync_write+0x36a/0x600
> [  534.889625]  ? new_sync_read+0x600/0x600
> [  534.889639]  ? rcu_read_unlock+0x40/0x40
> [  534.889668]  vfs_write+0x619/0x910
> [  534.889681]  ksys_write+0xf4/0x1d0
> [  534.889689]  ? __ia32_sys_read+0xa0/0xa0
> [  534.889699]  ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370
> [  534.889707]  ? syscall_enter_from_user_mode+0x1d/0x50
> [  534.889719]  do_syscall_64+0x3b/0x90
> [  534.889725]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  534.889731] RIP: 0033:0x7f0b9bc931e7
> [  534.889736] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [  534.889741] RSP: 002b:00007ffd9d34cc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  534.889749] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f0b9bc931e7
> [  534.889753] RDX: 0000000000000004 RSI: 0000561cd023c5f0 RDI: 0000000000000004
> [  534.889757] RBP: 0000561cd023c5f0 R08: 0000000000000000 R09: 0000000000000004
> [  534.889761] R10: 0000561ccef842a6 R11: 0000000000000246 R12: 0000000000000004
> [  534.889765] R13: 0000561cd0239590 R14: 00007f0b9bd6f4a0 R15: 00007f0b9bd6e8a0
> [  534.889789]  </TASK>
>
> [  534.889796] Allocated by task 2711:
> [  534.889800]  kasan_save_stack+0x1b/0x40
> [  534.889805]  __kasan_kmalloc+0x7c/0x90
> [  534.889810]  sta_info_alloc+0x98/0x1ef0 [mac80211]
> [  534.889874]  ieee80211_prep_connection+0x30b/0x11e0 [mac80211]
> [  534.889950]  ieee80211_mgd_auth+0x529/0xe00 [mac80211]
> [  534.890024]  cfg80211_mlme_auth+0x332/0x6f0 [cfg80211]
> [  534.890090]  nl80211_authenticate+0x839/0xcf0 [cfg80211]
> [  534.890147]  genl_family_rcv_msg_doit+0x1f4/0x2f0
> [  534.890154]  genl_rcv_msg+0x280/0x500
> [  534.890160]  netlink_rcv_skb+0x11c/0x340
> [  534.890165]  genl_rcv+0x1f/0x30
> [  534.890170]  netlink_unicast+0x42b/0x700
> [  534.890176]  netlink_sendmsg+0x71b/0xc60
> [  534.890181]  sock_sendmsg+0xdf/0x110
> [  534.890187]  ____sys_sendmsg+0x5c0/0x850
> [  534.890192]  ___sys_sendmsg+0xe4/0x160
> [  534.890197]  __sys_sendmsg+0xb2/0x140
> [  534.890202]  do_syscall_64+0x3b/0x90
> [  534.890207]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> [  534.890215] Freed by task 2825:
> [  534.890218]  kasan_save_stack+0x1b/0x40
> [  534.890223]  kasan_set_track+0x1c/0x30
> [  534.890227]  kasan_set_free_info+0x20/0x30
> [  534.890232]  __kasan_slab_free+0xce/0x100
> [  534.890237]  slab_free_freelist_hook+0xf0/0x1a0
> [  534.890242]  kfree+0xe5/0x370
> [  534.890248]  __sta_info_flush+0x333/0x4b0 [mac80211]
> [  534.890308]  ieee80211_set_disassoc+0x324/0xd20 [mac80211]
> [  534.890382]  ieee80211_mgd_deauth+0x537/0xee0 [mac80211]
> [  534.890472]  cfg80211_mlme_deauth+0x349/0x810 [cfg80211]
> [  534.890526]  cfg80211_mlme_down+0x1ce/0x270 [cfg80211]
> [  534.890578]  cfg80211_disconnect+0x4f5/0x7b0 [cfg80211]
> [  534.890631]  cfg80211_leave+0x24/0x40 [cfg80211]
> [  534.890677]  wiphy_suspend+0x23d/0x2f0 [cfg80211]
> [  534.890723]  dpm_run_callback+0xf4/0x1b0
> [  534.890728]  __device_suspend+0x648/0x10c0
> [  534.890733]  async_suspend+0x16/0xe0
> [  534.890737]  async_run_entry_fn+0x90/0x4f0
> [  534.890741]  process_one_work+0x866/0x1490
> [  534.890747]  worker_thread+0x596/0x1010
> [  534.890751]  kthread+0x35d/0x420
> [  534.890756]  ret_from_fork+0x22/0x30
>
> [  534.890763] The buggy address belongs to the object at ffff8881396ba000
>                 which belongs to the cache kmalloc-8k of size 8192
> [  534.890767] The buggy address is located 4536 bytes inside of
>                 8192-byte region [ffff8881396ba000, ffff8881396bc000)
> [  534.890772] The buggy address belongs to the page:
> [  534.890775] page:ffffea0004e5ae00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1396b8
> [  534.890780] head:ffffea0004e5ae00 order:3 compound_mapcount:0 compound_pincount:0
> [  534.890784] flags: 0x200000000010200(slab|head|node=0|zone=2)
> [  534.890791] raw: 0200000000010200 ffffea000562be08 ffffea0004b04c08 ffff88810004e340
> [  534.890795] raw: 0000000000000000 0000000000010001 00000001ffffffff 0000000000000000
> [  534.890798] page dumped because: kasan: bad access detected
>
> [  534.890804] Memory state around the buggy address:
> [  534.890807]  ffff8881396bb080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  534.890811]  ffff8881396bb100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  534.890814] >ffff8881396bb180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  534.890817]                                         ^
> [  534.890821]  ffff8881396bb200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  534.890824]  ffff8881396bb280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  534.890827] ==================================================================
> [  534.890830] Disabling lock debugging due to kernel taint
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>
> Fixes: b4a0f54156ac ("ath11k: move peer delete after vdev stop of station for QCA6390 and WCN6855")
> Signed-off-by: Wen Gong <quic_wgong@xxxxxxxxxxx>
> ---
> v2:
>    1. rebased to ath.git ath-202112141538
>    2. remove label 'free', which was defined but not used
>    3. change warning "Found peer entry %pM n vdev %i after it was supposedly removed" to ath11k_dbg()

I still see unknown peer warnings during suspend:

[  506.782421] wlan0: authenticate with xx:xx:xx:xx:xx:xx
[  506.845984] wlan0: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
[  506.852199] wlan0: authenticated
[  506.855886] wlan0: associate with xx:xx:xx:xx:xx:xx (try 1/3)
[  506.862157] wlan0: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x431 status=0 aid=2)
[  506.887866] wlan0: associated
[  507.603717] igb 0000:05:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  510.610907] PM: suspend entry (deep)
[  510.611871] Filesystems sync: 0.000 seconds
[  510.663217] Freezing user space processes ... (elapsed 0.003 seconds) done.
[  510.668909] OOM killer disabled.
[  510.670619] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  510.674552] printk: Suspending console(s) (use no_console_suspend to debug)
[  510.679606] wlan0: deauthenticating from xx:xx:xx:xx:xx:xx by local choice (Reason: 3=DEAUTH_LEAVING)
[  510.722483] e1000e: EEE TX LPI TIMER: 00000011
[  510.764835] ath11k_pci 0000:06:00.0: peer-unmap-event: unknown peer id 10
[  511.374486] ACPI: EC: interrupt blocked
[  511.440359] ACPI: PM: Preparing to enter system sleep state S3
[  511.473142] ACPI: EC: event blocked

> --- a/drivers/net/wireless/ath/ath11k/mac.c
> +++ b/drivers/net/wireless/ath/ath11k/mac.c
> @@ -4413,24 +4413,27 @@ static int ath11k_mac_op_sta_state(struct ieee80211_hw *hw,
>  		    new_state == IEEE80211_STA_NOTEXIST)) {
>  		ath11k_dp_peer_cleanup(ar, arvif->vdev_id, sta->addr);
>  
> -		if (ar->ab->hw_params.vdev_start_delay &&
> -		    vif->type == NL80211_IFTYPE_STATION)
> -			goto free;
> -
> -		ret = ath11k_peer_delete(ar, arvif->vdev_id, sta->addr);
> -		if (ret)
> -			ath11k_warn(ar->ab, "Failed to delete peer: %pM for VDEV: %d\n",
> -				    sta->addr, arvif->vdev_id);
> -		else
> -			ath11k_dbg(ar->ab, ATH11K_DBG_MAC, "Removed peer: %pM for VDEV: %d\n",
> -				   sta->addr, arvif->vdev_id);
> +		if (!(ar->ab->hw_params.vdev_start_delay &&
> +		      vif->type == NL80211_IFTYPE_STATION)) {
> +			ret = ath11k_peer_delete(ar, arvif->vdev_id, sta->addr);
> +			if (ret)
> +				ath11k_warn(ar->ab,
> +					    "Failed to delete peer: %pM for VDEV: %d\n",
> +					    sta->addr, arvif->vdev_id);
> +			else
> +				ath11k_dbg(ar->ab,
> +					   ATH11K_DBG_MAC,
> +					   "Removed peer: %pM for VDEV: %d\n",
> +					   sta->addr, arvif->vdev_id);
> +		}
>  
>  		ath11k_mac_dec_num_stations(arvif, sta);
>  		spin_lock_bh(&ar->ab->base_lock);
>  		peer = ath11k_peer_find(ar->ab, arvif->vdev_id, sta->addr);
>  		if (peer && peer->sta == sta) {
> -			ath11k_warn(ar->ab, "Found peer entry %pM n vdev %i after it was supposedly removed\n",
> -				    vif->addr, arvif->vdev_id);
> +			ath11k_dbg(ar->ab, ATH11K_DBG_MAC,
> +				   "Found peer entry %pM n vdev %i after it was supposedly removed\n",
> +				   vif->addr, arvif->vdev_id);

I'm not sure about changing this warning to a debug message, though I
don't have time to analyse this right now. But what if there is still
a race condition somewhere?
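A minimal user-space sketch of the teardown order the patch aims for, under stated assumptions: `struct sta`, `struct peer`, `peer_table`, `fw_peer_delete()` and `sta_teardown()` are hypothetical stand-ins, not ath11k code, and the cleanup inside the `peer && peer->sta == sta` branch (clearing `peer->sta` and dropping the host entry) is assumed from the commit message, since that part of the function is not shown in the quoted hunk. It illustrates why skipping only the firmware command, instead of jumping past the whole block with `goto free`, keeps the station back-pointer from dangling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the mac80211 station and the host peer entry. */
struct sta {
	unsigned char addr[6];
};

struct peer {
	bool in_use;          /* entry present in the host peer table */
	int vdev_id;
	unsigned char addr[6];
	struct sta *sta;      /* back-pointer that must not outlive the station */
};

#define MAX_PEERS 4
static struct peer peer_table[MAX_PEERS];
static int fw_delete_cmds;    /* counts simulated firmware peer delete commands */

static struct peer *peer_find(int vdev_id, const unsigned char *addr)
{
	for (size_t i = 0; i < MAX_PEERS; i++) {
		struct peer *p = &peer_table[i];

		if (p->in_use && p->vdev_id == vdev_id &&
		    memcmp(p->addr, addr, sizeof(p->addr)) == 0)
			return p;
	}
	return NULL;
}

/* Simulated firmware delete; assumes a successful reply lets the host drop the entry. */
static void fw_peer_delete(int vdev_id, const unsigned char *addr)
{
	struct peer *p = peer_find(vdev_id, addr);

	fw_delete_cmds++;
	if (p) {
		p->sta = NULL;
		p->in_use = false;
	}
}

/*
 * Patched flow (assumed): only the firmware command is skipped when
 * skip_fw_delete is set (vdev_start_delay hardware, station interface);
 * the host-side check that detaches peer->sta always runs.
 */
static void sta_teardown(int vdev_id, struct sta *sta, bool skip_fw_delete)
{
	struct peer *p;

	assert(sta != NULL);

	if (!skip_fw_delete)
		fw_peer_delete(vdev_id, sta->addr);

	p = peer_find(vdev_id, sta->addr);
	if (p && p->sta == sta) {
		/* The patch logs here via ath11k_dbg() instead of ath11k_warn(). */
		p->sta = NULL;      /* no dangling pointer once sta is freed */
		p->in_use = false;  /* drop the stale host entry */
	}
}
```

With `skip_fw_delete` set, no firmware command is counted, yet the entry's `sta` pointer is still cleared before the station would be freed; with it unset, the simulated firmware path removes the entry and the later check finds nothing.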

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches