Harshitha Prem <quic_hprem@xxxxxxxxxxx> writes: > To support multi-link operation, multiple devices with different bands say > 2 GHz or 5 GHz or 6 GHz can be combined together as a group and provide > an abstraction to mac80211. > > Device group abstraction - when there are multiple devices that are > connected by any means of communication interface between them, then these > devices can be combined together as a single group using a group id to form > a group abstraction. In ath12k driver, this abstraction would be named as > ath12k_hw_group (ag). > > Please find below illustration of device group abstraction with two > devices. > > Grouping of multiple devices (in future) > +------------------------------------------------------------------------+ > | +-------------------------------------+ +-------------------+ | > | | +-----------+ | | +-----------+ | | +-----------+ | | > | | | ar (2GHz) | | | | ar (5GHz) | | | | ar (6GHz) | | | > | | +-----------+ | | +-----------+ | | +-----------+ | | > | | ath12k_base (ab) | | ath12k_base (ab) | | > | | (Dual band device) | | | | > | +-------------------------------------+ +-------------------+ | > | ath12k_hw_group (ag) based on group id | > +------------------------------------------------------------------------+ > > Say for example, device 1 has two radios (2 GHz and 5 GHz band) and > device 2 has one radio (6 GHz). > > In existing code - > device 1 will have two hardware abstractions hw1 (2 GHz) and hw2 > (5 GHz) will be registered separately to mac80211 as phy0 and phy1 > respectively. Similarly, device 2 will register its hw (6GHz) as > phy2 to mac80211. > > In future, with multi-link abstraction > > combination 1 - Different group id for device1 and device 2 > Device 1 will create a single hardware abstraction hw1 > (2 GHz and 5 GHz) and will be registered to mac80211 as > phy0. similarly, device 2 will register its hardware > (6 GHz) to mac80211 as phy1. > > combination 2 - Same group id for device1 and device 2 > Both device details are combined together as a group, say > group1, with single hardware abstraction of radios 2 GHz, > 5 GHz and 6 GHz band details and will be registered to > mac80211 as phy0. > > Add base infrastructure changes to add device grouping abstraction with > a single device. > > This patch series brings the base code changes with following order: > 1. Refactor existing code which would facilitate in introducing > device group abstraction. > 2. Create a device group abstraction during device probe. > 3. Start the device group only after QMI firmware ready event is > received for all the devices that are combined in the group. > 4. Move the hardware abstractions (ath12k_hw - ah) from device > (ath12k_base - ab) to device group abstraction (ag) as it would > ease in having different combinations of group abstraction that > can be registered to mac80211. > > > Depends-on: > [PATCH v2 0/3] wifi: ath12k: Refactor the hardware recovery > procedures. > Link - https://lore.kernel.org/ath12k/87a5ljt0p9.fsf@xxxxxxxxxx/T/ > > v2: > - Rebased to ToT > > Karthikeyan Periyasamy (8): > wifi: ath12k: Refactor core start api > wifi: ath12k: Add helpers to get or set ath12k_hw > wifi: ath12k: Add ath12k_get_num_hw api > wifi: ath12k: Introduce QMI firmware ready flag > wifi: ath12k: move ATH12K_FLAG_REGISTERED flag set to mac_register api > wifi: ath12k: Introduce device group abstraction > wifi: ath12k: refactor core start based on hardware group > wifi: ath12k: move ath12k_hw from per soc to group I see a deadlock warning in master-pending branch (tag ath-pending-202404291731) and based on manual bisect between patchsets it seems to come from this patchset. Do note that I didn't look at the patchset otherwise. Here's the warning I see during rmmod with WCN7850 and I see it every time: [ 147.211487] ====================================================== [ 147.211547] WARNING: possible circular locking dependency detected [ 147.211599] 6.9.0-rc5-wt-ath+ #1403 Not tainted [ 147.211646] ------------------------------------------------------ [ 147.211695] rmmod/1975 is trying to acquire lock: [ 147.211741] ffff888105c7f158 ((wq_completion)ath12k_qmi_driver_event){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x46/0xf0 [ 147.211815] #012[ 147.211815] but task is already holding lock: [ 147.211864] ffff88810db290a8 (&ag->mutex_lock){+.+.}-{3:3}, at: ath12k_core_hw_group_destroy.part.0+0x26/0x290 [ath12k] [ 147.212003] #012[ 147.212003] which lock already depends on the new lock.#012[ 147.212003] [ 147.212069] #012[ 147.212069] the existing dependency chain (in reverse order) is: [ 147.212135] #012[ 147.212135] -> #2 (&ag->mutex_lock){+.+.}-{3:3}: [ 147.212189] __lock_acquire+0xd43/0x1dd0 [ 147.212238] lock_acquire+0x1b0/0x560 [ 147.212280] __mutex_lock+0x154/0x1430 [ 147.212327] mutex_lock_nested+0x16/0x20 [ 147.212369] ath12k_core_qmi_firmware_ready+0x9d/0x400 [ath12k] [ 147.212436] ath12k_qmi_driver_event_work+0x4e9/0x6e0 [ath12k] [ 147.212507] process_one_work+0x8a4/0x1980 [ 147.212551] worker_thread+0x715/0x1270 [ 147.212594] kthread+0x2fa/0x3f0 [ 147.212636] ret_from_fork+0x31/0x70 [ 147.212679] ret_from_fork_asm+0x11/0x20 [ 147.212754] #012[ 147.212754] -> #1 ((work_completion)(&ab->qmi.event_work)){+.+.}-{0:0}: [ 147.212828] __lock_acquire+0xd43/0x1dd0 [ 147.212885] lock_acquire+0x1b0/0x560 [ 147.213000] process_one_work+0x82d/0x1980 [ 147.213055] worker_thread+0x715/0x1270 [ 147.213779] kthread+0x2fa/0x3f0 [ 147.214529] ret_from_fork+0x31/0x70 [ 147.215291] ret_from_fork_asm+0x11/0x20 [ 147.216044] #012[ 147.216044] -> #0 ((wq_completion)ath12k_qmi_driver_event){+.+.}-{0:0}: [ 147.217433] check_prev_add+0x1bd/0x2330 [ 147.218174] validate_chain+0xf4e/0x1cf0 [ 147.218853] __lock_acquire+0xd43/0x1dd0 [ 147.219589] lock_acquire+0x1b0/0x560 [ 147.220324] touch_wq_lockdep_map+0x66/0xf0 [ 147.221055] __flush_workqueue+0xeb/0x1120 [ 147.221731] drain_workqueue+0xf5/0x320 [ 147.222460] destroy_workqueue+0xb2/0x920 [ 147.223187] ath12k_qmi_deinit_service+0x5a/0x1f0 [ath12k] [ 147.223876] ath12k_core_hw_group_destroy.part.0+0x1f8/0x290 [ath12k] [ 147.224606] ath12k_core_deinit+0x37/0x50 [ath12k] [ 147.225314] ath12k_pci_remove+0xad/0x1b0 [ath12k] [ 147.226640] pci_device_remove+0x9b/0x1b0 [ 147.227326] device_remove+0xbf/0x150 [ 147.227986] device_release_driver_internal+0x3c3/0x580 [ 147.228619] driver_detach+0xc4/0x190 [ 147.229290] bus_remove_driver+0x130/0x2a0 [ 147.229900] driver_unregister+0x68/0x90 [ 147.230576] pci_unregister_driver+0x24/0x240 [ 147.231240] ath12k_pci_exit+0x10/0x20 [ath12k] [ 147.231866] __do_sys_delete_module+0x32c/0x580 [ 147.232523] __x64_sys_delete_module+0x4f/0x70 [ 147.233168] x64_sys_call+0x51b/0x9e0 [ 147.233756] do_syscall_64+0x65/0x130 [ 147.234396] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 147.235023] #012[ 147.235023] other info that might help us debug this:#012[ 147.235023] [ 147.236688] Chain exists of:#012[ 147.236688] (wq_completion)ath12k_qmi_driver_event --> (work_completion)(&ab->qmi.event_work) --> &ag->mutex_lock#012[ 147.236688] [ 147.238414] Possible unsafe locking scenario:#012[ 147.238414] [ 147.239520] CPU0 CPU1 [ 147.240114] ---- ---- [ 147.240633] lock(&ag->mutex_lock); [ 147.241212] lock((work_completion)(&ab->qmi.event_work)); [ 147.241746] lock(&ag->mutex_lock); [ 147.242341] lock((wq_completion)ath12k_qmi_driver_event); [ 147.242876] #012[ 147.242876] *** DEADLOCK ***#012[ 147.242876] [ 147.244460] 2 locks held by rmmod/1975: [ 147.245040] #0: ffff88810d7191b8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x9d/0x580 [ 147.245597] #1: ffff88810db290a8 (&ag->mutex_lock){+.+.}-{3:3}, at: ath12k_core_hw_group_destroy.part.0+0x26/0x290 [ath12k] [ 147.246230] #012[ 147.246230] stack backtrace: [ 147.247338] CPU: 5 PID: 1975 Comm: rmmod Not tainted 6.9.0-rc5-wt-ath+ #1403 [ 147.247900] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021 [ 147.248545] Call Trace: [ 147.249181] <TASK> [ 147.249757] dump_stack_lvl+0x7d/0xe0 [ 147.250401] dump_stack+0x10/0x20 [ 147.251031] print_circular_bug+0x2e8/0x480 [ 147.251606] check_noncircular+0x2f2/0x3d0 [ 147.252235] ? print_circular_bug+0x480/0x480 [ 147.252808] ? validate_chain+0x15e/0x1cf0 [ 147.253435] ? __kasan_check_read+0x11/0x20 [ 147.254065] ? mark_lock+0xe6/0x1470 [ 147.254636] ? alloc_chain_hlocks+0x4cc/0x790 [ 147.255265] check_prev_add+0x1bd/0x2330 [ 147.255839] ? __kasan_check_read+0x11/0x20 [ 147.256474] validate_chain+0xf4e/0x1cf0 [ 147.257104] ? check_prev_add+0x2330/0x2330 [ 147.257673] __lock_acquire+0xd43/0x1dd0 [ 147.258303] lock_acquire+0x1b0/0x560 [ 147.258875] ? touch_wq_lockdep_map+0x46/0xf0 [ 147.259502] ? lock_sync+0x1a0/0x1a0 [ 147.260131] ? __lock_acquired+0x208/0x810 [ 147.260699] ? lockdep_init_map_type+0x1a3/0x850 [ 147.261326] ? lockdep_init_map_type+0x1a3/0x850 [ 147.261892] ? touch_wq_lockdep_map+0x46/0xf0 [ 147.262513] touch_wq_lockdep_map+0x66/0xf0 [ 147.263126] ? touch_wq_lockdep_map+0x46/0xf0 [ 147.263684] __flush_workqueue+0xeb/0x1120 [ 147.264301] ? drain_workqueue+0xae/0x320 [ 147.264862] ? drain_workqueue+0xae/0x320 [ 147.265471] ? __this_cpu_preempt_check+0x13/0x20 [ 147.266086] ? wq_update_node_max_active+0x540/0x540 [ 147.266640] ? destroy_workqueue+0xaa/0x920 [ 147.267249] ? __this_cpu_preempt_check+0x13/0x20 [ 147.267806] ? bit_wait_timeout+0x160/0x160 [ 147.268420] ? __kasan_check_write+0x14/0x20 [ 147.269031] drain_workqueue+0xf5/0x320 [ 147.269586] destroy_workqueue+0xb2/0x920 [ 147.270202] ath12k_qmi_deinit_service+0x5a/0x1f0 [ath12k] [ 147.270785] ? debugfs_remove+0x52/0x60 [ 147.271400] ath12k_core_hw_group_destroy.part.0+0x1f8/0x290 [ath12k] [ 147.272046] ath12k_core_deinit+0x37/0x50 [ath12k] [ 147.272628] ath12k_pci_remove+0xad/0x1b0 [ath12k] [ 147.273271] pci_device_remove+0x9b/0x1b0 [ 147.273829] device_remove+0xbf/0x150 [ 147.274446] device_release_driver_internal+0x3c3/0x580 [ 147.275052] ? __kasan_check_read+0x11/0x20 [ 147.275592] driver_detach+0xc4/0x190 [ 147.276189] bus_remove_driver+0x130/0x2a0 [ 147.276728] driver_unregister+0x68/0x90 [ 147.277325] pci_unregister_driver+0x24/0x240 [ 147.277876] ? find_module_all+0x13e/0x1e0 [ 147.278469] ath12k_pci_exit+0x10/0x20 [ath12k] [ 147.279087] __do_sys_delete_module+0x32c/0x580 [ 147.279626] ? __kasan_slab_free+0x102/0x170 [ 147.280209] ? module_flags+0x2f0/0x2f0 [ 147.280741] ? kmem_cache_free+0xed/0x3e0 [ 147.281334] ? __fput+0x40c/0xa60 [ 147.281870] ? __fput+0x40c/0xa60 [ 147.282462] ? debug_smp_processor_id+0x17/0x20 [ 147.283045] __x64_sys_delete_module+0x4f/0x70 [ 147.283558] x64_sys_call+0x51b/0x9e0 [ 147.284125] do_syscall_64+0x65/0x130 [ 147.284613] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 147.285159] RIP: 0033:0x7f2df2adcc8b [ 147.286954] Code: 73 01 c3 48 8b 0d 05 c2 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 c1 0c 00 f7 d8 64 89 01 48 [ 147.288033] RSP: 002b:00007ffdc52ca298 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 147.288563] RAX: ffffffffffffffda RBX: 000055d10d0237e0 RCX: 00007f2df2adcc8b [ 147.289155] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055d10d023848 [ 147.289691] RBP: 00007ffdc52ca2f8 R08: 0000000000000000 R09: 0000000000000000 [ 147.290292] R10: 00007f2df2b58ac0 R11: 0000000000000206 R12: 00007ffdc52ca4d0 [ 147.290830] R13: 00007ffdc52caebf R14: 000055d10d0222a0 R15: 000055d10d0237e0 [ 147.291444] </TASK> -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches