petter@xxxxxxxxxx <petter@xxxxxxxxxx> wrote:
>
> I have seen some issues with the LM808 dongle (8821au). I'm running
> 6.12.12 kernel with all missing rtw88 patches cherry-picked from latest
> Linux-wireless main track. The dongle seems to be working fine most of
> the time, when running traffic and load it, but sometimes during low
> traffic/idle I can see below crash that loop around. Any good ideas what
> is going on here? (running on armhf based platform)

Can you try the latest kernel?

>
> BR Petter
>
> Issue 1:
> ================
>
> Feb 08 10:32:08 machine kernel: rtw_8821au 1-1:1.0: firmware failed to leave lps state
> Feb 08 10:32:08 machine kernel:
> Feb 08 10:32:08 machine kernel: ============================================
> Feb 08 10:32:08 machine kernel: WARNING: possible recursive locking detected
> Feb 08 10:32:08 machine kernel: 6.12.12-g8e187440f820 #0 Not tainted
> Feb 08 10:32:08 machine kernel: --------------------------------------------
> Feb 08 10:32:08 machine kernel: kworker/u4:4/25 is trying to acquire lock:
> Feb 08 10:32:08 machine kernel: c4d8f050 (&rtwdev->mutex){+.+.}-{3:3}, at: rtw_leave_lps+0x1d4/0x208 [rtw88_core]
> Feb 08 10:32:08 machine kernel: but task is already holding lock:
> Feb 08 10:32:08 machine kernel: c4d8f050 (&rtwdev->mutex){+.+.}-{3:3}, at: rtw_watch_dog_work+0x44/0x2e8 [rtw88_core]
> Feb 08 10:32:08 machine kernel: other info that might help us debug this:
> Feb 08 10:32:08 machine kernel: Possible unsafe locking scenario:
> Feb 08 10:32:08 machine kernel: CPU0
> Feb 08 10:32:08 machine kernel: ----
> Feb 08 10:32:08 machine kernel: lock(&rtwdev->mutex);
> Feb 08 10:32:08 machine kernel: lock(&rtwdev->mutex);
> Feb 08 10:32:08 machine kernel: *** DEADLOCK ***
> Feb 08 10:32:08 machine kernel: May be due to missing lock nesting notation
> Feb 08 10:32:08 machine kernel: 3 locks held by kworker/u4:4/25:
> Feb 08 10:32:08 machine kernel: #0: c4eb64b4 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0x1ac/0x71c
> Feb 08 10:32:08 machine kernel: #1: f090df20 ((work_completion)(&(&rtwdev->watch_dog_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1d8/0x71c
> Feb 08 10:32:08 machine kernel: #2: c4d8f050 (&rtwdev->mutex){+.+.}-{3:3}, at: rtw_watch_dog_work+0x44/0x2e8 [rtw88_core]

rtw_watch_dog_work() obviously takes mutex_lock(&rtwdev->mutex), but I
can't find where rtw_leave_lps() tries to take that lock. Could you use
addr2line to find which source line rtw_leave_lps+0x1d4/0x208
corresponds to?

>
> Issue 2:
> ================
>
> Feb 07 20:23:45 machine kernel: rtw_8821au 1-1:1.0: firmware failed to leave lps state
> Feb 07 20:23:46 machine kernel: rtw_8821au 1-1:1.0: failed to get tx report from firmware
> Feb 07 20:23:46 machine kernel: BUG: scheduling while atomic: swapper/0/0/0x00000103
> Feb 07 20:23:46 machine kernel: INFO: lockdep is turned off.
> Feb 07 20:23:46 machine kernel: Modules linked in: rtw88_8821au rtw88_8821a rtw88_88xxa rtw88_usb rtw88_core mac80211 libarc4 cfg80211 imx_sdma ip_tables x_tables
> Feb 07 20:23:46 machine kernel: irq event stamp: 10781843
> Feb 07 20:23:46 machine kernel: hardirqs last enabled at (10781842): [<c0100c58>] __irq_svc+0xb8/0xd4
> Feb 07 20:23:46 machine kernel: hardirqs last disabled at (10781843): [<c0dcf33c>] _raw_spin_lock_irqsave+0x64/0x68
> Feb 07 20:23:46 machine kernel: softirqs last enabled at (10781808): [<c012cbbc>] handle_softirqs+0x2b4/0x4a8
> Feb 07 20:23:46 machine kernel: softirqs last disabled at (10781821): [<c012cf30>] __irq_exit_rcu+0x12c/0x198
> Feb 07 20:23:46 machine kernel: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.12-g8e187440f820 #0
> Feb 07 20:23:46 machine kernel: Hardware name: Freescale i.MX6 SoloX (Device Tree)
> Feb 07 20:23:46 machine kernel: Call trace:
> Feb 07 20:23:46 machine kernel: unwind_backtrace from show_stack+0x10/0x14
> Feb 07 20:23:46 machine kernel: show_stack from dump_stack_lvl+0x88/0xb8
> Feb 07 20:23:46 machine kernel: dump_stack_lvl from __schedule_bug+0x64/0x84
> Feb 07 20:23:46 machine kernel: __schedule_bug from __schedule+0x944/0xc70
> Feb 07 20:23:46 machine kernel: __schedule from schedule+0x50/0x130
> Feb 07 20:23:46 machine kernel: schedule from schedule_preempt_disabled+0x1c/0x2c
> Feb 07 20:23:46 machine kernel: schedule_preempt_disabled from __mutex_lock+0x7d4/0x914
> Feb 07 20:23:46 machine kernel: __mutex_lock from mutex_lock_nested+0x1c/0x24
> Feb 07 20:23:46 machine kernel: mutex_lock_nested from rtw_tx_report_purge_timer+0x44/0x74 [rtw88_core]
> Feb 07 20:23:46 machine kernel: rtw_tx_report_purge_timer [rtw88_core] from call_timer_fn+0xb4/0x310
> Feb 07 20:23:46 machine kernel: call_timer_fn from __run_timers+0x278/0x324
> Feb 07 20:23:46 machine kernel: __run_timers from run_timer_base+0x4c/0x6c
> Feb 07 20:23:46 machine kernel: run_timer_base from run_timer_softirq+0x14/0x38
> Feb 07 20:23:46 machine kernel: run_timer_softirq from handle_softirqs+0x160/0x4a8
> Feb 07 20:23:46 machine kernel: handle_softirqs from __irq_exit_rcu+0x12c/0x198
> Feb 07 20:23:46 machine kernel: __irq_exit_rcu from irq_exit+0x8/0x28
> Feb 07 20:23:46 machine kernel: irq_exit from __irq_svc+0x90/0xd4
> Feb 07 20:23:46 machine kernel: Exception stack(0xc1401f48 to

rtw_tx_report_purge_timer() is a timer handler that runs in BH context,
so sleeping is not allowed there. I also can't find where it tries to
take a lock. Could you help point out which source line
rtw_tx_report_purge_timer+0x44/0x74 corresponds to?
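
For reference, a minimal sketch of one way to do the lookups requested
above. It only parses a symbol+0xoffset/0xsize entry from the trace and
prints commands to run by hand. It assumes a kernel source tree (for
scripts/faddr2line), an rtw88_core.ko built with debug info, and gdb
being available. The helper names and the module path are hypothetical
and are not part of rtw88.

```python
import re
import shlex

# Matches the "symbol+0xoffset/0xsize" form printed in kernel backtraces.
_SYM_RE = re.compile(
    r"^(?P<sym>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)$"
)


def parse_trace_symbol(entry):
    """Split 'func+0xoff/0xsize' into (func, offset, size) with integer values."""
    m = _SYM_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"not a symbol+offset/size entry: {entry!r}")
    return m["sym"], int(m["off"], 16), int(m["size"], 16)


def faddr2line_command(module_path, entry, script="scripts/faddr2line"):
    """Build a command line for the kernel tree's scripts/faddr2line helper."""
    parse_trace_symbol(entry)  # validate the entry before building the command
    parts = (script, module_path, entry.strip())
    return " ".join(shlex.quote(p) for p in parts)


def gdb_list_expression(entry):
    """Build the 'list *(func+0xoff)' command to type inside 'gdb <module>.ko'."""
    sym, off, _size = parse_trace_symbol(entry)
    return f"list *({sym}+{off:#x})"


if __name__ == "__main__":
    # Hypothetical module location; adjust to the actual build output.
    module = "drivers/net/wireless/realtek/rtw88/rtw88_core.ko"
    for entry in ("rtw_leave_lps+0x1d4/0x208",
                  "rtw_tx_report_purge_timer+0x44/0x74"):
        print(faddr2line_command(module, entry))
        print("gdb:", gdb_list_expression(entry))
```

The printed commands are meant to be run from the top of the tree that
built the running module, so that the offsets match the binary in use.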