syzbot <syzbot+e9b1ff41aa6a7ebf9640@xxxxxxxxxxxxxxxxxxxxxxxxx> writes:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    eb5e56d14912 Merge tag 'platform-drivers-x86-v6.11-2' of g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=137edff9980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e8a2eef9745ade09
> dashboard link: https://syzkaller.appspot.com/bug?extid=e9b1ff41aa6a7ebf9640
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/a6552acb8476/disk-eb5e56d1.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/5c0963cd33df/vmlinux-eb5e56d1.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/7ba7283f6380/bzImage-eb5e56d1.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e9b1ff41aa6a7ebf9640@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> INFO: task kworker/0:7:5284 blocked for more than 143 seconds.
>       Not tainted 6.11.0-rc2-syzkaller-00011-geb5e56d14912 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/0:7 state:D stack:13232 pid:5284 tgid:5284 ppid:2 flags:0x00004000
> Workqueue: events request_firmware_work_func
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5188 [inline]
>  __schedule+0x1800/0x4a60 kernel/sched/core.c:6529
>  __schedule_loop kernel/sched/core.c:6606 [inline]
>  schedule+0x14b/0x320 kernel/sched/core.c:6621
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6678
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
>  device_lock include/linux/device.h:1009 [inline]
>  ath9k_hif_usb_firmware_fail drivers/net/wireless/ath/ath9k/hif_usb.c:1163 [inline]
>  ath9k_hif_usb_firmware_cb+0x34a/0x4b0 drivers/net/wireless/ath/ath9k/hif_usb.c:1296

Ugh. Okay, so ath9k_hif_usb_firmware_cb can recursively call another
firmware request, and if that fails (because it runs out of firmware
names to try), it will do a device_release_driver() from within the
firmware callback. Which takes a lock, and seems to deadlock.

It does seem odd to try to do an asynchronous driver release from within
a callback like this, so I'm not surprised that it deadlocks, really.
The question is whether this has ever worked - does anyone know?

Also, ath9k_htc_probe_device() has wait_for_target logic that depends on
speaking to the firmware; and it seems to tear everything down if that
fails. So my immediate thought is that we could just get rid of the
device_release_driver() from the firmware callback entirely, and just
rely on that timeout to tear things down.

However, I am not well-versed enough in the USB probe and device setup
logic, so I am not sure if there is any reason that wouldn't be enough.
Anyone with a better grip on these things care to chime in? :)

-Toke
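
For readers following the thread, the control flow described above (try
the next firmware name, then release the driver from inside the callback
once the names run out) can be modelled in userspace roughly as below.
This is only a sketch under stated assumptions, not the driver code:
every model_* name and both firmware file names are hypothetical, the
recursion through the firmware loader is flattened into a loop, and the
device lock is a plain flag. It does not reproduce the kernel hang and
says nothing about which context actually holds the device lock in the
syzbot report; it only shows why a release path that needs the lock
cannot make progress from the callback while the lock is held elsewhere.

```c
/*
 * Userspace model only. Every name below is hypothetical; none of this is
 * the ath9k or firmware-loader API.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_dev {
	int lock_held;    /* stands in for the device lock (plain flag, no threads) */
	size_t fw_index;  /* index of the next firmware name to try */
	int released;     /* set once the model "releases the driver" */
};

/* Hypothetical firmware names, tried in order. */
static const char *const model_fw_names[] = { "fw-new.bin", "fw-old.bin" };
#define MODEL_FW_COUNT (sizeof(model_fw_names) / sizeof(model_fw_names[0]))

static void model_dev_init(struct model_dev *dev)
{
	dev->lock_held = 0;
	dev->fw_index = 0;
	dev->released = 0;
}

/* Assumption: every firmware request fails, so the names always run out. */
static int model_request_firmware(const char *name)
{
	(void)name;
	return -ENOENT;
}

/*
 * Failure path: releasing the driver needs the device lock. Where a real
 * mutex would sleep, the model returns -EBUSY so the caller can see that
 * no progress is possible.
 */
static int model_firmware_fail(struct model_dev *dev)
{
	if (dev->lock_held)
		return -EBUSY;
	dev->lock_held = 1;   /* take the lock */
	dev->released = 1;    /* "release the driver" */
	dev->lock_held = 0;   /* drop the lock */
	return 0;
}

/* Firmware callback: walk the remaining names, then fall into the failure path. */
static int model_firmware_cb(struct model_dev *dev)
{
	assert(dev != NULL);
	while (dev->fw_index < MODEL_FW_COUNT) {
		const char *name = model_fw_names[dev->fw_index++];

		if (model_request_firmware(name) == 0)
			return 0;
	}
	return model_firmware_fail(dev);
}
```

Calling model_firmware_cb() on a freshly initialised model_dev walks both
names and then releases the driver; calling it with lock_held already set
returns -EBUSY instead, which is the point where the real callback would
sit in device_lock() as in the trace.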