Hi Kurt

Heiner just fixed a similar problem with Realtek driver. You might
need the same fix here.

	Andrew

On Mon, Apr 08, 2024 at 09:22:05PM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
>
> After updating to 6.9-rc2 I can no longer unbind device from the igc
> driver. "echo" into "unbind" file hangs, and via sysrq "w" I get this
> call trace:
>
> [   84.553112] Call Trace:
> [   84.553118]  <TASK>
> [   84.553123]  __schedule+0x23b/0x5c0
> [   84.553134]  schedule+0x27/0xa0
> [   84.553142]  schedule_preempt_disabled+0x15/0x30
> [   84.553152]  __mutex_lock.constprop.0+0x34c/0x6a0
> [   84.553165]  unregister_netdevice_notifier+0x25/0xc0
> [   84.553178]  netdev_trig_deactivate+0x1e/0x60 [ledtrig_netdev]
> [   84.553195]  led_trigger_set+0x105/0x340
> [   84.553206]  led_classdev_unregister+0x4a/0x110
> [   84.553219]  release_nodes+0x3d/0xb0
> [   84.553229]  devres_release_all+0x8c/0xc0
> [   84.553238]  device_del+0x27a/0x3f0
> [   84.553248]  unregister_netdevice_many_notify+0x46a/0x6a0
> [   84.553260]  unregister_netdevice_queue+0xf0/0x130
> [   84.553271]  unregister_netdev+0x1c/0x30
> [   84.553280]  igc_remove+0xe3/0x1d0 [igc]
> [   84.553298]  pci_device_remove+0x3f/0xb0
> [   84.553308]  device_release_driver_internal+0x19f/0x200
> [   84.553320]  unbind_store+0xa1/0xb0
> [   84.553329]  kernfs_fop_write_iter+0x11f/0x200
> [   84.553341]  vfs_write+0x293/0x460
> [   84.553351]  ksys_write+0x6f/0xf0
> [   84.553360]  do_syscall_64+0x87/0x170
> [   84.553368]  ? syscall_exit_work+0xf3/0x120
> [   84.553378]  ? syscall_exit_to_user_mode+0x69/0x220
> [   84.553389]  ? do_syscall_64+0x96/0x170
> [   84.553397]  ? do_syscall_64+0x96/0x170
> [   84.553404]  ? do_syscall_64+0x96/0x170
> [   84.553412]  ? do_syscall_64+0x96/0x170
> [   84.553420]  ? __irq_exit_rcu+0x4b/0xb0
> [   84.553429]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   84.553439] RIP: 0033:0x7b46ae7c5ee4
> [   84.553446] RSP: 002b:00007ffe580c2dd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [   84.553460] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007b46ae7c5ee4
> [   84.553474] RDX: 000000000000000d RSI: 00006458ac50b4b0 RDI: 0000000000000001
> [   84.553487] RBP: 00007ffe580c2e00 R08: 0000000000000073 R09: 0000000000000001
> [   84.553500] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000d
> [   84.553514] R13: 00006458ac50b4b0 R14: 00007b46ae8965c0 R15: 00007b46ae893f20
> [   84.553528]  </TASK>
>
> It worked fine on 6.8.4.
>
> Similar issue happens on few other systems, including one with Realtek
> RTL8111/8168/8411 device, so it may be not specific to the igc driver
> but some common API (LED trigger?). The issue does not affect a system
> with e1000e driver.
>
> Lockdep says:
>
> [   18.589322] ======================================================
> [   18.589329] WARNING: possible circular locking dependency detected
> [   18.589335] 6.9.0-rc2-1.qubes.fc32.x86_64 #378 Not tainted
> [   18.589340] ------------------------------------------------------
> [   18.589347] prepare-suspend/1145 is trying to acquire lock:
> [   18.589352] ffff897494bc37b8 (&led_cdev->trigger_lock){+.+.}-{3:3}, at: led_classdev_unregister+0x32/0x110
> [   18.589367]
> [   18.589367] but task is already holding lock:
> [   18.589373] ffffffffb034dfa8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20
> [   18.589384]
> [   18.589384] which lock already depends on the new lock.
> [   18.589384]
> [   18.589391]
> [   18.589391] the existing dependency chain (in reverse order) is:
> [   18.589399]
> [   18.589399] -> #1 (rtnl_mutex){+.+.}-{3:3}:
> [   18.589407]        __mutex_lock+0xb2/0xbd0
> [   18.589413]        set_device_name+0x2d/0x140 [ledtrig_netdev]
> [   18.589423]        netdev_trig_activate+0x1a6/0x220 [ledtrig_netdev]
> [   18.589432]        led_trigger_set+0x20f/0x340
> [   18.589438]        led_trigger_register+0x16d/0x1a0
> [   18.589443]        do_one_initcall+0x6f/0x3d0
> [   18.589451]        do_init_module+0x60/0x240
> [   18.589459]        init_module_from_file+0x86/0xc0
> [   18.589465]        idempotent_init_module+0x126/0x2c0
> [   18.589471]        __x64_sys_finit_module+0x5a/0xb0
> [   18.589477]        do_syscall_64+0x96/0x190
> [   18.589482]        entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   18.589490]
> [   18.589490] -> #0 (&led_cdev->trigger_lock){+.+.}-{3:3}:
> [   18.589498]        __lock_acquire+0x13e7/0x2180
> [   18.589505]        lock_acquire+0xd5/0x2f0
> [   18.589510]        down_write+0x2a/0xc0
> [   18.589515]        led_classdev_unregister+0x32/0x110
> [   18.589522]        devres_release_all+0xb5/0x110
> [   18.589530]        device_del+0x275/0x3f0
> [   18.589535]        unregister_netdevice_many_notify+0x5ba/0x870
> [   18.589543]        unregister_netdevice_queue+0xf3/0x130
> [   18.589549]        unregister_netdev+0x18/0x20
> [   18.589555]        igc_remove+0xe1/0x1c0 [igc]
> [   18.589566]        pci_device_remove+0x3b/0xb0
> [   18.589574]        device_release_driver_internal+0x1a5/0x210
> [   18.589581]        unbind_store+0x9d/0xb0
> [   18.589587]        kernfs_fop_write_iter+0x15b/0x210
> [   18.589595]        vfs_write+0x2bd/0x560
> [   18.589601]        ksys_write+0x71/0xf0
> [   18.589608]        do_syscall_64+0x96/0x190
> [   18.589614]        entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   18.589620]
> [   18.589620] other info that might help us debug this:
> [   18.589620]
> [   18.589628]  Possible unsafe locking scenario:
> [   18.589628]
> [   18.589635]        CPU0                    CPU1
> [   18.589640]        ----                    ----
> [   18.589645]   lock(rtnl_mutex);
> [   18.589650]                                lock(&led_cdev->trigger_lock);
> [   18.589657]                                lock(rtnl_mutex);
> [   18.589664]   lock(&led_cdev->trigger_lock);
> [   18.589670]
> [   18.589670]  *** DEADLOCK ***
> [   18.589670]
> [   18.589676] 4 locks held by prepare-suspend/1145:
> [   18.589682]  #0: ffff8974873a7420 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x71/0xf0
> [   18.589693]  #1: ffff897495886288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210
> [   18.589704]  #2: ffff8974820991b0 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x39/0x210
> [   18.589715]  #3: ffffffffb034dfa8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20
> [   18.589726]
> [   18.589726] stack backtrace:
> [   18.589731] CPU: 1 PID: 1145 Comm: prepare-suspend Not tainted 6.9.0-rc2-1.qubes.fc32.x86_64 #378
> [   18.589741] Hardware name: Xen HVM domU, BIOS 4.17.3 03/12/2024
> [   18.589748] Call Trace:
> [   18.589752]  <TASK>
> [   18.589755]  dump_stack_lvl+0x73/0xb0
> [   18.589761]  check_noncircular+0x148/0x160
> [   18.589766]  ? stack_trace_save+0x4a/0x70
> [   18.589773]  __lock_acquire+0x13e7/0x2180
> [   18.589780]  lock_acquire+0xd5/0x2f0
> [   18.589786]  ? led_classdev_unregister+0x32/0x110
> [   18.589793]  down_write+0x2a/0xc0
> [   18.589798]  ? led_classdev_unregister+0x32/0x110
> [   18.589804]  led_classdev_unregister+0x32/0x110
> [   18.589811]  devres_release_all+0xb5/0x110
> [   18.589816]  device_del+0x275/0x3f0
> [   18.589821]  unregister_netdevice_many_notify+0x5ba/0x870
> [   18.589829]  unregister_netdevice_queue+0xf3/0x130
> [   18.589835]  unregister_netdev+0x18/0x20
> [   18.589840]  igc_remove+0xe1/0x1c0 [igc]
> [   18.589850]  pci_device_remove+0x3b/0xb0
> [   18.589855]  device_release_driver_internal+0x1a5/0x210
> [   18.589861]  unbind_store+0x9d/0xb0
> [   18.589867]  kernfs_fop_write_iter+0x15b/0x210
> [   18.589874]  vfs_write+0x2bd/0x560
> [   18.589880]  ksys_write+0x71/0xf0
> [   18.589886]  do_syscall_64+0x96/0x190
> [   18.589891]  ? find_held_lock+0x2b/0x80
> [   18.589896]  ? lock_release+0x143/0x2c0
> [   18.589902]  ? do_user_addr_fault+0x354/0x8a0
> [   18.589909]  ? exc_page_fault+0x126/0x260
> [   18.589916]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   18.589922] RIP: 0033:0x76426194fee4
> [   18.589927] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 85 74 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
> [   18.589946] RSP: 002b:00007ffe69a0ca98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [   18.589955] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 000076426194fee4
> [   18.589963] RDX: 000000000000000d RSI: 000058ae60024480 RDI: 0000000000000001
> [   18.589971] RBP: 00007ffe69a0cac0 R08: 0000000000000000 R09: 0000000000000001
> [   18.589979] R10: 0000000000000004 R11: 0000000000000202 R12: 000000000000000d
> [   18.589987] R13: 000058ae60024480 R14: 0000764261a205c0 R15: 0000764261a1df20
> [   18.589997]  </TASK>
>
>
> This is happening in a HVM domain on Xen, with PCI passthrough of
> relevant devices, but I don't think it's related to the issue.
>
> There is some more details on
> https://github.com/QubesOS/qubes-issues/issues/9096.
>
>
> #regzbot introduced: v6.8.4..v6.9-rc2
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab