On 12/3/18 1:28 PM, Steve Wise wrote: > Hey, > > If I rmmod the net device that is being used by an rxe device, then the > rmmod process hangs unregistering the ib device. Some sort of deadlock. > > If I change rxe_notify() to defer the rxe_remove() call to the kernel > global workq thread, then this deadlock is avoided. But I'm not sure > why and if this isn't some rdma core bug? > > There are 3 threads that seem to be stuck in > rxe_query_port()/ib_get_ethspeed(), a few others stuck grabbing a mutex, > and then the rmmod thread that is stuck in flush_workqueue() during > ib_device unregistration. > > Anybody familiar with this issue? > > [ 739.922232] INFO: task kworker/9:1:113 blocked for more than 120 seconds. > [ 739.929794] Not tainted 4.20.0-rc1-newlink+ #192 > [ 739.935701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 739.944302] kworker/9:1 D 0 113 2 0x80000000 > [ 739.950568] Workqueue: ipv6_addrconf addrconf_verify_work > [ 739.956749] Call Trace: > [ 739.959977] ? __schedule+0x322/0x9c0 > [ 739.964415] ? __mutex_lock+0x343/0x7d0 > [ 739.969028] schedule+0x32/0x80 > [ 739.972933] schedule_preempt_disabled+0xa/0x10 > [ 739.978223] __mutex_lock+0x348/0x7d0 > [ 739.982639] ? __mutex_lock+0x130/0x7d0 > [ 739.987231] ? addrconf_verify_work+0xa/0x20 > [ 739.992264] addrconf_verify_work+0xa/0x20 > [ 739.997108] process_one_work+0x203/0x4a0 > [ 740.001857] ? process_one_work+0x17a/0x4a0 > [ 740.006784] worker_thread+0x46/0x3e0 > [ 740.011190] kthread+0xf8/0x130 > [ 740.015053] ? process_one_work+0x4a0/0x4a0 > [ 740.019951] ? kthread_stop+0x140/0x140 > [ 740.024489] ret_from_fork+0x35/0x40 > [ 740.028758] INFO: task kworker/13:1:117 blocked for more than 120 > seconds. > [ 740.036322] Not tainted 4.20.0-rc1-newlink+ #192 > [ 740.042147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 740.050663] kworker/13:1 D 0 117 2 0x80000000 > [ 740.056846] Workqueue: infiniband ib_cache_task [ib_core] > [ 740.062918] Call Trace: > [ 740.066025] ? __schedule+0x322/0x9c0 > [ 740.070339] ? __mutex_lock+0x343/0x7d0 > [ 740.074812] schedule+0x32/0x80 > [ 740.078576] schedule_preempt_disabled+0xa/0x10 > [ 740.083724] __mutex_lock+0x348/0x7d0 > [ 740.087992] ? __mutex_lock+0x130/0x7d0 > [ 740.092424] ? rxe_query_port+0x83/0xf0 [rdma_rxe] > [ 740.097805] rxe_query_port+0x83/0xf0 [rdma_rxe] > [ 740.103006] ib_query_port+0xc9/0x1a0 [ib_core] > [ 740.108106] ? _cond_resched+0x15/0x30 > [ 740.112420] ? kmem_cache_alloc_trace+0x196/0x200 > [ 740.117680] ib_cache_update.isra.13.part.14+0x5c/0x340 [ib_core] > [ 740.124322] ? process_one_work+0x17a/0x4a0 > [ 740.129053] ib_cache_task+0x16/0x20 [ib_core] > [ 740.134027] process_one_work+0x203/0x4a0 > [ 740.138567] ? process_one_work+0x17a/0x4a0 > [ 740.143266] worker_thread+0x46/0x3e0 > [ 740.147431] kthread+0xf8/0x130 > [ 740.151060] ? process_one_work+0x4a0/0x4a0 > [ 740.155734] ? kthread_stop+0x140/0x140 > [ 740.160054] ret_from_fork+0x35/0x40 > [ 740.164112] INFO: task kworker/7:1:120 blocked for more than 120 seconds. > [ 740.171382] Not tainted 4.20.0-rc1-newlink+ #192 > [ 740.177000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 740.185317] kworker/7:1 D 0 120 2 0x80000000 > [ 740.191286] Workqueue: infiniband ib_cache_task [ib_core] > [ 740.197168] Call Trace: > [ 740.200112] ? __schedule+0x322/0x9c0 > [ 740.204260] ? __mutex_lock+0x343/0x7d0 > [ 740.208591] schedule+0x32/0x80 > [ 740.212214] schedule_preempt_disabled+0xa/0x10 > [ 740.217227] __mutex_lock+0x348/0x7d0 > [ 740.221378] ? __mutex_lock+0x130/0x7d0 > [ 740.225707] ? ib_get_eth_speed+0x6e/0x150 [ib_core] > [ 740.231174] ib_get_eth_speed+0x6e/0x150 [ib_core] > [ 740.236459] ? update_curr+0x17e/0x250 > [ 740.240694] ? find_held_lock+0x2d/0x90 > [ 740.245021] rxe_query_port+0x98/0xf0 [rdma_rxe] > [ 740.250141] ib_query_port+0xc9/0x1a0 [ib_core] > [ 740.255170] ? _cond_resched+0x15/0x30 > [ 740.259416] ? kmem_cache_alloc_trace+0x196/0x200 > [ 740.264628] ib_cache_update.isra.13.part.14+0x5c/0x340 [ib_core] > [ 740.271215] ? process_one_work+0x17a/0x4a0 > [ 740.275893] ib_cache_task+0x16/0x20 [ib_core] > [ 740.280820] process_one_work+0x203/0x4a0 > [ 740.285300] ? process_one_work+0x17a/0x4a0 > [ 740.289952] worker_thread+0x46/0x3e0 > [ 740.294082] kthread+0xf8/0x130 > [ 740.297684] ? process_one_work+0x4a0/0x4a0 > [ 740.302334] ? kthread_stop+0x140/0x140 > [ 740.306634] ret_from_fork+0x35/0x40 > [ 740.310690] INFO: task kworker/7:2:15203 blocked for more than 120 > seconds. > [ 740.318125] Not tainted 4.20.0-rc1-newlink+ #192 > [ 740.323750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 740.332078] kworker/7:2 D 0 15203 2 0x80000080 > [ 740.338068] Workqueue: events linkwatch_event > [ 740.342923] Call Trace: > [ 740.345869] ? __schedule+0x322/0x9c0 > [ 740.350029] ? __mutex_lock+0x343/0x7d0 > [ 740.354360] schedule+0x32/0x80 > [ 740.357991] schedule_preempt_disabled+0xa/0x10 > [ 740.363012] __mutex_lock+0x348/0x7d0 > [ 740.367160] ? __mutex_lock+0x130/0x7d0 > [ 740.371478] ? linkwatch_event+0xa/0x30 > [ 740.375796] linkwatch_event+0xa/0x30 > [ 740.379946] process_one_work+0x203/0x4a0 > [ 740.384434] ? process_one_work+0x17a/0x4a0 > [ 740.389101] worker_thread+0x46/0x3e0 > [ 740.393250] kthread+0xf8/0x130 > [ 740.396865] ? process_one_work+0x4a0/0x4a0 > [ 740.401522] ? kthread_stop+0x140/0x140 > [ 740.405824] ret_from_fork+0x35/0x40 > [ 740.409854] INFO: task rmmod:15311 blocked for more than 120 seconds. > [ 740.416757] Not tainted 4.20.0-rc1-newlink+ #192 > [ 740.422349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 740.430649] rmmod D 0 15311 14253 0x00000080 > [ 740.436613] Call Trace: > [ 740.439538] ? __schedule+0x322/0x9c0 > [ 740.443682] ? ep_poll_callback+0x2b5/0x370 > [ 740.448340] schedule+0x32/0x80 > [ 740.451966] schedule_timeout+0x221/0x3d0 > [ 740.456450] ? find_held_lock+0x2d/0x90 > [ 740.460755] ? wait_for_completion+0x108/0x170 > [ 740.465670] wait_for_completion+0x110/0x170 > [ 740.470411] ? wake_up_q+0x70/0x70 > [ 740.474283] flush_workqueue+0x1a7/0x540 > [ 740.478677] ? flush_workqueue+0x76/0x540 > [ 740.483754] ib_cache_cleanup_one+0x41/0x180 [ib_core] > [ 740.489360] ? __mutex_unlock_slowpath+0x4b/0x290 > [ 740.494537] ib_unregister_device+0x126/0x1f0 [ib_core] > [ 740.500227] rxe_remove+0xe/0x30 [rdma_rxe] > [ 740.504887] rxe_notify+0x7f/0x130 [rdma_rxe] > [ 740.509715] notifier_call_chain+0x47/0x70 > [ 740.514283] rollback_registered_many+0x275/0x500 > [ 740.519462] rollback_registered+0x56/0x90 > [ 740.524032] unregister_netdevice_queue+0x7e/0x100 > [ 740.529298] unregister_netdev+0x18/0x20 > [ 740.533694] remove_one+0x1b9/0x1f0 [cxgb4] > [ 740.538347] pci_device_remove+0x3b/0xc0 > [ 740.542734] device_release_driver_internal+0x175/0x240 > [ 740.548420] driver_detach+0x3a/0x70 > [ 740.552446] bus_remove_driver+0x58/0xd0 > [ 740.556805] pci_unregister_driver+0x26/0xa0 > [ 740.561515] cxgb4_cleanup_module+0x28/0x61 [cxgb4] > [ 740.566835] __x64_sys_delete_module+0x1e2/0x240 > [ 740.571898] do_syscall_64+0x5b/0x190 > [ 740.576008] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 740.581507] RIP: 0033:0x7f02bc091397 > [ 740.585529] Code: Bad RIP value. > [ 740.589199] RSP: 002b:00007fffb8f9a458 EFLAGS: 00000202 ORIG_RAX: > 00000000000000b0 > [ 740.597231] RAX: ffffffffffffffda RBX: 000000000081e220 RCX: > 00007f02bc091397 > [ 740.604832] RDX: 00007f02bc105b20 RSI: 0000000000000800 RDI: > 000000000081e288 > [ 740.612438] RBP: 0000000000000000 R08: 00007f02bc35a060 R09: > 00007f02bc105b20 > [ 740.620052] R10: 00007fffb8f99ee0 R11: 0000000000000202 R12: > 00007fffb8f9c751 > [ 740.627666] R13: 0000000000000000 R14: 000000000081e220 R15: > 000000000081e010 > [ 740.635289] > Showing all locks held in the system: > [ 740.642434] 1 lock held by khungtaskd/97: > [ 740.646928] #0: 00000000354ce682 (rcu_read_lock){....}, at: > debug_show_all_locks+0xe/0x110 > [ 740.655793] 3 locks held by kworker/9:1/113: > [ 740.660561] #0: 000000001c09232d > ((wq_completion)"%s"("ipv6_addrconf")){....}, at: > process_one_work+0x17a/0x4a0 > [ 740.671261] #1: 00000000614c2ae5 ((addr_chk_work).work){....}, at: > process_one_work+0x17a/0x4a0 > [ 740.680565] #2: 0000000067405b77 (rtnl_mutex){....}, at: > addrconf_verify_work+0xa/0x20 > [ 740.689093] 3 locks held by kworker/13:1/117: > [ 740.693968] #0: 0000000093301381 > ((wq_completion)"infiniband"){....}, at: process_one_work+0x17a/0x4a0 > [ 740.703911] #1: 00000000fb2af2df > ((work_completion)(&work->work)#2){....}, at: process_one_work+0x17a/0x4a0 > [ 740.714307] #2: 00000000ff63eb3d (&rxe->usdev_lock){....}, at: > rxe_query_port+0x83/0xf0 [rdma_rxe] > [ 740.723924] 4 locks held by kworker/7:1/120: > [ 740.728767] #0: 0000000093301381 > ((wq_completion)"infiniband"){....}, at: process_one_work+0x17a/0x4a0 > [ 740.738758] #1: 00000000fdfc413f > ((work_completion)(&work->work)#2){....}, at: process_one_work+0x17a/0x4a0 > [ 740.749187] #2: 00000000ff63eb3d (&rxe->usdev_lock){....}, at: > rxe_query_port+0x83/0xf0 [rdma_rxe] > [ 740.758851] #3: 0000000067405b77 (rtnl_mutex){....}, at: > ib_get_eth_speed+0x6e/0x150 [ib_core] > [ 740.768182] 2 locks held by agetty/7388: > [ 740.772735] #0: 00000000a52f1433 (&tty->ldisc_sem){....}, at: > tty_ldisc_ref_wait+0x24/0x50 > [ 740.781728] #1: 00000000b21ff732 (&ldata->atomic_read_lock){....}, > at: n_tty_read+0xd8/0x8f0 > [ 740.790908] 3 locks held by kworker/7:2/15203: > [ 740.795999] #0: 00000000a7e503ae ((wq_completion)"events"){....}, > at: process_one_work+0x17a/0x4a0 > [ 740.805714] #1: 00000000d09d5f76 ((linkwatch_work).work){....}, at: > process_one_work+0x17a/0x4a0 > [ 740.815266] #2: 0000000067405b77 (rtnl_mutex){....}, at: > linkwatch_event+0xa/0x30 > [ 740.823521] 2 locks held by rmmod/15311: > [ 740.828133] #0: 00000000923b1412 (&dev->mutex){....}, at: > device_release_driver_internal+0x39/0x240 > [ 740.837966] #1: 0000000067405b77 (rtnl_mutex){....}, at: > unregister_netdev+0xe/0x20 > [ 740.846425] 2 locks held by bash/15323: > [ 740.850971] #0: 00000000be5388b9 (&tty->ldisc_sem){....}, at: > tty_ldisc_ref_wait+0x24/0x50 > [ 740.860054] #1: 0000000098aaed68 (&ldata->atomic_read_lock){....}, > at: n_tty_read+0xd8/0x8f0 > > [ 740.871555] ============================================= > ib_get_eth_speed() - tries to get rtnl lock addrconf_verify_work() - tries to get rtnl lock linkwatch_event() - tries to get rtnl lock When you try to remove the netdevice, I assume it calls: unregister_netdev() - which gets the rtnl lock then it sends NETREG_UNREGISTERING event, which I assume rxe tries to remove the ib device... and deadlock. RXE is really broken :) Mark