Re: rxe hangs on netdev driver unload

Mark Bloch <markb@xxxxxxxxxxxx> · Mon, 3 Dec 2018 21:45:30 +0000

On 12/3/18 1:39 PM, Mark Bloch wrote:
> 
> On 12/3/18 1:28 PM, Steve Wise wrote:
>> Hey,
>>
>> If I rmmod the net device that is being used by an rxe device, then the
>> rmmod process hangs unregistering the ib device.  Some sort of deadlock. 
>>
>> If I change rxe_notify() to defer the rxe_remove() call to the kernel
>> global workq thread, then this deadlock is avoided.  But I'm not sure
>> why and if this isn't some rdma core bug? 
>>
>> There are 3 threads that seem to be stuck in
>> rxe_query_port()/ib_get_ethspeed(), a few others stuck grabbing a mutex,
>> and then the rmmod thread that is stuck in flush_workqueue() during
>> ib_device unregistration.
>>
>> Anybody familiar with this issue?  
>>
>> [  739.922232] INFO: task kworker/9:1:113 blocked for more than 120 seconds.
>> [  739.929794]       Not tainted 4.20.0-rc1-newlink+ #192
>> [  739.935701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  739.944302] kworker/9:1     D    0   113      2 0x80000000
>> [  739.950568] Workqueue: ipv6_addrconf addrconf_verify_work
>> [  739.956749] Call Trace:
>> [  739.959977]  ? __schedule+0x322/0x9c0
>> [  739.964415]  ? __mutex_lock+0x343/0x7d0
>> [  739.969028]  schedule+0x32/0x80
>> [  739.972933]  schedule_preempt_disabled+0xa/0x10
>> [  739.978223]  __mutex_lock+0x348/0x7d0
>> [  739.982639]  ? __mutex_lock+0x130/0x7d0
>> [  739.987231]  ? addrconf_verify_work+0xa/0x20
>> [  739.992264]  addrconf_verify_work+0xa/0x20
>> [  739.997108]  process_one_work+0x203/0x4a0
>> [  740.001857]  ? process_one_work+0x17a/0x4a0
>> [  740.006784]  worker_thread+0x46/0x3e0
>> [  740.011190]  kthread+0xf8/0x130
>> [  740.015053]  ? process_one_work+0x4a0/0x4a0
>> [  740.019951]  ? kthread_stop+0x140/0x140
>> [  740.024489]  ret_from_fork+0x35/0x40
>> [  740.028758] INFO: task kworker/13:1:117 blocked for more than 120
>> seconds.
>> [  740.036322]       Not tainted 4.20.0-rc1-newlink+ #192
>> [  740.042147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  740.050663] kworker/13:1    D    0   117      2 0x80000000
>> [  740.056846] Workqueue: infiniband ib_cache_task [ib_core]
>> [  740.062918] Call Trace:
>> [  740.066025]  ? __schedule+0x322/0x9c0
>> [  740.070339]  ? __mutex_lock+0x343/0x7d0
>> [  740.074812]  schedule+0x32/0x80
>> [  740.078576]  schedule_preempt_disabled+0xa/0x10
>> [  740.083724]  __mutex_lock+0x348/0x7d0
>> [  740.087992]  ? __mutex_lock+0x130/0x7d0
>> [  740.092424]  ? rxe_query_port+0x83/0xf0 [rdma_rxe]
>> [  740.097805]  rxe_query_port+0x83/0xf0 [rdma_rxe]
>> [  740.103006]  ib_query_port+0xc9/0x1a0 [ib_core]
>> [  740.108106]  ? _cond_resched+0x15/0x30
>> [  740.112420]  ? kmem_cache_alloc_trace+0x196/0x200
>> [  740.117680]  ib_cache_update.isra.13.part.14+0x5c/0x340 [ib_core]
>> [  740.124322]  ? process_one_work+0x17a/0x4a0
>> [  740.129053]  ib_cache_task+0x16/0x20 [ib_core]
>> [  740.134027]  process_one_work+0x203/0x4a0
>> [  740.138567]  ? process_one_work+0x17a/0x4a0
>> [  740.143266]  worker_thread+0x46/0x3e0
>> [  740.147431]  kthread+0xf8/0x130
>> [  740.151060]  ? process_one_work+0x4a0/0x4a0
>> [  740.155734]  ? kthread_stop+0x140/0x140
>> [  740.160054]  ret_from_fork+0x35/0x40
>> [  740.164112] INFO: task kworker/7:1:120 blocked for more than 120 seconds.
>> [  740.171382]       Not tainted 4.20.0-rc1-newlink+ #192
>> [  740.177000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  740.185317] kworker/7:1     D    0   120      2 0x80000000
>> [  740.191286] Workqueue: infiniband ib_cache_task [ib_core]
>> [  740.197168] Call Trace:
>> [  740.200112]  ? __schedule+0x322/0x9c0
>> [  740.204260]  ? __mutex_lock+0x343/0x7d0
>> [  740.208591]  schedule+0x32/0x80
>> [  740.212214]  schedule_preempt_disabled+0xa/0x10
>> [  740.217227]  __mutex_lock+0x348/0x7d0
>> [  740.221378]  ? __mutex_lock+0x130/0x7d0
>> [  740.225707]  ? ib_get_eth_speed+0x6e/0x150 [ib_core]
>> [  740.231174]  ib_get_eth_speed+0x6e/0x150 [ib_core]
>> [  740.236459]  ? update_curr+0x17e/0x250
>> [  740.240694]  ? find_held_lock+0x2d/0x90
>> [  740.245021]  rxe_query_port+0x98/0xf0 [rdma_rxe]
>> [  740.250141]  ib_query_port+0xc9/0x1a0 [ib_core]
>> [  740.255170]  ? _cond_resched+0x15/0x30
>> [  740.259416]  ? kmem_cache_alloc_trace+0x196/0x200
>> [  740.264628]  ib_cache_update.isra.13.part.14+0x5c/0x340 [ib_core]
>> [  740.271215]  ? process_one_work+0x17a/0x4a0
>> [  740.275893]  ib_cache_task+0x16/0x20 [ib_core]
>> [  740.280820]  process_one_work+0x203/0x4a0
>> [  740.285300]  ? process_one_work+0x17a/0x4a0
>> [  740.289952]  worker_thread+0x46/0x3e0
>> [  740.294082]  kthread+0xf8/0x130
>> [  740.297684]  ? process_one_work+0x4a0/0x4a0
>> [  740.302334]  ? kthread_stop+0x140/0x140
>> [  740.306634]  ret_from_fork+0x35/0x40
>> [  740.310690] INFO: task kworker/7:2:15203 blocked for more than 120
>> seconds.
>> [  740.318125]       Not tainted 4.20.0-rc1-newlink+ #192
>> [  740.323750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  740.332078] kworker/7:2     D    0 15203      2 0x80000080
>> [  740.338068] Workqueue: events linkwatch_event
>> [  740.342923] Call Trace:
>> [  740.345869]  ? __schedule+0x322/0x9c0
>> [  740.350029]  ? __mutex_lock+0x343/0x7d0
>> [  740.354360]  schedule+0x32/0x80
>> [  740.357991]  schedule_preempt_disabled+0xa/0x10
>> [  740.363012]  __mutex_lock+0x348/0x7d0
>> [  740.367160]  ? __mutex_lock+0x130/0x7d0
>> [  740.371478]  ? linkwatch_event+0xa/0x30
>> [  740.375796]  linkwatch_event+0xa/0x30
>> [  740.379946]  process_one_work+0x203/0x4a0
>> [  740.384434]  ? process_one_work+0x17a/0x4a0
>> [  740.389101]  worker_thread+0x46/0x3e0
>> [  740.393250]  kthread+0xf8/0x130
>> [  740.396865]  ? process_one_work+0x4a0/0x4a0
>> [  740.401522]  ? kthread_stop+0x140/0x140
>> [  740.405824]  ret_from_fork+0x35/0x40
>> [  740.409854] INFO: task rmmod:15311 blocked for more than 120 seconds.
>> [  740.416757]       Not tainted 4.20.0-rc1-newlink+ #192
>> [  740.422349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  740.430649] rmmod           D    0 15311  14253 0x00000080
>> [  740.436613] Call Trace:
>> [  740.439538]  ? __schedule+0x322/0x9c0
>> [  740.443682]  ? ep_poll_callback+0x2b5/0x370
>> [  740.448340]  schedule+0x32/0x80
>> [  740.451966]  schedule_timeout+0x221/0x3d0
>> [  740.456450]  ? find_held_lock+0x2d/0x90
>> [  740.460755]  ? wait_for_completion+0x108/0x170
>> [  740.465670]  wait_for_completion+0x110/0x170
>> [  740.470411]  ? wake_up_q+0x70/0x70
>> [  740.474283]  flush_workqueue+0x1a7/0x540
>> [  740.478677]  ? flush_workqueue+0x76/0x540
>> [  740.483754]  ib_cache_cleanup_one+0x41/0x180 [ib_core]
>> [  740.489360]  ? __mutex_unlock_slowpath+0x4b/0x290
>> [  740.494537]  ib_unregister_device+0x126/0x1f0 [ib_core]
>> [  740.500227]  rxe_remove+0xe/0x30 [rdma_rxe]
>> [  740.504887]  rxe_notify+0x7f/0x130 [rdma_rxe]
>> [  740.509715]  notifier_call_chain+0x47/0x70
>> [  740.514283]  rollback_registered_many+0x275/0x500
>> [  740.519462]  rollback_registered+0x56/0x90
>> [  740.524032]  unregister_netdevice_queue+0x7e/0x100
>> [  740.529298]  unregister_netdev+0x18/0x20
>> [  740.533694]  remove_one+0x1b9/0x1f0 [cxgb4]
>> [  740.538347]  pci_device_remove+0x3b/0xc0
>> [  740.542734]  device_release_driver_internal+0x175/0x240
>> [  740.548420]  driver_detach+0x3a/0x70
>> [  740.552446]  bus_remove_driver+0x58/0xd0
>> [  740.556805]  pci_unregister_driver+0x26/0xa0
>> [  740.561515]  cxgb4_cleanup_module+0x28/0x61 [cxgb4]
>> [  740.566835]  __x64_sys_delete_module+0x1e2/0x240
>> [  740.571898]  do_syscall_64+0x5b/0x190
>> [  740.576008]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  740.581507] RIP: 0033:0x7f02bc091397
>> [  740.585529] Code: Bad RIP value.
>> [  740.589199] RSP: 002b:00007fffb8f9a458 EFLAGS: 00000202 ORIG_RAX:
>> 00000000000000b0
>> [  740.597231] RAX: ffffffffffffffda RBX: 000000000081e220 RCX:
>> 00007f02bc091397
>> [  740.604832] RDX: 00007f02bc105b20 RSI: 0000000000000800 RDI:
>> 000000000081e288
>> [  740.612438] RBP: 0000000000000000 R08: 00007f02bc35a060 R09:
>> 00007f02bc105b20
>> [  740.620052] R10: 00007fffb8f99ee0 R11: 0000000000000202 R12:
>> 00007fffb8f9c751
>> [  740.627666] R13: 0000000000000000 R14: 000000000081e220 R15:
>> 000000000081e010
>> [  740.635289]
>> Showing all locks held in the system:
>> [  740.642434] 1 lock held by khungtaskd/97:
>> [  740.646928]  #0: 00000000354ce682 (rcu_read_lock){....}, at:
>> debug_show_all_locks+0xe/0x110
>> [  740.655793] 3 locks held by kworker/9:1/113:
>> [  740.660561]  #0: 000000001c09232d
>> ((wq_completion)"%s"("ipv6_addrconf")){....}, at:
>> process_one_work+0x17a/0x4a0
>> [  740.671261]  #1: 00000000614c2ae5 ((addr_chk_work).work){....}, at:
>> process_one_work+0x17a/0x4a0
>> [  740.680565]  #2: 0000000067405b77 (rtnl_mutex){....}, at:
>> addrconf_verify_work+0xa/0x20
>> [  740.689093] 3 locks held by kworker/13:1/117:
>> [  740.693968]  #0: 0000000093301381
>> ((wq_completion)"infiniband"){....}, at: process_one_work+0x17a/0x4a0
>> [  740.703911]  #1: 00000000fb2af2df
>> ((work_completion)(&work->work)#2){....}, at: process_one_work+0x17a/0x4a0
>> [  740.714307]  #2: 00000000ff63eb3d (&rxe->usdev_lock){....}, at:
>> rxe_query_port+0x83/0xf0 [rdma_rxe]
>> [  740.723924] 4 locks held by kworker/7:1/120:
>> [  740.728767]  #0: 0000000093301381
>> ((wq_completion)"infiniband"){....}, at: process_one_work+0x17a/0x4a0
>> [  740.738758]  #1: 00000000fdfc413f
>> ((work_completion)(&work->work)#2){....}, at: process_one_work+0x17a/0x4a0
>> [  740.749187]  #2: 00000000ff63eb3d (&rxe->usdev_lock){....}, at:
>> rxe_query_port+0x83/0xf0 [rdma_rxe]
>> [  740.758851]  #3: 0000000067405b77 (rtnl_mutex){....}, at:
>> ib_get_eth_speed+0x6e/0x150 [ib_core]
>> [  740.768182] 2 locks held by agetty/7388:
>> [  740.772735]  #0: 00000000a52f1433 (&tty->ldisc_sem){....}, at:
>> tty_ldisc_ref_wait+0x24/0x50
>> [  740.781728]  #1: 00000000b21ff732 (&ldata->atomic_read_lock){....},
>> at: n_tty_read+0xd8/0x8f0
>> [  740.790908] 3 locks held by kworker/7:2/15203:
>> [  740.795999]  #0: 00000000a7e503ae ((wq_completion)"events"){....},
>> at: process_one_work+0x17a/0x4a0
>> [  740.805714]  #1: 00000000d09d5f76 ((linkwatch_work).work){....}, at:
>> process_one_work+0x17a/0x4a0
>> [  740.815266]  #2: 0000000067405b77 (rtnl_mutex){....}, at:
>> linkwatch_event+0xa/0x30
>> [  740.823521] 2 locks held by rmmod/15311:
>> [  740.828133]  #0: 00000000923b1412 (&dev->mutex){....}, at:
>> device_release_driver_internal+0x39/0x240
>> [  740.837966]  #1: 0000000067405b77 (rtnl_mutex){....}, at:
>> unregister_netdev+0xe/0x20
>> [  740.846425] 2 locks held by bash/15323:
>> [  740.850971]  #0: 00000000be5388b9 (&tty->ldisc_sem){....}, at:
>> tty_ldisc_ref_wait+0x24/0x50
>> [  740.860054]  #1: 0000000098aaed68 (&ldata->atomic_read_lock){....},
>> at: n_tty_read+0xd8/0x8f0
>>
>> [  740.871555] =============================================
>>
> 
> ib_get_eth_speed() - tries to get rtnl lock
> 
> addrconf_verify_work() - tries to get rtnl lock
> 
> linkwatch_event() - tries to get rtnl lock
> 
> When you try to remove the netdevice, I assume it calls:
> unregister_netdev() - which gets the rtnl lock
> 
> then it sends NETREG_UNREGISTERING event, which I assume rxe tries to

oops, meant: NETDEV_UNREGISTER.

> remove the ib device... and deadlock.
> 
> RXE is really broken :)
> 
> Mark
> 

Mark