Task hung while using RTRS with rxe and using the "ip" utility to bring the interface down

Hi,

We are experiencing a task hang while using RTRS over softROCE when the
network interface is brought down with "ifconfig <ethx> down" or
"ip link set <ethx> down".

Steps to reproduce:
1) Map an RNBD/RTRS device over a softROCE (rxe) port.
2) Once mapped, bring down the Ethernet interface on which the
softROCE device was created, using "ifconfig <ethx> down" or
"ip link set <ethx> down".
3) The device errors out, and RTRS connection errors show up in
dmesg. So far so good.
4) After a while, we see task-hung traces in dmesg.
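
For reference, this is roughly how we set this up (interface name,
address, session name and device path below are just examples, not our
exact configuration):

# create an rxe device on top of the Ethernet interface
rdma link add rxe0 type rxe netdev eth0

# server side: load the RTRS/RNBD server
modprobe rnbd_server

# client side: map a device over the rxe port
modprobe rnbd_client
echo "sessname=testsess path=ip:192.168.122.10 device_path=/dev/ram0" > \
    /sys/devices/virtual/rnbd-client/ctl/map_device

# reproduce the hang
ip link set eth0 down        # or: ifconfig eth0 down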

[  550.866462] INFO: task kworker/1:2:170 blocked for more than 184 seconds.
[  550.868820]       Tainted: G           O      5.10.42-pserver+ #84
[  550.869337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  550.869963] task:kworker/1:2     state:D stack:    0 pid:  170
ppid:     2 flags:0x00004000
[  550.870619] Workqueue: rtrs_server_wq rtrs_srv_close_work [rtrs_server]
[  550.871134] Call Trace:
[  550.871375]  __schedule+0x421/0x810
[  550.871683]  schedule+0x46/0xb0
[  550.871964]  schedule_timeout+0x20e/0x2a0
[  550.872300]  ? internal_add_timer+0x44/0x70
[  550.872650]  wait_for_completion+0x86/0xe0
[  550.872994]  cm_destroy_id+0x18c/0x5a0 [ib_cm]
[  550.873357]  ? _cond_resched+0x15/0x30
[  550.873680]  ? wait_for_completion+0x33/0xe0
[  550.874036]  _destroy_id+0x57/0x210 [rdma_cm]
[  550.874395]  rtrs_srv_close_work+0xcc/0x250 [rtrs_server]
[  550.874819]  process_one_work+0x1d4/0x370
[  550.875156]  worker_thread+0x4a/0x3b0
[  550.875471]  ? process_one_work+0x370/0x370
[  550.875817]  kthread+0xfe/0x140
[  550.876098]  ? kthread_park+0x90/0x90
[  550.876453]  ret_from_fork+0x1f/0x30


Our observations so far:

1) The hang does not occur if we use "ifdown <ethx>" instead. There is a
difference between the commands (ifdown also deconfigures the interface,
whereas "ip link set <ethx> down" only drops the administrative link
state; see https://access.redhat.com/solutions/27166), but we are not
sure why the latter should lead to a task hang.
2) We have verified the v5.10 and v5.15.1 kernels, and both have this issue.
3) We tried the same test with an NVMe-oF target and host over softROCE,
and we get the same task hang after "ifconfig .. down" (a rough sketch
of that setup follows the trace below).

[Tue Nov  9 14:28:51 2021] INFO: task kworker/1:1:34 blocked for more
than 184 seconds.
[Tue Nov  9 14:28:51 2021]       Tainted: G           O
5.10.42-pserver+ #84
[Tue Nov  9 14:28:51 2021] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Nov  9 14:28:51 2021] task:kworker/1:1     state:D stack:    0
pid:   34 ppid:     2 flags:0x00004000
[Tue Nov  9 14:28:51 2021] Workqueue: events
nvmet_rdma_release_queue_work [nvmet_rdma]
[Tue Nov  9 14:28:51 2021] Call Trace:
[Tue Nov  9 14:28:51 2021]  __schedule+0x421/0x810
[Tue Nov  9 14:28:51 2021]  schedule+0x46/0xb0
[Tue Nov  9 14:28:51 2021]  schedule_timeout+0x20e/0x2a0
[Tue Nov  9 14:28:51 2021]  ? internal_add_timer+0x44/0x70
[Tue Nov  9 14:28:51 2021]  wait_for_completion+0x86/0xe0
[Tue Nov  9 14:28:51 2021]  cm_destroy_id+0x18c/0x5a0 [ib_cm]
[Tue Nov  9 14:28:51 2021]  ? _cond_resched+0x15/0x30
[Tue Nov  9 14:28:51 2021]  ? wait_for_completion+0x33/0xe0
[Tue Nov  9 14:28:51 2021]  _destroy_id+0x57/0x210 [rdma_cm]
[Tue Nov  9 14:28:51 2021]  nvmet_rdma_free_queue+0x2e/0xc0 [nvmet_rdma]
[Tue Nov  9 14:28:51 2021]  nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
[Tue Nov  9 14:28:51 2021]  process_one_work+0x1d4/0x370
[Tue Nov  9 14:28:51 2021]  worker_thread+0x4a/0x3b0
[Tue Nov  9 14:28:51 2021]  ? process_one_work+0x370/0x370
[Tue Nov  9 14:28:51 2021]  kthread+0xfe/0x140
[Tue Nov  9 14:28:51 2021]  ? kthread_park+0x90/0x90
[Tue Nov  9 14:28:51 2021]  ret_from_fork+0x1f/0x30
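
For completeness, this is roughly how we set up the NVMe-oF test over
the same rxe device (the NQN, address, port and backing device below
are examples, not our exact configuration):

# target side: export a namespace over RDMA via configfs
modprobe nvmet_rdma
mkdir /sys/kernel/config/nvmet/subsystems/testnqn
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/attr_allow_any_host
mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo -n /dev/ram0 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/enable
mkdir /sys/kernel/config/nvmet/ports/1
echo rdma > /sys/kernel/config/nvmet/ports/1/addr_trtype
echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
echo 192.168.122.10 > /sys/kernel/config/nvmet/ports/1/addr_traddr
echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/testnqn \
    /sys/kernel/config/nvmet/ports/1/subsystems/testnqn

# host side: connect, then bring the underlying interface down
nvme connect -t rdma -n testnqn -a 192.168.122.10 -s 4420
ip link set eth0 down        # or: ifconfig eth0 down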

Is this a known issue with ifconfig or the rxe driver? Thoughts?

Regards
-Haris
