On Sat, Oct 29, 2022 12:10 PM Bob Pearson wrote:
<...>
> Bob Pearson (13):
>   RDMA/rxe: Make task interface pluggable
>   RDMA/rxe: Split rxe_drain_resp_pkts()
>   RDMA/rxe: Simplify reset state handling in rxe_resp.c
>   RDMA/rxe: Handle qp error in rxe_resp.c
>   RDMA/rxe: Cleanup comp tasks in rxe_qp.c
>   RDMA/rxe: Remove __rxe_do_task()
>   RDMA/rxe: Make tasks schedule each other
>   RDMA/rxe: Implement disable/enable_task()
>   RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
>   RDMA/rxe: Replace task->destroyed by task state INVALID.
>   RDMA/rxe: Add workqueue support for tasks
>   RDMA/rxe: Make WORKQUEUE default for RC tasks
>   RDMA/rxe: Remove tasklets from rxe_task.c

Hello Bob,

I have found a soft lockup issue reproducible with the rdma-core testcases. It does not happen on the latest for-next tree, but it does with this patch series applied, so we need to fix the issue before it gets merged.

I did the test on a VM with 8 CPUs. I fetched the latest rdma-core, built it, and executed the following command inside the 'rdma-core' directory:
 # while true; do ./build/bin/run_tests.py -v --dev rxe_ens6 --gid 1; sleep 2; done
(Please specify your 'dev' and 'gid' appropriately.)

Before 10 minutes had passed, my console froze and became unresponsive, showing the test progress below:
=====
test_create_ah (tests.test_addr.AHTest)
Test ibv_create_ah. ... ok
test_create_ah_roce (tests.test_addr.AHTest)
Verify that AH can't be created without GRH in RoCE ... ok
test_destroy_ah (tests.test_addr.AHTest)
Test ibv_destroy_ah. ... ok
test_atomic_cmp_and_swap (tests.test_atomic.AtomicTest) ... ok
test_atomic_fetch_and_add (tests.test_atomic.AtomicTest) ... ok
test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ...
=====
Note that this does not always happen; you may have to wait for several minutes.

Here is the backtrace:
=====
[ 1212.135650] watchdog: BUG: soft lockup - CPU#3 stuck for 1017s! [python3:3428]
[ 1212.138144] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel ib_core rfkill sunrpc intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass joydev nd_pmem virtio_balloon dax_pmem nd_btt i2c_piix4 pcspkr drm xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg nd_e820 ata_generic libnvdimm crct10dif_pclmul crc32_pclmul crc32c_intel ata_piix virtio_net libata ghash_clmulni_intel e1000 net_failover sha512_ssse3 failover virtio_console serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1212.147152] CPU: 3 PID: 3428 Comm: python3 Kdump: loaded Tainted: G L 6.1.0-rc1+ #29
[ 1212.148425] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 1212.149754] RIP: 0010:__local_bh_enable_ip+0x26/0x70
[ 1212.150464] Code: 00 00 66 90 0f 1f 44 00 00 65 8b 05 c4 7e d2 4e a9 00 00 0f 00 75 31 83 ee 01 f7 de 65 01 35 b1 7e d2 4e 65 8b 05 aa 7e d2 4e <a9> 00 ff ff 00 74 1b 65 ff 0d 9c 7e d2 4e 65 8b 05 95 7e d2 4e 85
[ 1212.153081] RSP: 0018:ffffaf0b8054baf8 EFLAGS: 00000203
[ 1212.153822] RAX: 0000000000000001 RBX: ffff8d2189171450 RCX: 00000000ffffffff
[ 1212.154823] RDX: 0000000000000001 RSI: 00000000fffffe00 RDI: ffffffffc0b51baa
[ 1212.155826] RBP: ffff8d2189171474 R08: 0000011a38ad4004 R09: 0000000000000101
[ 1212.156862] R10: ffffffffb2c06100 R11: 0000000000000000 R12: 0000000000000000
[ 1212.157860] R13: ffff8d218df5eda0 R14: ffff8d2181058328 R15: 0000000000000000
[ 1212.158858] FS:  00007f8cec55f740(0000) GS:ffff8d23b5cc0000(0000) knlGS:0000000000000000
[ 1212.159989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1212.160837] CR2: 00007f8ceaa15024 CR3: 0000000108c2c002 CR4: 0000000000060ee0
[ 1212.161841] Call Trace:
[ 1212.162205]  <TASK>
[ 1212.162522]  work_cleanup+0x3a/0x40 [rdma_rxe]
[ 1212.163172]  rxe_qp_do_cleanup+0x54/0x1e0 [rdma_rxe]
[ 1212.163888]  execute_in_process_context+0x23/0x70
[ 1212.164575]  __rxe_cleanup+0xc6/0x170 [rdma_rxe]
[ 1212.165245]  rxe_destroy_qp+0x28/0x40 [rdma_rxe]
[ 1212.165909]  ib_destroy_qp_user+0x90/0x1b0 [ib_core]
[ 1212.166646]  uverbs_free_qp+0x35/0x90 [ib_uverbs]
[ 1212.167333]  destroy_hw_idr_uobject+0x1e/0x50 [ib_uverbs]
[ 1212.168103]  uverbs_destroy_uobject+0x37/0x1c0 [ib_uverbs]
[ 1212.168899]  uobj_destroy+0x3c/0x80 [ib_uverbs]
[ 1212.169532]  ib_uverbs_run_method+0x203/0x320 [ib_uverbs]
[ 1212.170412]  ? uverbs_free_qp+0x90/0x90 [ib_uverbs]
[ 1212.171151]  ib_uverbs_cmd_verbs+0x172/0x220 [ib_uverbs]
[ 1212.171912]  ? free_unref_page_commit+0x7e/0x170
[ 1212.172583]  ? xa_destroy+0x82/0x110
[ 1212.173104]  ? kvfree_call_rcu+0x27d/0x310
[ 1212.173692]  ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
[ 1212.174481]  ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
[ 1212.175182]  __x64_sys_ioctl+0x8a/0xc0
[ 1212.175725]  do_syscall_64+0x3b/0x90
[ 1212.176243]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1212.176973] RIP: 0033:0x7f8cebe3ec6b
[ 1212.177489] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[ 1212.180072] RSP: 002b:00007ffee64927a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1212.181150] RAX: ffffffffffffffda RBX: 00007ffee64928d8 RCX: 00007f8cebe3ec6b
[ 1212.182131] RDX: 00007ffee64928c0 RSI: 00000000c0181b01 RDI: 0000000000000003
[ 1212.183099] RBP: 00007ffee64928a0 R08: 000056424f8c0c70 R09: 000000000002ae30
[ 1212.184064] R10: 00007f8cead75e10 R11: 0000000000000246 R12: 00007ffee649287c
[ 1212.185061] R13: 0000000000000022 R14: 000056424f8c0560 R15: 00007f8cebb903d0
[ 1212.186031]  </TASK>

Message from syslogd@c9ibremote at Nov 17 17:03:12 ...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 1017s! [python3:3428]
=====

It seems that ibv_destroy_qp(3) was issued from userspace, right? I am not very familiar with uverbs ;(

The process got stuck at the do-while loop below:
=====
rxe_task.c
---
/* busy wait until any previous tasks are done */
static void cleanup_task(struct rxe_task *task)
{
	bool busy;

	do {
		spin_lock_bh(&task->lock);
		busy = (task->state == TASK_STATE_BUSY ||
			task->state == TASK_STATE_ARMED);
		if (!busy)
			task->state = TASK_STATE_INVALID;
		spin_unlock_bh(&task->lock);
	} while (busy);
}
=====
Typically task->state is 0 (i.e. "TASK_STATE_IDLE") when we reach here, but in the infinite loop, task->state was constantly 1 (i.e. "TASK_STATE_BUSY").

IMO, the bottom halves completed their work but left task->state as "TASK_STATE_BUSY", and the ibv_destroy_qp(3) issued from userspace thereafter got stuck. However, I am not sure how this counter-intuitive state transition occurs. Do you have any idea why this happens and how to fix it?

I cannot take enough time to inspect this issue further right now, but I will update you if I find anything helpful to fix this.
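For reference, something along the lines of the untested sketch below (the helper name and the 5-second limit are arbitrary; it only reuses the task states and the spinlock from this series) would warn and bail out instead of spinning forever, which might make the stuck state easier to capture. It is just a debugging aid, not a fix for the state transition itself:
=====
/* Untested sketch: same busy wait as cleanup_task(), but give up
 * after an arbitrary limit and report the stuck task state instead
 * of spinning forever.
 */
static void cleanup_task_bounded(struct rxe_task *task)
{
	unsigned long timeout = jiffies + 5 * HZ;
	bool busy;
	int state;

	do {
		spin_lock_bh(&task->lock);
		state = task->state;
		busy = (state == TASK_STATE_BUSY ||
			state == TASK_STATE_ARMED);
		if (!busy)
			task->state = TASK_STATE_INVALID;
		spin_unlock_bh(&task->lock);

		if (busy && time_after(jiffies, timeout)) {
			WARN_ONCE(1, "rxe task stuck in state %d\n", state);
			break;
		}
		cpu_relax();
	} while (busy);
}
=====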
Thanks,
Daisuke

>  drivers/infiniband/sw/rxe/rxe.c      |   9 +-
>  drivers/infiniband/sw/rxe/rxe_comp.c |  24 ++-
>  drivers/infiniband/sw/rxe/rxe_qp.c   |  80 ++++-----
>  drivers/infiniband/sw/rxe/rxe_req.c  |   4 +-
>  drivers/infiniband/sw/rxe/rxe_resp.c |  70 +++++---
>  drivers/infiniband/sw/rxe/rxe_task.c | 258 +++++++++++++++++++--------
>  drivers/infiniband/sw/rxe/rxe_task.h |  56 +++---
>  7 files changed, 329 insertions(+), 172 deletions(-)
>
>
> base-commit: 692373d186205dfb1b56f35f22702412d94d9420
> --
> 2.34.1