On Tue, 2018-01-02 at 19:44 +0200, Moni Shoua wrote:
> This is a great input for the debugger (whoever that may be). From a brief
> look at the code I see that error QP is checked during the
> validation of RDMA_WRITE request. In this case a completion is
> generated and the size of the buffer to write remains irrelevant.
> However, to verify that I wasn't wrong you can add some printk() in
> the path that starts with rxe_responder(). When flow reaches
> check_resource() and when QP is in ERROR state the function returns
> RESPST_COMPLETE. The next step in the state machine would be to call
> the do_complete() function.

Hello Moni,

Thanks for the feedback and the suggestion. I will check the ib_srpt code
further for possible race conditions. But after I had enabled the dynamic
debugging statements in the rdma_rxe driver I ran into something that I
don't think is caused by the ib_srpt driver (with memory poisoning
enabled):

rdma_rxe:rxe_responder: rdma_rxe: qp#19 state = CLEANUP
rdma_rxe:rxe_responder: rdma_rxe: qp#19 state = DONE
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 1385 Comm: kworker/1:26 Not tainted 4.15.0-rc4-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: target_completion target_complete_ok_work [target_core_mod]
RIP: 0010:__lock_acquire+0xe4/0x13b0
RSP: 0018:ffff944ec40df9b0 EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8acf4b076748
RBP: ffff944ec40dfa80 R08: 0000000000000001 R09: ffffffffc061d1b7
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8acf5e8f2880 R14: 0000000000000001 R15: ffff8acf4b076748
FS: 0000000000000000(0000) GS:ffff8acf7fc8000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffab80d000 CR3: 000000005fa0f005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 lock_acquire+0xac/0x230
 _raw_spin_lock_irqsave+0x45/0x60
 rxe_do_task+0x87/0x100 [rdma_rxe]
 rxe_run_task+0x16/0x30 [rdma_rxe]
 rxe_resp_queue_pkt+0x42/0x50 [rdma_rxe]
 rxe_rcv+0x363/0x8b0 [rdma_rxe]
 rxe_loopback+0x9/0x10 [rdma_rxe]
 rxe_requester+0x6ea/0x1160 [rdma_rxe]
 rxe_do_task+0x7c/0x100 [rdma_rxe]
 rxe_run_task+0x16/0x30 [rdma_rxe]
 rxe_post_send+0x2f0/0x550 [rdma_rxe]
 srpt_queue_response+0x20c/0x400 [ib_srpt]
 srpt_queue_status+0x28/0x40 [ib_srpt]
 target_complete_ok_work+0x1ea/0x520 [target_core_mod]
 process_one_work+0x211/0x6a0
 worker_thread+0x38/0x3b0
 kthread+0x124/0x140

(gdb) list *(rxe_do_task+0x87)
0xc1e7 is in rxe_do_task (drivers/infiniband/sw/rxe/rxe_task.c:90).
85              do {
86                      cont = 0;
87                      ret = task->func(task->arg);
88
89                      spin_lock_irqsave(&task->state_lock, flags);
90                      switch (task->state) {
91                      case TASK_STATE_BUSY:
92                              if (ret)
93                                      task->state = TASK_STATE_START;
94                              else

From the disas rxe_do_task output:

   0x000000000000c1d9 <+121>:   callq  *0x78(%rbx)
   0x000000000000c1dc <+124>:   mov    %r12,%rdi
   0x000000000000c1df <+127>:   mov    %eax,%r14d
   0x000000000000c1e2 <+130>:   callq  0xc1e7 <rxe_do_task+135>

Does this perhaps mean that the rxe_qp structure can be freed while
rxe_do_task() is in progress? Please note that the ib_srpt driver only
destroys a QP (srpt_destroy_ch_ib() call in srpt_release_channel_work())
after all SCSI command processing has finished
(transport_deregister_session()).

Thanks,

Bart.
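
To make the question above concrete, here is a minimal userspace sketch of the suspected lifetime problem. It is not rdma_rxe code and not a proposed patch. It assumes, as the question does, that the task structure lives inside the object that the callback may release. All names (model_qp, model_task, run_task, drop_last_ref, qp_alive) are hypothetical and exist only for this illustration. The qp_alive flag stands in for what the 0x6b poison pattern in RAX suggests in the real trace: the memory behind task->state_lock had already been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are the rdma_rxe API. */
struct model_qp;

struct model_task {
	int (*func)(struct model_qp *qp);	/* callback, like task->func */
	struct model_qp *arg;			/* like task->arg */
	int state;				/* stands in for state + state_lock */
};

struct model_qp {
	int refcount;
	struct model_task task;	/* embedded: freed together with the qp */
};

/* Tracks whether the single model object still exists (illustration only). */
bool qp_alive;

struct model_qp *model_qp_alloc(int (*func)(struct model_qp *))
{
	struct model_qp *qp = calloc(1, sizeof(*qp));

	if (!qp)
		return NULL;
	qp->refcount = 1;
	qp->task.func = func;
	qp->task.arg = qp;
	qp_alive = true;
	return qp;
}

void model_qp_get(struct model_qp *qp)
{
	qp->refcount++;
}

void model_qp_put(struct model_qp *qp)
{
	if (--qp->refcount == 0) {
		qp_alive = false;
		free(qp);
	}
}

/* A callback that drops the reference its caller was relying on. */
int drop_last_ref(struct model_qp *qp)
{
	model_qp_put(qp);
	return 0;
}

/*
 * Runs the callback and then tries to touch the task again, as the loop in
 * the gdb listing does. Returns true if the task was still valid at that
 * point. With pin == true an extra reference is held across the callback.
 */
bool run_task(struct model_task *task, bool pin)
{
	struct model_qp *qp = task->arg;
	bool safe;

	if (pin)
		model_qp_get(qp);
	task->func(qp);
	safe = qp_alive;
	if (safe)
		task->state = 0;	/* would be a use-after-free otherwise */
	if (pin)
		model_qp_put(qp);
	return safe;
}
```

In this model, run_task() without the extra reference reports that the task was already gone when the callback returned, which is the situation the trace seems to show. Holding a reference across the callback keeps the object alive until the state update is done. Whether the real driver can actually reach this state is exactly the open question.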