Hi Sagi, target-devel,

> There are two possibilities here:
> 1. We have a bug in isert which posts a wr after calling ib_drain_qp()
> which is very likely, but given that this is a recv completion I sorta
> doubt it because it means that the QP is not even in error state (or is
> it and its an FLUSH error completion?)
>
> 2. The device you tested with did not really drain the qp properly and a
> stray completion found its way afterwards. What device did you test
> with?

I used a Q-Logic Everest 4 device.

Let me add a third possibility, which is what we are hitting:
I see that isert uses isert_cma_handler(), and in the following cases
drain won't be invoked:

	case RDMA_CM_EVENT_REJECTED:       /* FALLTHRU */
		isert_info("Connection rejected: %s\n",
			   rdma_reject_msg(cma_id, event->status));
	case RDMA_CM_EVENT_UNREACHABLE:    /* FALLTHRU */
	case RDMA_CM_EVENT_CONNECT_ERROR:
		ret = isert_connect_error(cma_id);
		break;

Specifically, I hit the rejected case. See dmesg below with added prints
(rrr...).
We Are using

[ 2455.241978] rrr created QP ffff880e984d6c00
[ 2455.241982] isert: isert_login_post_recv: Setup sge: addr: eb19e4000 length: 8268 0x00000000
[ 2455.241987] rrr post_recv qp=ffff880e984d6c00, wr_id=ffff880eb19e6064
[ 2455.242108] isert: isert_cma_handler: rejected (8): status 10 id ffff880eb1f9b000 np ffff8810454d2c40
[ 2455.242114] isert: isert_cma_handler: Connection rejected: stale conn
[ 2455.242121] isert: isert_release_kref: conn ffff880eb19e2000 final kref kworker/7:2/6058
[ 2455.242127] isert: isert_connect_release: conn ffff880eb19e2000
[ 2455.242156] rrr poll_recv qp=ffff880e984d6c00 RDMA_CQE_RESP_STS_WORK_REQUEST_FLUSHED_ERR, wr_id=ffff880eb19e6064
[ 2455.242157] rrr destroyed QP ffff880e984d6c00
[ 2455.242164] Modules linked in: netconsole target_core_user target_core_pscsi target_core_file target_core_iblock
[ 2455.242183] BUG: unable to handle kernel
[ 2455.242202] [<ffffffffa0823813>] isert_login_recv_done+0x23/0x160 [ib_isert]

A QP gets created, post_recv is invoked, and poll_cq returns the flushed
completion; the QP is then destroyed, and afterwards the workqueue tries
to dereference the QP...

I'm checking why the connection got stale, but in any case I think
ib_drain_qp() should be invoked.
What do you think?

Thanks,
Ram
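
P.S. To make the ordering problem concrete, here is a toy user-space model of the sequence in the log above. It is only a sketch and not isert or RDMA core code: every name in it (toy_qp, toy_drain, toy_reject_path, ...) is hypothetical, and toy_drain() merely stands in for the idea behind ib_drain_qp(), i.e. waiting until all outstanding completions have been handled before the QP goes away.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
struct toy_qp {
	bool alive;           /* false once the QP has been destroyed */
	int  pending;         /* completions queued but not yet handled */
	int  use_after_free;  /* handlers that ran after destroy */
};

/* Post one receive; it will later produce one completion. */
static void toy_post_recv(struct toy_qp *qp)
{
	qp->pending++;
}

/* Run one queued completion handler, as a workqueue item would. */
static void toy_run_one(struct toy_qp *qp)
{
	if (qp->pending == 0)
		return;
	qp->pending--;
	if (!qp->alive)
		qp->use_after_free++;  /* handler touched a destroyed QP */
}

/* Stand-in for draining: handle every outstanding completion now. */
static void toy_drain(struct toy_qp *qp)
{
	while (qp->pending > 0)
		toy_run_one(qp);
	assert(qp->pending == 0);
}

static void toy_destroy(struct toy_qp *qp)
{
	qp->alive = false;
}

/*
 * Model of the reject path: post a recv, optionally drain, destroy the
 * QP, then let any late work item run. Returns how many handlers ran
 * against the destroyed QP.
 */
static int toy_reject_path(bool drain_first)
{
	struct toy_qp qp = { .alive = true };

	toy_post_recv(&qp);
	if (drain_first)
		toy_drain(&qp);
	toy_destroy(&qp);
	toy_run_one(&qp);  /* late completion work, if any is left */
	return qp.use_after_free;
}
```

In this model, toy_reject_path(false) leaves one handler running after the destroy, which corresponds to the isert_login_recv_done crash in the trace, while toy_reject_path(true) leaves none. Whether the real fix belongs in isert_connect_error() or elsewhere is a separate question that the model does not answer.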