RE: possible isert bug in tear down sequence

Hi Sagi, target-devel,

> There are two possibilities here:
> 1. We have a bug in isert which posts a wr after calling ib_drain_qp()
> which is very likely, but given that this is a recv completion I sorta
> doubt it because it means that the QP is not even in error state (or is
> it and its an FLUSH error completion?)
> 
> 2. The device you tested with did not really drain the qp properly and a
> stray completion found its way afterwards. What device did you test
> with?

I used a Q-Logic Everest 4 device.

>

Let me add a third possibility, which is the one we are hitting:
isert handles CM events in isert_cma_handler(), and in the following
cases ib_drain_qp() is never invoked:
        case RDMA_CM_EVENT_REJECTED:       /* FALLTHRU */
                isert_info("Connection rejected: %s\n",
                           rdma_reject_msg(cma_id, event->status));
        case RDMA_CM_EVENT_UNREACHABLE:    /* FALLTHRU */
        case RDMA_CM_EVENT_CONNECT_ERROR:
                ret = isert_connect_error(cma_id);
                break;

Specifically, I hit the rejected case. See dmesg below with added prints (rrr...).
We Are using 

[ 2455.241978] rrr created QP ffff880e984d6c00
[ 2455.241982] isert: isert_login_post_recv: Setup sge: addr: eb19e4000 length: 8268 0x00000000
[ 2455.241987] rrr post_recv qp=ffff880e984d6c00, wr_id=ffff880eb19e6064
[ 2455.242108] isert: isert_cma_handler: rejected (8): status 10 id ffff880eb1f9b000 np ffff8810454d2c40
[ 2455.242114] isert: isert_cma_handler: Connection rejected: stale conn
[ 2455.242121] isert: isert_release_kref: conn ffff880eb19e2000 final kref kworker/7:2/6058
[ 2455.242127] isert: isert_connect_release: conn ffff880eb19e2000
[ 2455.242156] rrr poll_recv qp=ffff880e984d6c00 RDMA_CQE_RESP_STS_WORK_REQUEST_FLUSHED_ERR, wr_id=ffff880eb19e6064
[ 2455.242157] rrr destroyed QP ffff880e984d6c00
[ 2455.242164] Modules linked in: netconsole target_core_user target_core_pscsi target_core_file target_core_iblock
[ 2455.242183] BUG: unable to handle kernel
[ 2455.242202]  [<ffffffffa0823813>] isert_login_recv_done+0x23/0x160 [ib_isert]

So the sequence is: a QP gets created, post_recv is invoked, poll_cq returns the flushed completion, the QP is destroyed, and only then does the workqueue run the completion handler, which tries to dereference the QP...

I'm still checking why the connection became stale, but in any case I think ib_drain_qp() should be invoked on these error paths as well.
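
To illustrate the ordering I mean, here is a small user-space model. It is not isert code: every mock_* name is made up for this sketch, and mock_drain() only stands in for the guarantee I assume ib_drain_qp() gives, namely that it returns only after the flushed completions have been handled.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct mock_qp {
        bool alive;         /* false once the QP has been destroyed */
        int  posted_recvs;  /* receive WRs posted but not yet completed */
        int  queued_done;   /* flushed completions whose handler has not run */
        int  stale_done;    /* handlers that ran after the QP was destroyed */
};

static void mock_post_recv(struct mock_qp *qp)
{
        qp->posted_recvs++;
}

/* Moving the QP to the error state flushes every posted receive. */
static void mock_flush(struct mock_qp *qp)
{
        qp->queued_done += qp->posted_recvs;
        qp->posted_recvs = 0;
}

/* Models the workqueue running the deferred completion handlers. */
static void mock_run_handlers(struct mock_qp *qp)
{
        while (qp->queued_done > 0) {
                qp->queued_done--;
                if (!qp->alive)
                        qp->stale_done++;  /* the real handler would oops here */
        }
}

/* Assumed drain semantics: flush, then wait until all handlers have run. */
static void mock_drain(struct mock_qp *qp)
{
        mock_flush(qp);
        mock_run_handlers(qp);
}

static void mock_destroy(struct mock_qp *qp)
{
        qp->alive = false;
}

/* Returns how many handlers ran against an already destroyed QP. */
int mock_teardown(bool drain_first)
{
        struct mock_qp qp = { .alive = true };

        mock_post_recv(&qp);
        if (drain_first)
                mock_drain(&qp);   /* handlers finish before destroy */
        else
                mock_flush(&qp);   /* completion is polled, handler deferred */
        mock_destroy(&qp);
        mock_run_handlers(&qp);    /* deferred work runs after destroy */
        return qp.stale_done;
}
```

In this model mock_teardown(false) reproduces the order seen in the log (flush, destroy, then the handler), while mock_teardown(true) lets the handler finish before the QP goes away.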

What do you think?

Thanks,
Ram




