Re: CmdSN greater than MaxCmdSN protocol error in LIO iSER

On Mon, 2013-11-11 at 16:42 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2013-11-11 at 13:17 -0800, Nicholas A. Bellinger wrote:
> > Hi Moussa & Co,
> > 
> > On Mon, 2013-11-11 at 19:05 +0000, Moussa Ba (moussaba) wrote:
> > > My system setup is as follows:
> > > 
> > > Target: CentOS 6.4 - LIO target running a 3.12 kernel, 8 PCIe SSDs in one volume group, 8 logical volumes, 8 targets, 1 LUN/target.
> > > Initiator: CentOS 6.4 running a 3.11 kernel (also ran 2.6.32-358), iSER-based initiator over a Mellanox 40Gbit ConnectX HCA
> > > 
> > > When running performance tests on the initiator, I am running into
> > > fio timeouts that lead to ABORT_TASK commands on the target.  In other
> > > words, fio fails to "reap" all the I/O threads, almost as if it is
> > > waiting for lost I/Os to complete.  This happens on random write
> > > operations.  For context, we are generating about 576K IOPS at a 4KB
> > > block size using 8 LUNs.  The PCIe SSDs have a write buffer that can
> > > absorb writes with an end-to-end latency on the initiator of 44us.  We
> > > are not currently seeing any errors on read I/Os, which tend to have
> > > 2X+ the latency of writes.
> > > 
> > > See below for the dmesg on the target side.
> > > The timeout condition occurs at timestamp 154, which is where the protocol error is logged after fio is interrupted or runs to completion.
> > > [  154.453663] Received CmdSN: 0x000fcbb7 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
> > > [  154.453673] Received CmdSN: 0x000fcbb8 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
> > > 
> > 
> > (CC'ing Mike)
> > 
> > As mentioned off-list, this would tend to indicate some manner of
> > open-iscsi bug, as it's never legal for an initiator to send a CmdSN
> > greater than the MaxCmdSN that's currently being sent in target outgoing
> > response PDUs.
> > 
> > Mike, any idea as to how this might be happening on the initiator
> > side..?
> > 
> 
> So looking at open-iscsi, nothing immediately jumps out..
> 
> Doing a bit more review of the target side, I did notice the following:
> 
> iscsit_increment_maxcmdsn() is getting called twice for RDMA WRITE..
> Once during setup in isert_reg_rdma_frwr() / isert_map_rdma(), and
> another time in isert_put_datain() -> iscsit_build_rsp_pdu().
> 
> However, iscsit_increment_maxcmdsn() is smart enough to only increment
> MaxCmdSN once for each iscsi_cmd regardless of the number of calls, and
> AFAICT still does not explain the initiator sending CmdSNs larger than
> what's being set by MaxCmdSN in individual response PDUs.
> 
> However, it might be useful to see if the following patch has any effect
> by delaying the MaxCmdSN increment until slightly later in the
> isert_put_datain() callchain.
> 

One other minor change: also delay the per-connection StatSN increment
until the same location in iscsit_build_rsp_pdu() as the MaxCmdSN increment.

--nab

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 27708c3..bfd566f 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2029,8 +2029,6 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
        if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
                data_left = se_cmd->data_length;
-               iscsit_increment_maxcmdsn(cmd, conn->sess);
-               cmd->stat_sn = conn->stat_sn++;
        } else {
                sg_off = cmd->write_data_done / PAGE_SIZE;
                data_left = se_cmd->data_length - cmd->write_data_done;
@@ -2242,8 +2240,6 @@ isert_reg_rdma_frwr(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
        if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
                data_left = se_cmd->data_length;
-               iscsit_increment_maxcmdsn(cmd, conn->sess);
-               cmd->stat_sn = conn->stat_sn++;
        } else {
                sg_off = cmd->write_data_done / PAGE_SIZE;
                data_left = se_cmd->data_length - cmd->write_data_done;
@@ -2344,7 +2340,7 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
         * Build isert_conn->tx_desc for iSCSI response PDU and attach
         */
        isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc);
-       iscsit_build_rsp_pdu(cmd, conn, false, (struct iscsi_scsi_rsp *)
+       iscsit_build_rsp_pdu(cmd, conn, true, (struct iscsi_scsi_rsp *)
                             &isert_cmd->tx_desc.iscsi_header);
        isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
        isert_init_send_wr(isert_conn, isert_cmd,



