Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2013-11-11 at 20:39 -0600, Michael Christie wrote:
> 
> > On Nov 11, 2013, at 8:32 PM, Michael Christie <michaelc@xxxxxxxxxxx> wrote:
> > 

<SNIP>

> >>>>> Mike, any idea as to how this might be happening on the initiator
> >>>>> side..?
> >> 
> >> <SNIP>
> >> 
> >> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
> >> (CC'ing linux-scsi)
> >> 
> >> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
> >> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
> >> session->cmdsn, without ever checking to see if the CmdSN window may
> >> have already closed..
> >> 
> >> The only CmdSN window check that I see in the I/O dispatch codepath is
> >> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
> >> iscsi_conn_queue_work() is invoked and process context in
> >> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
> >> AFAICT no further CmdSN window open checks to ensure the initiator does
> >> not overflow MaxCmdSN..
> >> 
> >> This would very much line up with the type of bug that is being
> >> reported.  Before trying to determine what a possible fix might look
> >> like, can you try the following libiscsi patch to see if your able to
> >> hit either of the BUG's below..?
> >> 
> >> Thanks,
> >> 
> >> --nab
> >> 
> >> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> >> index e399561..9106f63 100644
> >> --- a/drivers/scsi/libiscsi.c
> >> +++ b/drivers/scsi/libiscsi.c
> >> @@ -1483,6 +1483,12 @@ check_mgmt:
> >>                       fail_scsi_task(conn->task, DID_IMM_RETRY);
> >>                       continue;
> >>               }
> >> +
> >> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> >> +                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
> >> +                       BUG();
> >> +               }
> >> +
> >>               rc = iscsi_prep_scsi_cmd_pdu(conn->task);
> >>               if (rc) {
> >>                       if (rc == -ENOMEM || rc == -EACCES) {
> >> @@ -1518,6 +1524,11 @@ check_mgmt:
> >>               if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
> >>                       break;
> >> 
> >> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> >> +                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
> >> +                       BUG();
> >> +               }
> >> +
> >>               conn->task = task;
> >>               list_del_init(&conn->task->running);
> >>               conn->task->state = ISCSI_TASK_RUNNING;
> > 
> > If you hit a bug there then it is probably the target or if you are
> using the new libiscsi back/frwd lock patches it could be related to
> them. 

FYI, this is with v3.11 and RHEL 6.x open-iscsi code..

> We should not need to check above because we never queue more cmds
> from the scsi layer than the window will allow at the time. If the
> window is 10, the queued_cmdsn check should make sure we have only
> accepted 10 commands into the iscsi layer so we should not need to
> recheck above.
> > 
> > You should just enable libiscsi debug printk in
> iscsi_check_cmdsn_window so we can see the sn related values at the
> time of queuing. You should enable that printk whithout your patch.
> 
> Actually that will not help because we always have the queued cmdsn
> lower than the max in that path. I would do your patch but then also
> print the queued cmdsn, cmdsn and also the maxcmdsn in the check
> function.
> 
> 

Yes, notice the second test patch is doing the explicit check of
task->cmdsn vs. sess->max_cmdsn, instead of the one that checks
->queued_cmdsn vs. ->max_cmdsn in iscsi_check_cmdsn_window_closed().

Btw, I'm surmising the bug in question does not happen from the
iscsi_queuecommand() submission path, but rather from the
iscsi_update_cmdsn() -> __iscsi_update_cmdsn() completion path, where
iscsi_conn_queue_work() gets called when the CmdSN window was
previously closed, eg:

        if (max_cmdsn != session->max_cmdsn &&
            !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
                session->max_cmdsn = max_cmdsn;
                /*
                 * if the window closed with IO queued, then kick the
                 * xmit thread
                 */
                if (!list_empty(&session->leadconn->cmdqueue) ||
                    !list_empty(&session->leadconn->mgmtqueue))
                        iscsi_conn_queue_work(session->leadconn);
        }
}

Once iscsi_conn_queue_work() is invoked here to start process context
execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
is no logic in place within iscsi_data_xmit() to honor the last received
MaxCmdSN.

Or to put it another way: what is preventing iscsi_data_xmit() from
completely draining both conn->cmdqueue + conn->requeue, even when the
CmdSN window has potentially been closed again..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux