On Mon, 2013-11-11 at 20:39 -0600, Michael Christie wrote: > > > On Nov 11, 2013, at 8:32 PM, Michael Christie <michaelc@xxxxxxxxxxx> wrote: > > <SNIP> > >>>>> Mike, any idea as to how this might be happening on the initiator > >>>>> side..? > >> > >> <SNIP> > >> > >> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi. > >> (CC'ing linux-scsi) > >> > >> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain > >> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment > >> session->cmdsn, without ever checking to see if the CmdSN window may > >> have already closed.. > >> > >> The only CmdSN window check that I see in the I/O dispatch codepath is > >> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once > >> iscsi_conn_queue_work() is invoked and process context in > >> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is > >> AFAICT no further CmdSN window open checks to ensure the initiator does > >> not overflow MaxCmdSN.. > >> > >> This would very much line up with the type of bug that is being > >> reported. Before trying to determine what a possible fix might look > >> like, can you try the following libiscsi patch to see if your able to > >> hit either of the BUG's below..? > >> > >> Thanks, > >> > >> --nab > >> > >> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > >> index e399561..9106f63 100644 > >> --- a/drivers/scsi/libiscsi.c > >> +++ b/drivers/scsi/libiscsi.c > >> @@ -1483,6 +1483,12 @@ check_mgmt: > >> fail_scsi_task(conn->task, DID_IMM_RETRY); > >> continue; > >> } > >> + > >> + if (iscsi_check_cmdsn_window_closed(conn)) { > >> + printk("CmdSN window already closed in iscsi_data_xmit #1\n"); > >> + BUG(); > >> + } > >> + > >> rc = iscsi_prep_scsi_cmd_pdu(conn->task); > >> if (rc) { > >> if (rc == -ENOMEM || rc == -EACCES) { > >> @@ -1518,6 +1524,11 @@ check_mgmt: > >> if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT)) > >> break; > >> > >> + if (iscsi_check_cmdsn_window_closed(conn)) { > >> + printk("CmdSN window already closed in iscsi_data_xmit #2\n"); > >> + BUG(); > >> + } > >> + > >> conn->task = task; > >> list_del_init(&conn->task->running); > >> conn->task->state = ISCSI_TASK_RUNNING; > > > > If you hit a bug there then it is probably the target or if you are > using the new libiscsi back/frwd lock patches it could be related to > them. FYI, this is with v3.11 and RHEL 6.x open-iscsi code.. > We should not need to check above because we never queue more cmds > from the scsi layer than the window will allow at the time. If the > window is 10, the queued_cmdsn check should make sure we have only > accepted 10 commands into the iscsi layer so we should not need to > recheck above. > > > > You should just enable libiscsi debug printk in > iscsi_check_cmdsn_window so we can see the sn related values at the > time of queuing. You should enable that printk whithout your patch. > > Actually that will not help because we always have the queued cmdsn > lower than the max in that path. I would do your patch but then also > print the queued cmdsn, cmdsn and also the maxcmdsn in the check > function. > > Yes, notice the second test patch is doing the explicit check of task->cmdsn vs. sess->max_cmdsn, instead of the one that checks ->queued_cmdsn vs. ->max_cmdsn in iscsi_check_cmdsn_window_closed(). Btw, I'm surmising the bug in question does not happen from the iscsi_queuecommand() submission path, but rather from the iscsi_update_cmdsn() -> __iscsi_update_cmdsn() completion path, where iscsi_conn_queue_work() gets called when the CmdSN window was previously closed, eg: if (max_cmdsn != session->max_cmdsn && !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) { session->max_cmdsn = max_cmdsn; /* * if the window closed with IO queued, then kick the * xmit thread */ if (!list_empty(&session->leadconn->cmdqueue) || !list_empty(&session->leadconn->mgmtqueue)) iscsi_conn_queue_work(session->leadconn); } } Once iscsi_conn_queue_work() is invoked here to start process context execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there is no logic in place within iscsi_data_xmit() to honor the last received MaxCmdSN. Or to put it another way: what is preventing iscsi_data_xmit() from completely draining both conn->cmdqueue + conn->requeue, even when the CmdSN window has potentially been closed again..? --nab -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html