Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/11/13 10:34 PM, Nicholas A. Bellinger wrote:
On Mon, 2013-11-11 at 20:39 -0600, Michael Christie wrote:

On Nov 11, 2013, at 8:32 PM, Michael Christie <michaelc@xxxxxxxxxxx> wrote:


<SNIP>

Mike, any idea as to how this might be happening on the initiator
side..?

<SNIP>

Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
(CC'ing linux-scsi)

Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
session->cmdsn, without ever checking to see if the CmdSN window may
have already closed..

The only CmdSN window check that I see in the I/O dispatch codepath is
in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
iscsi_conn_queue_work() is invoked and process context in
iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
AFAICT no further CmdSN window open checks to ensure the initiator does
not overflow MaxCmdSN..

This would very much line up with the type of bug that is being
reported.  Before trying to determine what a possible fix might look
like, can you try the following libiscsi patch to see if your able to
hit either of the BUG's below..?

Thanks,

--nab

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index e399561..9106f63 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1483,6 +1483,12 @@ check_mgmt:
                       fail_scsi_task(conn->task, DID_IMM_RETRY);
                       continue;
               }
+
+               if (iscsi_check_cmdsn_window_closed(conn)) {
+                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
+                       BUG();
+               }
+
               rc = iscsi_prep_scsi_cmd_pdu(conn->task);
               if (rc) {
                       if (rc == -ENOMEM || rc == -EACCES) {
@@ -1518,6 +1524,11 @@ check_mgmt:
               if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
                       break;

+               if (iscsi_check_cmdsn_window_closed(conn)) {
+                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
+                       BUG();
+               }
+
               conn->task = task;
               list_del_init(&conn->task->running);
               conn->task->state = ISCSI_TASK_RUNNING;

If you hit a bug there then it is probably the target or if you are
using the new libiscsi back/frwd lock patches it could be related to
them.

FYI, this is with v3.11 and RHEL 6.x open-iscsi code..

We should not need to check above because we never queue more cmds
from the scsi layer than the window will allow at the time. If the
window is 10, the queued_cmdsn check should make sure we have only
accepted 10 commands into the iscsi layer so we should not need to
recheck above.

You should just enable libiscsi debug printk in
iscsi_check_cmdsn_window so we can see the sn related values at the
time of queuing. You should enable that printk whithout your patch.

Actually that will not help because we always have the queued cmdsn
lower than the max in that path. I would do your patch but then also
print the queued cmdsn, cmdsn and also the maxcmdsn in the check
function.



Yes, notice the second test patch is doing the explicit check of
task->cmdsn vs. sess->max_cmdsn, instead of the one that checks
->queued_cmdsn vs. ->max_cmdsn in iscsi_check_cmdsn_window_closed().

Btw, I'm surmising the bug in question does not happen from the
iscsi_queuecommand() submission path, but rather from the
iscsi_update_cmdsn() -> __iscsi_update_cmdsn() completion path, where
iscsi_conn_queue_work() gets called when the CmdSN window was
previously closed, eg:

         if (max_cmdsn != session->max_cmdsn &&
             !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
                 session->max_cmdsn = max_cmdsn;
                 /*
                  * if the window closed with IO queued, then kick the
                  * xmit thread
                  */
                 if (!list_empty(&session->leadconn->cmdqueue) ||
                     !list_empty(&session->leadconn->mgmtqueue))
                         iscsi_conn_queue_work(session->leadconn);
         }
}

Once iscsi_conn_queue_work() is invoked here to start process context
execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
is no logic in place within iscsi_data_xmit() to honor the last received
MaxCmdSN.

Or to put it another way: what is preventing iscsi_data_xmit() from
completely draining both conn->cmdqueue + conn->requeue, even when the
CmdSN window has potentially been closed again..?


If the window was not big enough we would have not accepted the cmd from scsi-ml in iscsi_queuecommand. We would have never put it on the list above then.

In the initiator we have:

session->queued_cmdsn - It will always be less than or equal to the max_cmdsn, but greater than or equal to session->cmdsn. Think of it like preallocating the sn we are going to give the cmd. Depending on ordering it might be different (3 instead of 4), but the final value it gets will be less than the max_cmdsn.

session->cmdsn - the sn we give the cmd when we actually put it on the wire. It is always less than or equal to queued_cmdsn.

So we have cmdsn <= queued_cmdsn <= max_cmdsn.

The iscsi_queuecommand window check makes sure we will only have put a cmd on the cmdqueue list if queued_cmdsn was less than or equal to the max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so we can run iscsi_data_xmit without checking the window values again, because it is not possible for something like max_cmdsn to decrease on us.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux