Re: Failing on NIC removal

Mike Christie <michaelc@xxxxxxxxxxx> · Mon, 19 Nov 2007 15:58:21 -0600

Mike Anderson wrote:
cc'ing open-iscsi

Scott Moseman <scmoseman@xxxxxxxxx> wrote:
So I finally got my multipath running through both the NIC and HBA
interfaces, but I'm not having any luck going through testing to
verify it's actually failing over between the connections.

# multipath -l
mpath0 (30690a018f015191a6472441d1500f057)
[size=4 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 3:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:0 sdb 8:16 [active][ready]

I can unplug the HBA (see below) and the connection to the SAN remains.

# multipath -l
mpath0 (30690a018f015191a6472441d1500f057)
[size=4 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 3:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:0 sdb 8:16 [failed][faulty]

But when I unplug the NIC connection, the multipath command hangs,
trying to list files on the SAN partition hangs, and I'm getting these
messages:

Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Connect failed with rc -113: No route to host
Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: establish_session failed. Could not connect to target
Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Waiting 10 seconds before next login attempt

How do I troubleshoot this situation?

The IO is hanging waiting for the connection to be reestablished. 

You may need to set ConnFailTimeout to a non-zero value as indicated in
http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme

Mike Anderson is right. If you are using multipath, you should set
ConnFailTimeout to a low value such as 3 or 5 seconds, because we want
the iSCSI layer to fail commands quickly up to the multipath layer. For
dm-multipath you then want to set no_path_retry either to "queue", which
queues IO forever (or until the paths come back), or to a number of
retries, which acts as a timeout.
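
As a rough sketch of the two settings described above (not a tested
configuration): the first fragment assumes the key=value syntax of
/etc/iscsi.conf used by the iscsi-sfnet driver, so please check the
readme linked above for the exact form. The second assumes the setting
goes in the defaults section of /etc/multipath.conf. The values 5 and
12 are only illustrative.

```
# /etc/iscsi.conf (assumed syntax; see the RHEL4 readme)
# Fail commands up to the multipath layer after 5 seconds
# instead of waiting for the connection to come back.
ConnFailTimeout=5

# /etc/multipath.conf
defaults {
        # When all paths are down, retry this many times before
        # failing IO. Use "queue" instead to queue IO forever
        # (or until the paths come back).
        no_path_retry 12
}
```

With a numeric no_path_retry the IO is eventually failed if no path
returns; with "queue" it waits until one does. The iscsi and multipathd
services most likely need a restart to pick up the changes.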

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel