On 09/07/2009 08:00 AM, Florian Haas wrote:
> On 09/07/2009 07:58 AM, FUJITA Tomonori wrote:
>> On Mon, 07 Sep 2009 07:47:32 +0200
>> Florian Haas <florian.haas@xxxxxxxxxx> wrote:
>>
>>>> What iSCSI initiator implementation do you use on Linux?
>>> For testing? I use open-iscsi on Debian lenny; the Debian package
>>> version number is 2.0.870~rc3-0.4. My sg3-utils package version (if
>>> that's of any help) is 1.24-2.
>> That's strange. open-iscsi doesn't send TARGET_REST.
>
> Interesting. Let me grab a packet dump and I'll be back in touch.

You're right, open-iscsi apparently simply translates any host or "bus"
reset into a new login sequence. For device resets, it does use a Task
Management Function request (opcode 0x02) of the type "LU reset"
(function 0x05).

Going back to my original problem, I've now sifted through packet traces
generated on the production iSCSI target server (the one that the MSCS
hosts talk to), and have encountered something that leaves me
confounded. This applies to both IET and STGT (hence yet another
cross-post to both lists), so it's either something that is wrong in
both implementations, or some breakage in MSCS. Perhaps someone can
enlighten me here.

Here is the situation:

- I have two initiator hosts, 10.160.156.24 and 10.160.156.26. Both are
  part of the same MSCS cluster.
- The quorum device is on the iSCSI target, LUN 3.
- .24 is the active host. It issues RESERVE commands every three seconds
  and gets these confirmed reliably.
- .24 gets forcibly disconnected from the network, by having its
  Ethernet cable removed.
- 110 seconds elapse. This is in line with a relatively long
  Time2Retain, which in this setup is 90 seconds.
- 110 seconds after .24 has issued its last successful RESERVE command,
  .26 apparently attempts to acquire the quorum device.
- I now see a SERVICE ACTION IN command (opcode 0x9e) from .26, with a
  Service Action of Read Capacity (16) (service action 0x10).
- This fails with a Reservation Conflict (0x18) status.

The initiator on .26 then repeats the last two actions indefinitely. It
apparently never even attempts to recover from this situation. Whatever
"bus reset" entries I am seeing in the Windows Event log, none of those
actions ever appear to actually reach the target -- I am not seeing a
renewed login attempt, nor a target reset, nor a LUN reset, nothing.

I am also failing to understand why the MS initiator would use the
SERVICE ACTION IN detour when upon initial login it just uses standard
INQUIRY commands and READ CAPACITY.

I have complete pcap traces from the sequence of events, both for IET
and STGT -- I can send them off-list if anyone is interested in looking
into this.

Fujita-san, Ross, Arne -- any ideas at all?

Cheers,
Florian
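
For anyone who wants to scan a trace for the PDUs discussed above, here
is a minimal Python sketch that classifies a single 48-byte iSCSI Basic
Header Segment (BHS). The function name describe_bhs and the lookup
tables are made up for illustration and are not part of IET, STGT, or
any initiator. The field offsets are the ones RFC 3720 gives for the
BHS; the sketch assumes the caller has already extracted the header from
the TCP stream and ignores everything after it. Note that SBC defines
service action 0x10 of SERVICE ACTION IN(16) as READ CAPACITY(16).

```python
# Illustrative sketch only: classify one iSCSI Basic Header Segment (BHS).
# All names here are hypothetical; offsets follow the RFC 3720 BHS layout.

SCSI_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x18: "RESERVATION CONFLICT"}
TMF_FUNCTIONS = {5: "LOGICAL UNIT RESET", 6: "TARGET WARM RESET", 7: "TARGET COLD RESET"}
CDB_OPCODES = {
    0x12: "INQUIRY",
    0x16: "RESERVE(6)",
    0x17: "RELEASE(6)",
    0x25: "READ CAPACITY(10)",
}


def describe_bhs(bhs: bytes) -> str:
    """Return a short description of the PDU whose header is `bhs`."""
    if len(bhs) < 48:
        raise ValueError("an iSCSI BHS is 48 bytes long")
    opcode = bhs[0] & 0x3F  # low six bits; bit 6 is the Immediate flag

    if opcode == 0x01:  # SCSI Command: the CDB occupies bytes 32..47
        cdb = bhs[32:48]
        if cdb[0] == 0x9E:  # SERVICE ACTION IN(16)
            action = cdb[1] & 0x1F  # service action is the low five bits
            if action == 0x10:
                name = "READ CAPACITY(16)"
            else:
                name = f"service action 0x{action:02x}"
            return f"SCSI Command: SERVICE ACTION IN(16), {name}"
        return "SCSI Command: " + CDB_OPCODES.get(cdb[0], f"opcode 0x{cdb[0]:02x}")

    if opcode == 0x02:  # Task Management Function Request
        function = bhs[1] & 0x7F  # byte 1 without the always-set top bit
        label = TMF_FUNCTIONS.get(function, f"function {function}")
        return "Task Management Request: " + label

    if opcode == 0x03:
        return "Login Request"

    if opcode == 0x21:  # SCSI Response: the SCSI status is byte 3
        status = bhs[3]
        return "SCSI Response: " + SCSI_STATUS.get(status, f"status 0x{status:02x}")

    return f"other PDU (opcode 0x{opcode:02x})"


if __name__ == "__main__":
    # Hand-built header resembling the failing response described above.
    response = bytearray(48)
    response[0] = 0x21
    response[3] = 0x18
    print(describe_bhs(bytes(response)))
```

Run over the headers in a trace, a loop of SERVICE ACTION IN commands
each answered with RESERVATION CONFLICT, with no Login Request or Task
Management Request in between, would correspond to the behaviour
described above.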