On Monday, 07.09.2009, at 10:16 +0200, Florian Haas wrote:
> On 09/07/2009 08:00 AM, Florian Haas wrote:
> > On 09/07/2009 07:58 AM, FUJITA Tomonori wrote:
> >> On Mon, 07 Sep 2009 07:47:32 +0200
> >> Florian Haas <florian.haas@xxxxxxxxxx> wrote:
> >>
> >>>> What iSCSI initiator implementation do you use on Linux?
> >>> For testing? I use open-iscsi on Debian lenny; the Debian package
> >>> version number is 2.0.870~rc3-0.4. My sg3-utils package version (if
> >>> that's of any help) is 1.24-2.
> >> That's strange. open-iscsi doesn't send TARGET_RESET.
> >
> > Interesting. Let me grab a packet dump and I'll be back in touch.
>
> You're right, open-iSCSI apparently simply translates any host or "bus"
> reset into a new login sequence. For device resets, it does use a Task
> Management Function (0x02) of the type "LU reset" (0x05).
>
> Going back to my original problem, I've now sifted through packet traces
> generated on the production iSCSI target server (the one that the MSCS
> hosts talk to), and have encountered something that leaves me
> confounded. This applies to both IET and STGT (hence yet another
> cross-post to both lists), so it's either something that is wrong in
> both implementations, or some breakage in MSCS. Perhaps someone can
> enlighten me here.
>
> Here is the situation:
> - I have two initiator hosts, 10.160.156.24 and 10.160.156.26. Both are
> part of the same MSCS cluster.
> - The quorum device is on the iSCSI target, LUN 3.
> - .24 is the active host. It issues RESERVE commands every three seconds
> and gets these confirmed reliably.
> - .24 gets forcibly disconnected from the network, by having its
> Ethernet cable removed.
> - 110 seconds expire. This is in line with a relatively long
> Time2Retain, which in this setup is 90 seconds.
> - 110 seconds after .24 has issued its last successful RESERVE command,
> .26 apparently attempts to acquire the quorum device.
> - I now see a SERVICE ACTION IN command (opcode 0x9e) from .26, with a
> Service Action of READ CAPACITY (16) (0x10).
> - This fails with a Reservation Conflict (0x18) status.
>
> The initiator on .26 then repeats the last two actions indefinitely. It
> apparently never even attempts to recover from this situation. Whatever
> "bus reset" entries I am seeing in the Windows Event log, none of those
> actions ever appear to actually reach the target -- I am not seeing a
> renewed login attempt, nor a target reset, nor a LUN reset, nothing.
>
> I am also failing to understand why the MS initiator would use the
> SERVICE ACTION IN detour when upon initial login it just uses standard
> INQUIRY commands and READ CAPACITY.
>
> I have complete pcap traces from the sequence of events, both for IET
> and STGT -- I can send them off-list if anyone is interested in looking
> into this.
>
> Fujita-san, Ross, Arne -- any ideas at all?

I agree with Tomo's assessment - the READ CAPACITY commands should be
performed even if the LU is reserved.

I merged the following patch as svn rev. 226; please let us know if it
works.

Thanks
Arne

diff --git a/kernel/target_disk.c b/kernel/target_disk.c
index 24515ac..694edb2 100644
--- a/kernel/target_disk.c
+++ b/kernel/target_disk.c
@@ -482,8 +482,13 @@ static int disk_check_reservation(struct iscsi_cmnd *cmnd)
 	case RELEASE:
 	case REPORT_LUNS:
 	case REQUEST_SENSE:
+	case READ_CAPACITY:
 		/* allowed commands when reserved */
 		break;
+	case SERVICE_ACTION_IN:
+		if ((cmnd_hdr(cmnd)->scb[1] & 0x1F) == 0x10)
+			break;
+		/* fall through */
 	default:
 		/* return reservation conflict for all others */
 		send_scsi_rsp(cmnd,
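
For anyone who wants to check the opcode logic outside the kernel, here is a
minimal standalone sketch of the same decision. It is not the IET code: the
function name allowed_when_reserved() and the enum names are made up for
illustration, and it only covers the opcodes visible in the hunk above (the
real switch has earlier cases that the diff context does not show). The
opcode values are the standard SCSI ones; service action 0x10 under opcode
0x9e is READ CAPACITY (16).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI opcode values; the identifier names are hypothetical. */
enum {
	OP_REQUEST_SENSE        = 0x03,
	OP_RELEASE_6            = 0x17,
	OP_READ_CAPACITY_10     = 0x25,
	OP_SERVICE_ACTION_IN_16 = 0x9e,
	OP_REPORT_LUNS          = 0xa0,
};

/* Service action carried in the low five bits of CDB byte 1. */
enum { SAI_READ_CAPACITY_16 = 0x10 };

/*
 * Returns true if a command from a non-owning initiator should be
 * executed despite an existing reservation, false if the target should
 * answer with RESERVATION CONFLICT (status 0x18).
 */
bool allowed_when_reserved(const uint8_t *cdb)
{
	assert(cdb != NULL);

	switch (cdb[0]) {
	case OP_RELEASE_6:
	case OP_REPORT_LUNS:
	case OP_REQUEST_SENSE:
	case OP_READ_CAPACITY_10:
		return true;
	case OP_SERVICE_ACTION_IN_16:
		/* Only the READ CAPACITY (16) service action is let through. */
		return (cdb[1] & 0x1f) == SAI_READ_CAPACITY_16;
	default:
		return false;
	}
}
```

With this check, the CDB that .26 keeps sending (0x9e with service action
0x10) is allowed, while an ordinary READ (10) (0x28) from the same host would
still get a reservation conflict.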