Hello,

My apologies if this has been asked about before; if there is a post in the list archives that addresses my issue and I overlooked it, please feel free to point me to it.

I am currently working on iSCSI Target & LU resource agents for the Pacemaker cluster manager. If interested, please see http://hg.linux-ha.org/dev/; the relevant resource agents are named iSCSITarget and iSCSILogicalUnit and can be found in http://hg.linux-ha.org/dev/file/tip/resources/OCF.

The implementation currently supports IET and STGT. It is intended to be used in conjunction with other cluster resource types so that the following sequence occurs on resource startup:

- block all access to TCP port 3260 via a firewall rule;
- switch a DRBD (www.drbd.org) device into the Primary role;
- make available an LVM Volume Group that resides on that DRBD device;
- fire up a virtual cluster IP address that initiators use to connect to the target portal;
- create an iSCSI target and portal;
- assign LUs to that target (these map to LVs on the DRBD-backed VG);
- unblock access to TCP port 3260.

On resource shutdown, the same procedure happens in reverse order. Resource migration (to the peer cluster node) in essence amounts to shutdown on node A, then startup on node B. The entire process typically completes in well under 30 seconds.

Now, when failover completes, connected initiators naturally encounter a connection reset from the target daemon. The open-iSCSI initiator takes this in stride, reconnecting immediately and continuing any ongoing I/O unhampered.

The Microsoft iSCSI initiator (2.08), when connected to an IET target, also reconnects immediately after target failover. From the Windows event log:

Event ID 20 (from iSCSIPrt)
Connection to the target was lost. The initiator will attempt to retry the connection.

Event ID 34 (from iSCSIPrt)
A connection to the target was lost, but Initiator successfully reconnected to the target. Dump data contains the target name.
In this case, ongoing I/O on connected devices continues without a user-noticeable hiccup.

I see the same messages when the same Microsoft iSCSI initiator is connected to an STGT target. However, only when talking to an STGT target, I also see these (after the connection is re-established):

Event ID 12 (from PlugPlayManager)
The device 'IET Controller SCSI Array Device' (SCSI\Array&Ven_IET_____&Prod_Controller______&Rev_0001\1&2afd7d61&2&000000) disappeared from the system without first being prepared for removal.

These are repeated for all iSCSI disks the initiator is connected to. I am also getting these:

Event ID 57 (from Ftdisk)
The system failed to flush data to the transaction log. Corruption may occur.

In the STGT case, even though the initiator automatically reconnects to the target, any I/O on the connected target is interrupted, and the Windows box spews out positively alarming messages.

So I wonder: what is STGT doing differently from IET here? Is there any specific target or LU parameter that should be set in order to avoid this issue?

Any insight would be much appreciated. Thanks very much!

Cheers,
Florian