On Tue, 2014-06-10 at 14:56 +0200, Hannes Reinecke wrote:
> On 06/10/2014 11:11 AM, Chris Boot wrote:
> > On 06/06/14 07:34, Nicholas A. Bellinger wrote:
> >> Hi Chris & Phillip,
> >>
> >> On Thu, 2014-06-05 at 15:31 +0100, Chris Boot wrote:
> >>> Hi folks,
> >>>
> >>> Background: I'm working on creating a highly available iSCSI system
> >>> using Pacemaker with some collaborators. We looked at the existing
> >>> iSCSILogicalUnit and iSCSITarget resource scripts, and they didn't
> >>> seem to do quite what we wanted, so we have started down the route of
> >>> writing our own. Our new scripts are GPL and their current
> >>> incarnations are available at https://github.com/tigercomputing/ocf-lio
> >>>
> >>> In general terms the setup is reasonably simple: we have a DRBD volume
> >>> running in dual-primary mode, which is used to create an iblock
> >>> device, which is in turn exported over iSCSI. We have been attempting
> >>> to use ALUA multipathing in implicit-only mode to manage target
> >>> failover.
> >>>
> >>> We create two ALUA target port groups on each node, call them
> >>> east/west, and mark one as Active/Optimised and the other as
> >>> Active/NonOptimised. When we create the iSCSI TPGs on both nodes, one
> >>> node's TPG is placed in the west ALUA group and the other node's is
> >>> placed into the east ALUA group.
> >>>
> >>> When simulating a fail-over, the ALUA states of both east and west
> >>> are changed on both nodes and kept synchronised.
> >>>
> >>> With multipathd on Linux as the initiator, everything appears to work
> >>> well until we switch roles on the target: multipathd sticks to the
> >>> old path, even though it is now NonOptimised and running slowly due
> >>> to the 100ms nonop_delay_msecs.
> >>>
> >>> If, instead, we set the standby path to Standby mode rather than
> >>> Active/NonOptimised, multipathd correctly notices the path is
> >>> unavailable and sends IO over the Active/Optimised path.
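[For readers following along, an east/west layout like the one described
above is normally wired up through the LIO configfs tree. This is only a
minimal sketch under stated assumptions: the iblock backstore name
(iblock_0/disk0), the IQN, and the LUN path are hypothetical, and it
assumes the attribute names and numeric values exposed by
target_core_alua (alua_access_type: 1 = implicit; alua_access_state:
0 = Active/Optimised, 1 = Active/NonOptimised, 2 = Standby).]

```shell
#!/bin/sh
# Sketch only: backstore and IQN names below are made up for illustration.
CORE=/sys/kernel/config/target/core/iblock_0/disk0
IQN=iqn.2014-06.com.example:ha-disk0

# Create the two ALUA target port groups with distinct IDs.
mkdir "$CORE/alua/east" "$CORE/alua/west"
echo 1 > "$CORE/alua/east/tg_pt_gp_id"
echo 2 > "$CORE/alua/west/tg_pt_gp_id"

# Implicit-only ALUA, as used in the setup described above.
echo 1 > "$CORE/alua/east/alua_access_type"
echo 1 > "$CORE/alua/west/alua_access_type"

# One group Active/Optimised, the other Active/NonOptimised, with the
# 100ms non-optimised delay mentioned in the mail.
echo 0   > "$CORE/alua/east/alua_access_state"
echo 1   > "$CORE/alua/west/alua_access_state"
echo 100 > "$CORE/alua/west/nonop_delay_msecs"

# Bind this node's LUN port to its group; the peer node binds to the
# other group, and fail-over swaps the two alua_access_state values.
echo east > "/sys/kernel/config/target/iscsi/$IQN/tpgt_1/lun/lun_0/alua_tg_pt_gp"
```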
> >>> However, if the initiator originally logs in to the target while
> >>> the path is in Standby mode, it fails to correctly probe the device.
> >>> When the path becomes Active/Optimised during fail-over, multipathd
> >>> is unable to use it and fails the path. The TUR checker reports the
> >>> path as active, though, and reinstates it, only for it to be failed
> >>> again, and so on. The only way to bring it back to life is to
> >>> "echo 1 > /sys/block/$DEV/device/rescan" and re-run multipath by
> >>> hand.
> >>>
> >>> I haven't been able to test this myself, but Philip (CCed) reports
> >>> similar behaviour using VMware as the initiator rather than Linux.
> >>>
> >>> Has anyone managed to set up an ALUA multipath HA SAN with two nodes
> >>> and LIO? What are we missing? Am I going to have to throw in the
> >>> towel on ALUA and just use virtual IP failover instead?
> >>
> >> After testing this evening with a similar config on a single target
> >> instance, the issue where initial LUN probe failures occur on an ALUA
> >> group set implicitly to Standby state is reproducible.
> >>
> >> The failure occurs during the initial READ_CAPACITY, which is
> >> currently disallowed by the opcode checking in
> >> core_alua_state_standby(). I thought at one point READ_CAPACITY could
> >> fail during initial LUN probe and still bring up a struct scsi_device
> >> with a zero number of sectors, but I could be wrong..? (Hannes CC'ed)
> >>
> >> In any event, the following patch to permit READ_CAPACITY addresses
> >> the initial LUN probe failure, works on my end, and should allow
> >> implicit ALUA Active/* <-> Standby (and vice versa) state changes to
> >> function now.
> >>
> >> Please confirm with your setup.
> >
> > Hi Nab,
> >
> > Ack, this fixes the issue completely for me under Linux with
> > multipathd. The standby path is correctly probed now when you log in
> > to the target, and when you fail over to it everything carries on.
> > Thanks very much!
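[For context, the initiator-side behaviour discussed in this thread is
driven by multipathd's ALUA priority callout and TUR path checker. A
plausible multipath.conf device stanza for an ALUA-grouped LIO target
might look like the fragment below; this is a hedged sketch, not a
configuration taken from the thread, and the vendor/product strings are
only the defaults LIO reports in the probe log further down.]

```text
devices {
        device {
                vendor                  "LIO-ORG"
                product                 ".*"
                path_grouping_policy    "group_by_prio"
                prio                    "alua"
                hardware_handler        "1 alua"
                path_checker            "tur"
                failback                "immediate"
        }
}
```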
> >
> > Note that I tested on 3.14, so I had to replace set_ascq() with
> > *alua_ascq as you did for the stable patches.
> >
> > Tested-by: Chris Boot <crb@xxxxxxxxxxxxxxxxxxxxx>
> >
> > FWIW, the kernel messages we obtain when probing the disk look like:
> >
> > [  388.929254] scsi12 : iSCSI Initiator over TCP/IP
> > [  389.184537] scsi 12:0:0:0: Direct-Access     LIO-ORG  test1            4.0  PQ: 0 ANSI: 5
> > [  389.184632] scsi 12:0:0:0: alua: supports implicit TPGS
> > [  389.185229] scsi 12:0:0:0: alua: port group 11 rel port 01
> > [  389.185390] scsi 12:0:0:0: alua: port group 11 state S non-preferred supports TOlUSNA
> > [  389.185393] scsi 12:0:0:0: alua: Attached
> > [  389.185791] sd 12:0:0:0: Attached scsi generic sg5 type 0
> > [  389.186499] sd 12:0:0:0: [sde] 2147418040 512-byte logical blocks: (1.09 TB/1023 GiB)
> > [  389.187254] sd 12:0:0:0: [sde] Write Protect is off
> > [  389.187258] sd 12:0:0:0: [sde] Mode Sense: 43 00 10 08
> > [  389.188301] sd 12:0:0:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
> > [  389.190677] ldm_validate_partition_table(): Disk read failed.
> > [  389.190705] Dev sde: unable to read RDB block 0
> > [  389.190734] sde: unable to read partition table
> > [  389.192784] sd 12:0:0:0: [sde] Attached SCSI disk
> > [  389.246318] sd 10:0:0:0: alua: port group 10 state A preferred supports TOlUSNA
> > [  389.325327] sd 10:0:0:0: alua: port group 10 state A preferred supports TOlUSNA
> >
> > Thanks for getting a patch to us so quickly, and sorry it took so long
> > to get it tested.
> >
> >> Thanks!
> >>
> >> --nab
> >>
> >> diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
> >> index fcbe612..63512cc 100644
> >> --- a/drivers/target/target_core_alua.c
> >> +++ b/drivers/target/target_core_alua.c
> >> @@ -576,7 +576,16 @@ static inline int core_alua_state_standby(
> >>  	case REPORT_LUNS:
> >>  	case RECEIVE_DIAGNOSTIC:
> >>  	case SEND_DIAGNOSTIC:
> >> +	case READ_CAPACITY:
> >>  		return 0;
> >> +	case SERVICE_ACTION_IN:
> >> +		switch (cdb[1] & 0x1f) {
> >> +		case SAI_READ_CAPACITY_16:
> >> +			return 0;
> >> +		default:
> >> +			set_ascq(cmd, ASCQ_04H_ALUA_TG_PT_STANDBY);
> >> +			return 1;
> >> +		}
> >>  	case MAINTENANCE_IN:
> >>  		switch (cdb[1] & 0x1f) {
> >>  		case MI_REPORT_TARGET_PGS:
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
>
> Hmm. While I agree with the patch (and can confirm that it's required
> to get multipath working on the LIO target), it really seems that
> multipath is making incorrect assumptions here.
> Looking at the spec, READ_CAPACITY is indeed not required to be
> supported on STANDBY paths, so multipath will fail against any ALUA
> implementation that follows the spec more closely.
>
> Guess we need to discuss this on dm-devel ...

<nod>, at least for the Standby state, the spec gives a bit more leeway
here: "The device server may support other commands."

At least for READ_CAPACITY, Chris + Phillip reported that ESX expects
this to work in order to probe LUNs in Standby as well..

--nab
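[A note on the SERVICE_ACTION_IN branch in the patch: READ CAPACITY(16)
is carried as a SERVICE ACTION IN(16) CDB (opcode 0x9e) with the service
action, 0x10 for SAI_READ_CAPACITY_16, in the low five bits of CDB byte
1; the upper bits of that byte are reserved, which is why the patch
masks with 0x1f before comparing. A quick shell sanity check of that
arithmetic:]

```shell
# Byte 1 of a READ CAPACITY(16) CDB carries service action 0x10 in its
# low five bits; set the reserved high bits here just to show that the
# 0x1f mask strips them, mirroring the `cdb[1] & 0x1f` in the patch.
cdb1=$(( 0xe0 | 0x10 ))   # reserved bits 7:5 set, service action 0x10
sai=$(( cdb1 & 0x1f ))    # what core_alua_state_standby() switches on
printf 'service action = 0x%02x\n' "$sai"   # prints: service action = 0x10
```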