On Thu, 2015-12-10 at 01:55 +0000, Sam McLeod wrote:
> >>> The Company ID, VSI, and VSIE are generated by LIO based upon the
> >>> current vpd_unit_serial configfs attribute value.
> >>>
> >>> So as long as vpd_unit_serial is persistent, and the same value for
> >>> backend devices across export failover to different nodes, Xen will
> >>> always see the same EVPD information.
> >>>
> >>> Are you saying that vpd_unit_serial is already persistent across export
> >>> failover, but Xen is still having problems..?
> >>>
> >>> Have you confirmed with sg_inq -i both before and after the export
> >>> failover occurs..?
>
> Hi Nicholas,
>
> Sorry for how long it's taken me to reply but I wanted to let you (and
> the mailing list) know this is resolved with great thanks to your
> explanation of how the vpd_unit_serial works in relation to the SCSI
> ID.
>
> Once we enforced the vpd_unit_serial on each of the LUNs we can
> consistently fail over between iSCSI servers without the SCSI ID
> changing.
>
> For reference for those using Pacemaker + Corosync with the LIO
> target:
>
>
> primitive iscsi_lun_r1 iSCSILogicalUnit \
> op monitor timeout=10s interval=30s on-fail=restart \
> op start timeout=20s interval=0 on-fail=restart \
> op stop timeout=20s interval=0 on-fail=restart \
> params
> target_iqn="iqn.2003-01.org.linux-iscsi.s1-san5.x8664:sn.cb568058d955"
> scsi_sn=bff3f42a-49d8-4cfc-b64e-2b933e98141d lun=1 path="/dev/drbd1"
> allowed_initiators="iqn.2015-05.com.example:516c8f8c
> iqn.2015-06.com.example:2dcd27e0 iqn.2013-09.com.example:e611b8f2
> iqn.2013-11.com.example:aef3bcea iqn.2015-06.com.example:3577646c
> iqn.2015-05.com.example:3367ed85 iqn.2015-07.com.example:0467ccce
> iqn.2015-11.com.example:40ee457b" implementation=lio-t
>
> Note the scsi_sn parameter being passed in, this is what enforces the
> vpd_unit_serial as per
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/iSCSILogicalUnit#L367
>
> Such a simple fix to something that for a long time we thought was
> unrelated.
> I plan to write a quick blog post up on this as there are a lot of
> other people having this issue with Xen and it's clearly quite easy to
> fix when you understand the relationship as you pointed out.

(Adding Florian + JXM to CC)

Thanks for following up on your original post.

Yes, this default resource-agent behavior has caused endless amounts of
confusion to end-users over the years.

It's difficult to imagine a case where vpd_unit_serial persistence
should not be happening during LIO backend + export fail-over between
cluster nodes.  Or at least, there should be a giant warning or
something.

That said, I have no idea who is maintaining the HA resource-agents
stuff these days, but it would certainly be a good idea to add this bug
here:

https://github.com/ClusterLabs/resource-agents/issues

Would you be so kind as to articulate this bug on github, and what you've
done beyond the defaults in order to have a working setup..?
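
For anyone wanting to check vpd_unit_serial persistence directly rather
than through sg_inq -i on the initiator, here is a minimal sketch in
Python.  It assumes configfs is mounted at /sys/kernel/config, that each
backstore exposes wwn/vpd_unit_serial there, and that reading the
attribute returns a line prefixed with "T10 VPD Unit Serial Number:".
The function names and the hba/device keying are made up for
illustration; the HBA index in particular may differ between nodes, so
adjust the key if that is the case in your setup.

```python
import glob
import os

# Assumed configfs location of LIO backstores; adjust if mounted elsewhere.
CONFIGFS_GLOB = "/sys/kernel/config/target/core/*/*/wwn/vpd_unit_serial"
# Assumed prefix printed by the kernel when the attribute is read.
PREFIX = "T10 VPD Unit Serial Number:"


def parse_unit_serial(text):
    """Extract the serial from the contents of a vpd_unit_serial attribute."""
    text = text.strip()
    if text.startswith(PREFIX):
        text = text[len(PREFIX):]
    return text.strip()


def read_unit_serials(pattern=CONFIGFS_GLOB):
    """Map 'hba/device' -> serial for every backstore found on this node."""
    serials = {}
    for path in sorted(glob.glob(pattern)):
        parts = path.split(os.sep)
        name = "/".join(parts[-4:-2])  # e.g. iblock_0/iscsi_lun_r1
        with open(path) as fh:
            serials[name] = parse_unit_serial(fh.read())
    return serials


def serial_mismatches(before, after):
    """Return backstores whose serial differs or exists in only one snapshot."""
    result = {}
    for name in sorted(set(before) | set(after)):
        a, b = before.get(name), after.get(name)
        if a != b:
            result[name] = (a, b)
    return result


if __name__ == "__main__":
    # Print one "hba/device<TAB>serial" line per backstore on this node.
    for name, serial in read_unit_serials().items():
        print(f"{name}\t{serial}")
```

Run it on the active node before the fail-over and on the new active
node afterwards, then feed both snapshots to serial_mismatches().  An
empty result means the serials, and so the IDs derived from them, stayed
the same.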

Thank you,

--nab