Re: NAA breakage

Hi Nicholas,

> Mmmm, I think the right solution here would be ignoring the extra '-'
> characters at the point that the vpd_unit_serial attribute is set
> via configfs.. However, this would still obviously cause an issue
> of the NAA WWN changing..
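
For reference, that normalization could look something like this (a minimal
userspace Python sketch of the idea, not the actual kernel code; where the
filtering would live in the configfs store handler is my assumption):

    def normalize_unit_serial(raw):
        # Sketch of the suggested fix: drop '-' and any other non-hex
        # characters at the time vpd_unit_serial is stored, so the NAA
        # derivation only ever sees hex digits.
        return "".join(c for c in raw if c in "0123456789abcdefABCDEF")

    # Both spellings of the same serial now store identically...
    assert (normalize_unit_serial("535a4c2c-4daa-90dd-591d")
            == normalize_unit_serial("535a4c2c4daa90dd591d"))
    # ...but a serial that already contained '-' still maps to a new
    # value, i.e. the NAA WWN still changes for existing setups.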

I think the following points need to be addressed:

(a) How many existing production setups can be affected in the same way as my lab cluster? My setup is quite special: I run LIO on top of active/passive DRBD, generate my own serials to maintain LUN identities across DRBD nodes, and access the configfs plane directly through my own library instead of rtsadmin/lio-utils. I can easily change the serial number generator because we don't use LIO in production yet, but that does not solve the problem for others.

(b) Are there any restrictions on the vpd_unit_serial format in the T10 specifications? As far as I know, configfs currently allows me to set an arbitrary string...

(c) If there are no restrictions on the serial number format, the NAA should probably be generated using a hash function (e.g. SHA) instead of hex2bin. The present implementation can easily produce identical NAAs for two different serial numbers, which is really bad (see the sketch after this list).

(d) IMHO this issue should be solved during this mainline release, because the growing number of LIO target users will make future fixes harder.
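
To illustrate (c), here is a minimal Python sketch of both schemes. The OUI
value, the 25-digit vendor body, and the zero padding are assumptions for
illustration only, not the real target_core code:

    import hashlib

    # IEEE Registered Extended NAA (type 6): hex digit '6' + 6-digit OUI
    # + 25 hex digits of vendor-specific identifier = 32 hex digits.
    OUI = "001405"  # example OUI, assumed for this sketch

    def naa_hex2bin(serial):
        # Rough model of the current scheme: copy hex digits out of the
        # serial, silently dropping '-' and anything else non-hex.
        digits = "".join(c for c in serial.lower()
                         if c in "0123456789abcdef")
        return "naa.6" + OUI + digits[:25].ljust(25, "0")

    def naa_hashed(serial):
        # Proposed scheme: hash the whole serial string, so any two
        # distinct serials map to distinct NAAs with overwhelming
        # probability.
        digest = hashlib.sha256(serial.encode()).hexdigest()
        return "naa.6" + OUI + digest[:25]

    # Two different serials that differ only in '-' placement
    # collide under the current scheme:
    a = "535a4c2c-4daa-90dd-591d"
    b = "535a4c2c4daa90dd-591d"
    assert naa_hex2bin(a) == naa_hex2bin(b)  # identical NAAs -- the bug
    assert naa_hashed(a) != naa_hashed(b)    # hashed NAAs stay distinct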

> How severe is the breakage with VMware here when the NAA WWN changes..?
> Does this require a logout -> relogin from the perspective of the ESX
> client..?  Or does this cause issues with on-disk metadata for VMFS that
> references existing NAA WWNs..?

Well, first of all, I'm not a VMware expert. Based on my tests and research over the last two days, this is a serious headache for VMware ESX(i). ESX >= 3.5 uses the NAA identifier as a guaranteed-unique signature of a physical volume and saves a copy of the NAA in the VMFS header. When establishing a storage session, the on-disk signatures of the VMFS extents are compared with the actual NAAs presented by the storage, in order to avoid data corruption, maintain multiple paths to a single volume, etc.
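
A toy model of that comparison, as I understand the documented behavior (the
function below is hypothetical, not VMware's actual code; it only mirrors the
check described above):

    def verify_path_uid(on_disk_naa, presented_naa):
        # ESX compares the NAA recorded in the VMFS header against the
        # NAA the storage currently presents on a path; a mismatch on a
        # data LUN is treated as a critical error.
        if on_disk_naa != presented_naa:
            raise RuntimeError(
                "physical media represented by %s has changed; "
                "critical error for a data LUN" % presented_naa)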

In practice, when I changed the NAA of an active VMFS volume with running VMs, it resulted in an unrecoverable error (see kb.vmware.com/kb/1003416):

"ALERT: NMP: vmk_NmpVerifyPathUID: The physical media represented by device naa.600140535a4c2c4daa90dd591dc453dd (path vmhba34:C0:T0:L8) has changed. If this is a data LUN, this is a critical error."

I didn't test changing the NAA of an inactive, unmounted VMFS volume, but I expect that VMware will treat such a volume as a storage snapshot and a resignature will be needed. See kb.vmware.com/kb/1011387 or the http://holyhandgrenade.org/blog/2010/07/practical-vmfs-signatures/ blog post.

In all cases, nontrivial effort is probably necessary to make it work again. It seems to me that the easiest solution (and the only one without downtime) is to migrate all VMs to another shared storage using Storage vMotion, destroy the VMFS volume, change the NAA, recreate the VMFS, and migrate the VMs back. (But if somebody else knows an easy way to restore an active VMFS volume after an NAA change, please tell me :-))

Martin



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux