On Mon, 2021-04-26 at 13:16 +0000, Martin Wilck wrote:
> On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > While we're at it, I'd like to mention another issue: WWID
> > > changes. This is a big problem for multipathd. The gist is that
> > > the device identification attributes in sysfs only change after
> > > rescanning the device. Thus if a user changes LUN assignments on
> > > a storage system, it can happen that a direct INQUIRY returns a
> > > different WWID as in sysfs, which is fatal. If we plan to rely
> > > more on sysfs for device identification in the future, the
> > > problem gets worse.
> >
> > I think many devices rely on the fact that they are identified by
> > Vendor/model/serial_nr, because in most professional SAN storage
> > systems you can pre-set the serial number to a custom value; so if
> > you want a new disk (maybe a snapshot) to be compatible with the
> > old one, just assign the same serial number. I guess that's the
> > idea behind.
>
> What you are saying sounds dangerous to me. If a snapshot has the
> same WWID as the device it's a snapshot of, it must not be exposed to
> any host(s) at the same time with its origin, otherwise the host may
> happily combine it with the origin into one multipath map, and data
> corruption will almost certainly result.
>
> My argument is about how the host is supposed to deal with a WWID
> change if it happens. Here, "WWID change" means that a given H:C:T:L
> suddenly exposes different device designators than it used to, while
> this device is in use by a host. Here, too, data corruption is
> imminent, and can happen in a blink of an eye. To avoid this, several
> things are needed:
>
> 1) the host needs to get notified about the change (likely by an UA
>    of some sort)
> 2) the kernel needs to react to the notification immediately, e.g. by
>    blocking IO to the device,
> 3) userspace tooling such as udev or multipathd need to figure out
>    how to deal with the situation cleanly, and eventually unblock it.
>
> Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
> afaics.

In my view the WWID should never change. If a snapshot is created, it should obtain a new WWID. An example from a Hitachi array:

Device Identification VPD page:
  Addressed logical unit:
    designator type: T10 vendor identification,  code set: ASCII
      vendor id: HITACHI
      vendor specific: 50403B050709
    designator type: NAA,  code set: Binary
      0x60060e80123b050050403b0500000709

The majority of the NAA WWID is tied to the storage subsystem and identifies the vendor OUI, model, serial etc. The last 4 hex digits (0709 in this example) indicate the LDEV ID (sorry, mainframe heritage here). When a snapshot is taken, these 4 digits change because a new LDEV ID is assigned to the snapshot. This sort of behaviour should be consistent across all storage vendors imho.
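
To make the layout concrete, here is a minimal Python sketch that splits the designator above into its parts. It assumes the generic 128-bit NAA type 6 layout (1 hex digit NAA, 6 hex digits OUI, the rest vendor specific) and treats the trailing 4 hex digits as the LDEV ID, as on this array; other vendors may lay out the vendor-specific part differently. The names split_naa6 and same_array_different_ldev are made up for illustration, and the "snapshot" WWID is a hypothetical value, not taken from a real device.

```python
def split_naa6(wwid: str) -> dict:
    """Split a 128-bit NAA type 6 designator into its fields.

    Assumption: the last 4 hex digits are the LDEV ID, as described
    for the Hitachi example; this is not a general rule.
    """
    digits = wwid.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 32 or digits[0] != "6":
        raise ValueError("not a 128-bit NAA type 6 designator")
    int(digits, 16)  # raises ValueError if non-hex characters are present
    return {
        "naa": digits[0],                 # NAA format nibble
        "oui": digits[1:7],               # IEEE company ID (vendor OUI)
        "vendor_specific": digits[7:28],  # array model/serial etc.
        "ldev": digits[28:],              # assumed LDEV ID
    }


def same_array_different_ldev(wwid_a: str, wwid_b: str) -> bool:
    """True if both WWIDs share everything except the LDEV digits."""
    a, b = split_naa6(wwid_a), split_naa6(wwid_b)
    same_prefix = all(a[k] == b[k] for k in ("naa", "oui", "vendor_specific"))
    return same_prefix and a["ldev"] != b["ldev"]


if __name__ == "__main__":
    origin = "0x60060e80123b050050403b0500000709"
    # Hypothetical snapshot WWID: same array, new LDEV ID.
    snapshot = "0x60060e80123b050050403b050000070a"
    print(split_naa6(origin))
    print(same_array_different_ldev(origin, snapshot))
```

With this scheme the origin and its snapshot never share a WWID, so a host has no reason to merge them into one multipath map.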
Martin
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel