>>> Erwin van Londen <erwin@xxxxxxxxxxxxxxxxxx> wrote on 27.04.2021 at 05:48 in
message <b5f288fb43bc79e0206794a901aef5b1761813de.camel@xxxxxxxxxxxxxxxxxx>:
> On Mon, 2021-04-26 at 13:16 +0000, Martin Wilck wrote:
>> On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
>> > >
>> > > While we're at it, I'd like to mention another issue: WWID
>> > > changes.
>> > >
>> > > This is a big problem for multipathd. The gist is that the
>> > > device identification attributes in sysfs only change after
>> > > rescanning the device. Thus if a user changes LUN assignments
>> > > on a storage system, it can happen that a direct INQUIRY
>> > > returns a different WWID than the one in sysfs, which is fatal.
>> > > If we plan to rely more on sysfs for device identification in
>> > > the future, the problem gets worse.
>> >
>> > I think many devices rely on the fact that they are identified by
>> > Vendor/model/serial_nr, because in most professional SAN storage
>> > systems you can pre-set the serial number to a custom value; so if
>> > you want a new disk (maybe a snapshot) to be compatible with the
>> > old one, just assign the same serial number. I guess that's the
>> > idea behind it.
>>
>> What you are saying sounds dangerous to me. If a snapshot has the
>> same WWID as the device it's a snapshot of, it must not be exposed
>> to any host(s) at the same time as its origin, otherwise the host
>> may happily combine it with the origin into one multipath map, and
>> data corruption will almost certainly result.
>>
>> My argument is about how the host is supposed to deal with a WWID
>> change if it happens. Here, "WWID change" means that a given H:C:T:L
>> suddenly exposes different device designators than it used to, while
>> this device is in use by a host. Here, too, data corruption is
>> imminent, and can happen in the blink of an eye.
>> To avoid this, several things are needed:
>>
>> 1) the host needs to get notified about the change (likely by a UA
>>    of some sort),
>> 2) the kernel needs to react to the notification immediately, e.g.
>>    by blocking IO to the device,
>> 3) userspace tooling such as udev or multipathd needs to figure out
>>    how to deal with the situation cleanly, and eventually unblock
>>    it.
>>
>> Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
>> afaics.
>>
> In my view the WWID should never change. If a snapshot is created it
> should obtain a new WWID. An example from a Hitachi array is
>
> Device Identification VPD page:
>   Addressed logical unit:
>     designator type: T10 vendor identification,  code set: ASCII
>       vendor id: HITACHI
>       vendor specific: 50403B050709
>     designator type: NAA,  code set: Binary
>       0x60060e80123b050050403b0500000709
>
> The majority of the NAA WWID is tied to the storage subsystem and
> identifies the vendor OUI, model, serial etc. The last 4 digits in
> this example indicate the LDEV ID (sorry, mainframe heritage here).
> When a snapshot is taken these 4 digits will change, as a new LDEV ID
> is assigned to the snapshot. This sort of behaviour should be
> consistent across all storage vendors imho.

It's getting off-topic, but in automatic disaster recovery scenarios one
might want the "new disk" (maybe a snapshot of the original disk before it
got corrupted) to look like the "old disk", so that the OS can boot without
needing any adjustments.

Regards,
Ulrich

>
>> Martin
>>
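
As a side note on the NAA designator quoted above: below is a minimal Python sketch of how such a value can be split into fields, assuming the NAA type 6 (IEEE Registered Extended) layout of a 4-bit NAA, 24-bit OUI, 36-bit vendor-specific identifier and 64-bit vendor-specific extension. The function name split_naa6 and the key ldev_guess are made up for illustration; reading the last four hex digits as the LDEV ID follows Erwin's description of this particular array and is not something the designator format itself defines.

```python
def split_naa6(designator):
    """Split a 16-byte NAA type 6 designator (hex string) into its fields."""
    hexstr = designator.lower()
    if hexstr.startswith("0x"):
        hexstr = hexstr[2:]
    if len(hexstr) != 32 or hexstr[0] != "6":
        raise ValueError("not a 16-byte NAA type 6 designator")
    int(hexstr, 16)  # raises ValueError if non-hex characters are present
    return {
        "naa": hexstr[0],             # 4 bits: NAA type
        "oui": hexstr[1:7],           # 24 bits: IEEE company ID of the vendor
        "vendor_id": hexstr[7:16],    # 36 bits: vendor-specific identifier
        "vendor_ext": hexstr[16:32],  # 64 bits: vendor-specific extension
        # Assumption taken from the mail above, not from the designator format:
        "ldev_guess": hexstr[-4:],
    }


fields = split_naa6("0x60060e80123b050050403b0500000709")
for name, value in fields.items():
    print(f"{name:11s} {value}")
```

With a split like this, two designators from the same array could be compared field by field to see whether only the trailing, array-specific part differs, which is what one would expect for a snapshot under the scheme described above.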