Re: RFC: one more time: SCSI device identification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Mon, 2021-04-26 at 13:16 +0000, Martin Wilck wrote:
On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:


While we're at it, I'd like to mention another issue: WWID changes.

This is a big problem for multipathd. The gist is that the device
identification attributes in sysfs only change after rescanning the
device. Thus if a user changes LUN assignments on a storage system,
it can happen that a direct INQUIRY returns a different WWID as in
sysfs, which is fatal. If we plan to rely more on sysfs for device
identification in the future, the problem gets worse.

I think many devices rely on the fact that they are identified by
Vendor/model/serial_nr, because in most professional SAN storage
systems you
can pre-set the serial number to a custom value; so if you want a new
disk
(maybe a snapshot) to be compatible with the old one, just assign the
same
serial number. I guess that's the idea behind.

What you are saying sounds dangerous to me. If a snapshot has the same
WWID as the device it's a snapshot of, it must not be exposed to any
host(s) at the same time with its origin, otherwise the host may
happily combine it with the origin into one multipath map, and data
corruption will almost certainly result.

My argument is about how the host is supposed to deal with a WWID
change if it happens. Here, "WWID change" means that a given H:C:T:L
suddenly exposes different device designators than it used to, while
this device is in use by a host. Here, too, data corruption is
imminent, and can happen in a blink of an eye. To avoid this, several
things are needed:

 1) the host needs to get notified about the change (likely by an UA of
some sort)
 2) the kernel needs to react to the notification immediately, e.g. by
blocking IO to the device,
 3) userspace tooling such as udev or multipathd need to figure out how
to  how to deal with the situation cleanly, and eventually unblock it.

Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
afaics.

In my view the WWID should never change. If a snapshot is created it should either obtain a new WWID. An example out of a Hitachi array is

Device Identification VPD page:
Addressed logical unit:
designator type: T10 vendor identification, code set: ASCII
vendor id: HITACHI
vendor specific: 50403B050709
designator type: NAA, code set: Binary
0x60060e80123b050050403b0500000709

The majority of the naa wwid is tied to the storage subsystem and identifies the vendor oui, model, serial etc. The last 4 in this example indicate the LDEV ID (Sorry mainframe heritage here..). When a snapshot is taken these 4 will change as a new LDEV ID is assigned to the snapshot. This sort of behaviour should be consistent across all storage vendors imho.

Martin

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel

[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux