On Tue, 2021-04-27 at 10:21 +0200, Hannes Reinecke wrote:
On 4/27/21 10:10 AM, Martin Wilck wrote:
On Tue, 2021-04-27 at 13:48 +1000, Erwin van Londen wrote:
Wrt 1), we can only hope that it's the case. But 2) and 3) need work, afaics.
In my view the WWID should never change.
In an ideal world, perhaps not. But in the dm-multipath realm, we know that WWID changes can happen with certain storage arrays. See and follow-ups, for example.
And it's actually something which might happen quite easily. The storage array can unmap a LUN, delete it, create a new one, and map that one into the same LUN number as the old one. If we didn't do I/O during that interval, upon the next I/O we will be getting the dreaded 'Power-On/Reset' sense code. _And nothing else_, due to the arcane rules for sense code generation in SAM. But we end up with a completely different device.
The only way out of it is to do a rescan for every POR sense code, and disable the device, e.g. via DID_NO_CONNECT, whenever we find that the identification has changed. We already have a copy of the original VPD page 0x83 at hand, so that should be reasonably easy.
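To make the proposal above concrete, here is a rough user-space sketch in Python of the "rescan on POR and compare against the cached page 0x83" idea. It is not kernel code. The names `Action`, `check_identity_after_por` and the `read_vpd83` callback are hypothetical, and treating any byte-level difference (or a failed read) as "identity changed" is an assumption made for the sake of the example.

```python
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    CONTINUE = "continue"  # same device as before, keep using it
    DISABLE = "disable"    # identity changed; a kernel version might fail I/O, e.g. with DID_NO_CONNECT


def check_identity_after_por(
    cached_vpd83: bytes,
    read_vpd83: Callable[[], Optional[bytes]],
) -> Action:
    """Decide what to do after a Power-On/Reset unit attention.

    cached_vpd83: copy of VPD page 0x83 taken when the device was first scanned.
    read_vpd83:   hypothetical callback that re-reads page 0x83 from the device
                  and returns None if the read fails.
    """
    fresh = read_vpd83()
    if fresh is None:
        # Assumption: if the identity cannot be verified, err on the safe side.
        return Action.DISABLE
    # Assumption: a plain byte comparison is strict enough for this sketch.
    return Action.CONTINUE if fresh == cached_vpd83 else Action.DISABLE
```

A caller would invoke this only after seeing the POR unit attention, so the extra INQUIRY is paid once per reset and not on every command.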
The way out of this is to chuck the array in the bin. As I mentioned in one of my other emails, when a scenario like the one you describe above happens and the array does not inform the initiator, it goes against the SAM-5 standard.
That standard states:
5.14 Unit attention conditions
5.14.1 Unit attention conditions that are not coalesced
Each logical unit shall establish a unit attention condition whenever one of the following events occurs:
a) a power on (see 6.3.1), hard reset (see 6.3.2), logical unit reset (see 6.3.3), I_T nexus loss (see 6.3.4), or power loss expected (see 6.3.5) occurs;
b) commands received on this I_T nexus have been cleared by a command or a task management function associated with another I_T nexus and the TAS bit was set to zero in the Control mode page associated with this I_T nexus (see 5.6);
c) the portion of the logical unit inventory that consists of administrative logical units and hierarchical logical units has been changed (see 4.6.18.1); or
d) any other event requiring the attention of the SCSI initiator device.
The I_T nexus loss under a) is an especially important trigger.
---
6.3.4 I_T nexus loss
An I_T nexus loss is a SCSI device condition resulting from:
a) a hard reset condition (see 6.3.2);
b) an I_T nexus loss event (e.g., logout) indicated by a Nexus Loss event notification (see 6.4);
c) indication that an I_T NEXUS RESET task management request (see 7.6) has been processed; or
d) an indication that a REMOVE I_T NEXUS command (see SPC-4) has been processed.
An I_T nexus loss event is an indication from the SCSI transport protocol to the SAL that an I_T nexus no
longer exists. SCSI transport protocols may define I_T nexus loss events.
Each SCSI transport protocol standard that defines I_T nexus loss events should specify when those events
result in the delivery of a Nexus Loss event notification to the SAL.
The I_T nexus loss condition applies to both SCSI initiator devices and SCSI target devices.
If a SCSI target port detects an I_T nexus loss, then a Nexus Loss event notification shall be delivered to
each logical unit to which the I_T nexus has access.
In response to an I_T nexus loss condition a logical unit shall take the following actions:
a) abort all commands received on the I_T nexus as described in 5.6;
b) abort all background third-party copy operations (see SPC-4) that are using the I_T nexus;
c) terminate all task management functions received on the I_T nexus;
d) clear all ACA conditions (see 5.9.5) associated with the I_T nexus;
e) establish a unit attention condition for the SCSI initiator port associated with the I_T nexus (see 5.14
and 6.2); and
f) perform any additional functions required by the applicable command standards.
---
This also means that any underlying transport protocol issue, such as on FC or on TCP for iSCSI, will very often trigger aborted commands or UAs as well, which will be picked up by the kernel and the respective drivers.
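For illustration, this is roughly what "picking up" such a unit attention looks like at the sense-data level. The Python sketch below decodes fixed-format (response codes 0x70/0x71) and descriptor-format (0x72/0x73) sense data and checks for sense key 6h (UNIT ATTENTION) with ASC/ASCQ 29h/00h, the 'Power-On/Reset' code discussed earlier. The function names are made up for this example; the real handling lives in the kernel's SCSI error path.

```python
from typing import Optional, Tuple

UNIT_ATTENTION = 0x06  # sense key 6h


def decode_sense(sense: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (sense_key, asc, ascq), or None if the buffer is not usable."""
    if not sense:
        return None
    response_code = sense[0] & 0x7F
    if response_code in (0x70, 0x71):  # fixed format
        if len(sense) < 14:
            return None
        return sense[2] & 0x0F, sense[12], sense[13]
    if response_code in (0x72, 0x73):  # descriptor format
        if len(sense) < 4:
            return None
        return sense[1] & 0x0F, sense[2], sense[3]
    return None


def is_power_on_reset(sense: bytes) -> bool:
    """True for UNIT ATTENTION with ASC/ASCQ 29h/00h."""
    decoded = decode_sense(sense)
    return decoded == (UNIT_ATTENTION, 0x29, 0x00)
```

Only the 29h/00h code is matched here; a real implementation would probably look at the other 29h additional sense code qualifiers too.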
I had a rather lengthy discussion with Fred Knight @ NetApp about Power-On/Reset handling, what with him complaining that we don't handle it correctly. So this really is something we should be looking into, even independently of multipathing.
But actually I like the idea from Martin Petersen to expose the parsed VPD identifiers to sysfs; that would allow us to drop sg_inq completely from the udev rules.
Cheers,
Hannes
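As a rough idea of what "parsed VPD identifiers" means in practice, the Python sketch below walks the designation descriptors of a raw VPD page 0x83 buffer. The function name `parse_vpd83` and the dictionary layout are invented for this example, and it only splits out the code set, association and designator type fields without interpreting the designator bytes.

```python
from typing import Dict, List, Union


def parse_vpd83(page: bytes) -> List[Dict[str, Union[int, bytes]]]:
    """Split a raw VPD page 0x83 buffer into its designation descriptors."""
    if len(page) < 4 or page[1] != 0x83:
        raise ValueError("not a VPD page 0x83 buffer")
    # Bytes 2-3 hold the page length (big endian), counted after the 4-byte header.
    end = min(len(page), 4 + int.from_bytes(page[2:4], "big"))
    descriptors = []
    off = 4
    while off + 4 <= end:
        dlen = page[off + 3]
        if off + 4 + dlen > end:
            break  # truncated descriptor, stop parsing
        descriptors.append({
            "code_set": page[off] & 0x0F,            # 1 = binary, 2 = ASCII, 3 = UTF-8
            "association": (page[off + 1] >> 4) & 0x03,  # 0 = logical unit
            "type": page[off + 1] & 0x0F,            # e.g. 3 = NAA
            "designator": page[off + 4:off + 4 + dlen],
        })
        off += 4 + dlen
    return descriptors
```

On kernels that already expose the raw page as a binary sysfs attribute (for instance `/sys/block/sda/device/vpd_pg83`, where `sda` is just a placeholder device name), the file contents could be fed straight into such a parser.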
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel