Re: Problems with multipathing

Roger Håkansson wrote:
> Also, I've noticed that it's not only when a controller fails that this
> happens; when a failed controller is "revived", the same thing might happen.
>
> As far as I've been able to tell, the more I/O transactions in flight at
> the time of the failure, the more likely it is that the (SCSI) device will
> be marked as "dead".

Hmmm... I'm wondering if he's hitting the scenario in which the midlayer
marks the sdev as offline, which could be the "dead" state.
This occurs if an I/O hits the LLDD (low-level device driver) while the
device is disconnected and error recovery fails. If so, even later, once the
LLDD has connectivity again and can access the device, the SCSI layer would
still likely bounce I/O. Changing the device back to the running state
requires manual intervention; until then, any I/O requests from dm would be
failed back by the midlayer.
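
As a rough illustration of that manual step: an offlined sdev is usually brought back by writing "running" into its sysfs state attribute as root (the equivalent of `echo running > .../device/state`). The following is a minimal Python sketch, assuming the 1:0:* host/channel pattern used in this thread and a hypothetical helper name `reset_offline_devices`, that does this only for devices currently reporting "offline":

```python
import glob

# Assumption: host/channel 1:0 as in this thread; adjust for your setup.
DEFAULT_PATTERN = "/sys/class/scsi_device/1:0:*/device/state"


def reset_offline_devices(pattern=DEFAULT_PATTERN):
    """Write "running" to every matching SCSI state file that reads "offline".

    Hypothetical helper: it only automates the manual
    `echo running > .../device/state` step, needs root on a real system,
    and does nothing about the underlying connectivity problem.
    Returns the list of state files that were changed.
    """
    changed = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            state = f.read().strip()
        if state == "offline":
            # Ask the midlayer to put the device back into the running state.
            with open(path, "w") as f:
                f.write("running")
            changed.append(path)
    return changed
```

This only clears the midlayer's offline flag; if the path is still broken, the device will likely be offlined again the next time error recovery fails.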

What doesn't jibe is the rescan re-enabling the device. As I stated,
restoring it is usually a manual action. If the rescans happen just before
the transition to the offline state, they may be causing dm to change its
path mappings to avoid I/O to the failed path, thus averting the sdev
transition. Can you report the contents of
/sys/class/scsi_device/1:0:*/device/state at the following points, in both
the working and the non-working cases:
  while working; right after failover but before dm fails the path; and
  after the failure/success
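
One way to collect those snapshots consistently is a small Python sketch like the following. The helper names `snapshot_states` and `record` are hypothetical; the glob pattern is the one from the request above. It reads each matching state file and keys the result by the device's H:C:T:L address:

```python
import glob
import os

# Pattern from the request above (host/channel 1:0).
DEFAULT_PATTERN = "/sys/class/scsi_device/1:0:*/device/state"


def snapshot_states(pattern=DEFAULT_PATTERN):
    """Return {H:C:T:L address: state string} for every matching device."""
    states = {}
    for path in sorted(glob.glob(pattern)):
        # .../scsi_device/<H:C:T:L>/device/state -> keep the <H:C:T:L> part
        addr = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            states[addr] = f.read().strip()
    return states


def record(label, log, pattern=DEFAULT_PATTERN):
    """Append a (label, snapshot) pair to log and return log."""
    log.append((label, snapshot_states(pattern)))
    return log
```

For example, call `record("working", log)` while things are healthy, again right after failover, and again after the failure or recovery, then compare the entries for the working and non-working runs.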

-- james

--

dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
