dm-multipath "shaky SAN detection" is insufficient for intermittent errors.

Hello All,

This topic may have been discussed before, although I have not been able to find it on this list.

The "shaky SAN" detection method currently seems to be based on the availability of the remote target ports and on how often they disappear and reappear, as reported by HBA state changes for that remote target.

What we see in SAN troubleshooting is that the majority of issues are related to frame corruption and missing frames, which lead to incomplete FC sequences/exchanges and hence to plain IO errors. I've been running tests with an FC jammer/analyser, doing all sorts of weird things: changing a SCSI data payload or CRC (thereby corrupting the frame), or almost persistently killing off command or status frames on normal IOs. As long as I leave the TUR (TEST UNIT READY) alone and that checker keeps getting correct statuses back, it is left to the FC stack and/or the arrays to chuck an HBA offline, forcing multipath to halt IOs to that path.

My request: would it be possible, instead of (or in addition to) checking for disappearing/reappearing targets, to monitor for actual IO errors on data transfers, where commands time out or end in a check condition? That information could be used either to halt IOs to the path entirely or to feed the marginal_path_err logic: move the path into a holding queue and check for subsequent errors in the background, with marginal_path_err_sample_time, marginal_path_err_rate_threshold and marginal_path_err_recheck_gap_time then determining whether the path can be used again or should be permanently failed.
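
The logic requested above can be sketched roughly as follows. This is a minimal illustration in Python, not multipath-tools code: the class PathErrorMonitor, its methods, the max_holds limit and the "errors per 1000 IOs" unit are all assumptions made for the example, and the three tunables only loosely mirror the sample time, error-rate threshold and gap time mentioned above.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class PathErrorMonitor:
    """Hypothetical per-path monitor driven by data-IO results, not by TUR."""
    sample_time: float          # seconds of IO history that count towards the rate
    err_rate_threshold: float   # tolerated errors per 1000 IOs (assumed unit)
    gap_time: float             # seconds a held path waits before being re-evaluated
    max_holds: int = 3          # assumption: holds allowed before failing for good
    state: str = "active"       # "active", "held" or "failed"
    holds: int = 0
    held_at: Optional[float] = None
    events: Deque[Tuple[float, bool]] = field(default_factory=deque)

    def record_io(self, now: float, failed: bool) -> None:
        """Record one completed IO (timeout or check condition => failed=True)."""
        self.events.append((now, failed))
        # Drop results that have fallen out of the sampling window.
        while self.events and self.events[0][0] < now - self.sample_time:
            self.events.popleft()

    def error_rate(self) -> float:
        """Errors per 1000 IOs over the results currently in the window."""
        if not self.events:
            return 0.0
        errors = sum(1 for _, failed in self.events if failed)
        return 1000.0 * errors / len(self.events)

    def evaluate(self, now: float) -> str:
        """Update and return the path state based on the observed error rate."""
        if self.state == "failed":
            return self.state
        over = self.error_rate() > self.err_rate_threshold
        if self.state == "active" and over:
            self._hold(now)
        elif self.state == "held" and now - self.held_at >= self.gap_time:
            if over:
                self._hold(now)          # background test IO is still failing
            else:
                self.state = "active"    # clean sample: reinstate the path
                self.held_at = None
        return self.state

    def _hold(self, now: float) -> None:
        """Move the path to the holding queue, or fail it after too many holds."""
        self.holds += 1
        self.events.clear()              # start a fresh sample for the next check
        if self.holds >= self.max_holds:
            self.state = "failed"
        else:
            self.state = "held"
            self.held_at = now
```

While a path is held, the caller would keep issuing background test IO and pass the results to record_io(); if nothing is recorded during the hold, this sketch treats the sample as clean and reinstates the path, which a real implementation would probably want to avoid.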

I've been doing SAN troubleshooting for 20 years, and the majority of problems are related to these intermittent issues of frame corruption and/or discards. Actual flapping paths, where a target goes through offline/online state changes, are far less common.

Your feedback is appreciated.

Thank you

Erwin van Londen
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel
