On Fri, 2018-12-21 at 10:06 -0600, Benjamin Marzinski wrote:
>
> I've been thinking about how we handle marginal paths, and it seems to
> me that instead of telling the kernel that they have failed, it might be
> better to create pathgroups of last resort, which contain marginal
> paths that should only be used if all the other paths are down.

Maybe we should simply assign marginal paths a very low priority? At
least with "group_by_prio" and immediate failback, that would cause
multipathd to switch to these paths if nothing else is available, and
switch back ASAP - so it would give you the desired behavior at almost
no cost.

An open question for me is whether this priority should be higher or
lower than what we assign to "ghost" paths in standby state (1,
currently).

Side note: the global "failback" policy setting may not fit the needs
of all modern setups. I think that immediate failback is always correct
for "marginal" vs. flawless paths, but we know that it's not always
wanted for non-optimal vs. optimal paths, or other failback scenarios.

>
> The downsides to this method are that it is quite possible that it could
> double the number of pathgroups whenever you have connection issues,
> since a connection issue near the host HBA could cause a marginal path
> in each pathgroup. This means more reloading tables, and more confusing
> layouts.
>
> The upside to this method is that multipath won't run out of paths while
> there are still marginal paths that it could use. When queuing isn't
> enabled, there's nothing to stop the kernel from failing IO while
> potentially usable marginal paths exist.
>
> On the other hand, this problem could be mitigated by having multipath
> work such that, when marginal path detection is configured, it always
> makes sure that no_path_retry is at least some minimum value that we
> believe is long enough for multipathd to be notified of the path failure
> by the kernel and to reinstate the marginal paths.

I'd rather simply document that we discourage "no_path_retry = fail"
while marginal path detection is enabled. "Long enough" sounds like a
can of worms to me.

Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
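
To illustrate the low-priority idea discussed above, here is a minimal Python sketch. It is not multipath-tools code; it only models grouping by priority with immediate failback. The Path class, MARGINAL_PRIO, group_by_prio() and active_group() are hypothetical names made up for this example, and the value 0 is just a placeholder, since the thread leaves open whether marginal paths should rank above or below "ghost" paths (priority 1).

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder value, an assumption for this sketch only. Whether marginal
# paths should rank above or below "ghost" paths (currently 1) is undecided.
MARGINAL_PRIO = 0


@dataclass
class Path:
    name: str
    prio: int               # priority as reported by the prioritizer
    up: bool = True         # result of the path checker
    marginal: bool = False  # flagged by marginal path detection

    def effective_prio(self) -> int:
        # Marginal paths are demoted instead of being failed.
        return MARGINAL_PRIO if self.marginal else self.prio


def group_by_prio(paths):
    """Return path groups ordered by descending effective priority."""
    groups = defaultdict(list)
    for p in paths:
        groups[p.effective_prio()].append(p)
    return [groups[prio] for prio in sorted(groups, reverse=True)]


def active_group(paths):
    """Model immediate failback: use the best group that has a usable path."""
    for group in group_by_prio(paths):
        usable = [p.name for p in group if p.up]
        if usable:
            return usable
    return []  # nothing usable left; I/O would be queued or failed


paths = [
    Path("sda", prio=50),
    Path("sdb", prio=50, marginal=True),
    Path("sdc", prio=10),
    Path("sdd", prio=10, marginal=True),
]
print(active_group(paths))
```

In this model, all marginal paths share one priority and therefore land in a single last-resort group, so the number of groups grows by one rather than doubling. Marking "sda" and "sdc" as down makes active_group() fall through to the marginal group, and bringing "sda" back up selects it again on the next evaluation.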