On 2005-04-22T12:13:53, Lan <transter@xxxxxxxxx> wrote:

> Although, it seems need to add to multipath-tools the ability to set
> a timeout limit on how long an I/O is queued and retried (otherwise in
> a permanent failure, I think the I/O could be queued for a quite
> awhile, e.g. until system runs out of memory).

This can actually be implemented in user-space. If the paths stay down
for N seconds, remove the queue_if_no_path feature flag, and all IO
will be failed.

> Also, what do you think about allowing a configurable threshold on I/O
> failures in dm-multipath before deciding to set a path dead; 1 is
> kinda low, and has no tolerance at all for transient errors.

That might be a good idea. Note, however, that DM mpath already
distinguishes between path failures and media failures, for example:
a media failure will not cause a path to be failed.

There's also a trade-off. As long as the path is not failed, it will
keep receiving IO. If the error turns out not to be transient, we have
to wait for that IO to fail, and it then has to be requeued and retried
somewhere else. This causes delays.

Failing the path on the first error potentially attributable to the
transport, on the other hand, causes an immediate retry on another
path; and if the error does turn out to be transient, the path will be
returned to operation within a couple of seconds by user-space.

> I think it will lessen the dependency on waiting for multipath-tools
> to reinstate a path that has been set dead due to a transient
> condition.

True, but this is actually by current design, because we want to
redirect IO to healthy paths as quickly as possible.

Sincerely,
    Lars Marowsky-Brée <lmb@xxxxxxx>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
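The user-space timeout described in the reply above (drop queue_if_no_path once all paths have been down for N seconds) could be sketched roughly as follows. This is an illustration only, not how multipath-tools does it: the class name NoPathWatchdog, the map name "mpath0", and the way the caller obtains the usable-path count are all hypothetical, and the sketch assumes the multipath target accepts a "fail_if_no_path" message through dmsetup.

```python
import subprocess
import time


def fail_if_no_path(map_name):
    """Ask device-mapper to stop queueing I/O on the named multipath map.

    Assumption: the multipath target on this kernel accepts the
    "fail_if_no_path" message via `dmsetup message`; verify locally.
    """
    subprocess.run(
        ["dmsetup", "message", map_name, "0", "fail_if_no_path"],
        check=True,
    )


class NoPathWatchdog:
    """Disable queueing once every path has been down for `timeout` seconds.

    Hypothetical helper: the caller is responsible for polling the path
    state and passing in the number of currently usable paths.
    """

    def __init__(self, map_name, timeout,
                 action=fail_if_no_path, clock=time.monotonic):
        self.map_name = map_name
        self.timeout = timeout
        self.action = action      # called once when the timeout expires
        self.clock = clock        # injectable so the logic can be tested
        self.down_since = None    # time at which the last path went away
        self.fired = False        # True once queueing has been disabled

    def update(self, usable_paths):
        """Record the current path count; return True if the action fired."""
        if usable_paths > 0:
            # At least one path is usable again: reset the timer.
            self.down_since = None
            self.fired = False
            return False
        now = self.clock()
        if self.down_since is None:
            self.down_since = now
        if not self.fired and now - self.down_since >= self.timeout:
            self.action(self.map_name)
            self.fired = True
            return True
        return False
```

A daemon would call update() on each polling interval, for example NoPathWatchdog("mpath0", 30).update(n) with n taken from its own path checker. Re-enabling queueing after a path comes back is left to the caller.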