On 17.06.2013 09:29, Bart Van Assche wrote:
> On 06/17/13 09:14, Hannes Reinecke wrote:
>> On 06/17/2013 09:04 AM, Bart Van Assche wrote:
>>> I agree that the value of fast_io_fail_tmo should be kept small.
>>> Although, as you explained, changing the SCSI device state into
>>> SDEV_BLOCK doesn't help for I/O that has already been queued on a
>>> failed path, I think it's still useful for I/O that is queued after
>>> the fast_io_fail timer has been started and before that timer has
>>> expired.
>>
>> Why, but of course.
>>
>> The typical scenario would be:
>> -> detect link-loss
>> -> call scsi_block_requests()
>> -> start dev_loss_tmo and fast_io_fail_tmo
>>
>> -> When fast_io_fail_tmo triggers:
>>    -> Abort all outstanding requests
>>
>> -> When dev_loss_tmo triggers:
>>    -> Abort all outstanding requests
>>    -> Remove/disable the I_T nexus
>>    -> call scsi_unblock_requests()
>>
>> However, multipath detecting SDEV_BLOCK doesn't guarantee a fast
>> failover; in fact it was only added rather recently, as it's not a
>> big win in most cases.
>
> Even if setting the state SDEV_BLOCK doesn't help much with improving
> failover time, it still has the advantage over using
> scsi_block_requests() that it can be overridden by a user via sysfs.

In my opinion, SDEV_BLOCK can help the reconnect.

The only reason for a high fast_io_fail_tmo is that you don't use
multipath at all and hope that the connection becomes available again
before that timeout expires. You place the reconnects in between so
that there is a chance that a reconnect succeeds and the transport
layer error work can be canceled.

But I have to look at all of your patches first to see how you
implemented the big picture.
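
To make the ordering concrete, here is a rough user-space model of the sequence Hannes outlined, extended with the reconnect case. It is only a sketch and not kernel code. The function name simulate(), its parameters and the action strings are made up for illustration. It assumes the link is lost at t=0, that timeouts are plain integers in seconds, and that a successful reconnect cancels whatever timers are still pending.

```python
def simulate(fast_io_fail_tmo, dev_loss_tmo, reconnect_at=None):
    """Return the ordered (time, action) list after a link loss at t=0.

    Hypothetical model: reconnect_at is the time at which a reconnect
    attempt succeeds, or None if the link never comes back.
    """
    log = [
        (0, "link loss detected"),
        (0, "block requests"),
        (0, "start fast_io_fail_tmo and dev_loss_tmo"),
    ]

    # (time, tie-break rank, kind); at equal times a timer fires first.
    pending = [
        (fast_io_fail_tmo, 0, "fast_io_fail"),
        (dev_loss_tmo, 1, "dev_loss"),
    ]
    if reconnect_at is not None:
        pending.append((reconnect_at, 2, "reconnect"))

    for when, _, kind in sorted(pending):
        if kind == "fast_io_fail":
            # Outstanding I/O is failed so that an upper layer can retry
            # it elsewhere; the port itself is kept.
            log.append((when, "abort outstanding requests"))
        elif kind == "dev_loss":
            log.append((when, "abort outstanding requests"))
            log.append((when, "remove I_T nexus"))
            log.append((when, "unblock requests"))
            break  # port is gone, nothing left to wait for
        else:
            # Reconnect succeeded: the remaining error work is canceled.
            log.append((when, "cancel pending timers"))
            log.append((when, "unblock requests"))
            break
    return log


if __name__ == "__main__":
    for t, action in simulate(5, 30, reconnect_at=10):
        print(f"t={t:>2}s  {action}")
```

With a reconnect before fast_io_fail_tmo no request is aborted at all. With a reconnect between the two timeouts the outstanding I/O has already been failed, but the I_T nexus is kept.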