On 06/17/2013 09:04 AM, Bart Van Assche wrote:
> On 06/17/13 08:18, Hannes Reinecke wrote:
>> On 06/15/2013 11:52 AM, Bart Van Assche wrote:
[ .. ]
>>>
>>> I think the advantage of multipathd recognizing the SDEV_BLOCK state
>>> before the fast_io_fail_tmo timer has expired is important.
>>> Multipathd does not queue I/O to paths that are in the SDEV_BLOCK
>>> state, so setting that state helps I/O to fail over more quickly,
>>> especially for large values of fast_io_fail_tmo.
>>>
>> Sadly it doesn't work that way.
>>
>> SDEV_BLOCK will instruct multipath not to queue _new_ I/Os to the
>> path, but there will still be I/O queued on that path.
>> For these, multipath _has_ to wait for I/O completion.
>> And as it turns out, in most cases the application itself will wait
>> for completion of these I/Os before continuing to send more I/O.
>> So in effect multipath would queue new I/O to other paths, but won't
>> _receive_ new I/O as the upper layers are still waiting for
>> completion of the queued I/O.
>>
>> The only way to achieve fast failover with multipathing is to set
>> fast_io_fail to a _LOW_ value (e.g. 5 seconds), as this will terminate
>> the outstanding I/Os.
>>
>> Large values of fast_io_fail will almost guarantee sluggish I/O
>> failover.
>
> Hello Hannes,
>
> I agree that the value of fast_io_fail_tmo should be kept small.
> Although, as you explained, changing the SCSI device state into
> SDEV_BLOCK doesn't help for I/O that has already been queued on a
> failed path, I think it's still useful for I/O that is queued after
> the fast_io_fail timer has been started and before that timer has
> expired.
>
Why, but of course.
The typical scenario would be:

-> detect link-loss
  -> call scsi_block_request()
  -> start dev_loss_tmo and fast_io_fail_tmo
-> When fast_io_fail_tmo triggers:
  -> Abort all outstanding requests
-> When dev_loss_tmo triggers:
  -> Abort all outstanding requests
  -> Remove/disable the I_T nexus
  -> call scsi_unblock_request()

However, whether or not multipath detects SDEV_BLOCK doesn't
guarantee a fast failover; in fact, that check was only added rather
recently, as it's not a big win in most cases.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)