On 03/15/13 13:37, Bryn M. Reeves wrote:
On 03/15/2013 12:24 PM, Bart Van Assche wrote:
On 03/15/13 12:55, Hannes Reinecke wrote:
And the LLDD is forced into error recovery which'll take _ages_ as each
and every command sent during error recovery will time out.
Hello Hannes,
I'm analyzing a related but not identical issue with SRP. It would help
if you could tell with which LLDD you ran into this issue and with which
values of fast_io_fail_tmo and dev_loss_tmo.
Most of the cases I've seen have involved lpfc (although I don't think
it's in any way exclusive to that LLDD). Even with very low
fast_io_fail_tmo/dev_loss_tmo (<5/10 seconds) the EH is busy for 10
minutes or longer before I/O fails and multipath is able to react to the
problem.
The SCSI EH keeps trying until all outstanding requests have been
finished. Does lpfc_host_reset_handler() invoke scsi_done() for
outstanding requests? If not, how about modifying
lpfc_host_reset_handler() such that it finishes all outstanding requests
if the remote port is not reachable?
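
To make the suggestion concrete, here is a rough user-space sketch of the
idea. It is not lpfc or SCSI mid-layer code: every name in it (toy_cmd,
toy_host, toy_host_reset_handler(), the TOY_DID_* values) is made up for
illustration, and the done callback only stands in for scsi_done(). It
assumes the handler can tell whether the remote port is reachable and can
walk the list of outstanding commands; how a real LLDD would do either is
not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; not the kernel's DID_* definitions. */
#define TOY_DID_OK         0
#define TOY_DID_NO_CONNECT 1

/* Toy model of a command that the error handler is waiting on. */
struct toy_cmd {
    bool outstanding;                  /* true until done() has run */
    int result;                        /* completion status */
    void (*done)(struct toy_cmd *cmd); /* stands in for scsi_done() */
};

/* Toy model of a host with a single remote port. */
struct toy_host {
    bool rport_reachable;
    struct toy_cmd *cmds;
    size_t ncmds;
};

/* Completion callback: marks the command as no longer outstanding. */
static void toy_done(struct toy_cmd *cmd)
{
    cmd->outstanding = false;
}

/*
 * Sketch of the proposed behaviour: if the remote port is unreachable,
 * complete every outstanding command with a "no connect" status instead
 * of leaving it to time out. Returns the number of commands completed.
 * The normal reset path for a reachable port is not modelled here.
 */
static size_t toy_host_reset_handler(struct toy_host *host)
{
    size_t completed = 0;

    if (host->rport_reachable)
        return 0;

    for (size_t i = 0; i < host->ncmds; i++) {
        struct toy_cmd *cmd = &host->cmds[i];

        if (!cmd->outstanding)
            continue;
        cmd->result = TOY_DID_NO_CONNECT;
        cmd->done(cmd);
        completed++;
    }
    return completed;
}
```

With nothing left outstanding after the handler returns, the EH in this
model has no commands left to wait for.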
Bart.