On 05/10/2013 03:24 PM, Hannes Reinecke wrote:
> However, this time is only defined _on the initiator_. The specification does _NOT_ have any fixed timeout values for _any_ command. As such it could in theory (and does, if you happen to run against certain arrays under certain conditions) take several minutes to return a completion.
That's my understanding too - in a multipath configuration we're waiting only for our own fast_io_fail_tmo (if set), which is essentially an arbitrary, administrator-controlled interval. You can tune it between extremes of rapid fault identification vs. paths twitching at every transient glitch.
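To see what interval a given host is actually waiting on, something like the following rough sketch can dump the per-rport settings. It assumes the FC transport class sysfs layout (/sys/class/fc_remote_ports/rport-*/ with the port_state, fast_io_fail_tmo and dev_loss_tmo attributes); the function name and the root parameter are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path

# Assumed sysfs location of the FC transport class remote ports on Linux.
FC_RPORTS = Path("/sys/class/fc_remote_ports")

# Attributes of interest; a missing or unreadable one is reported as None.
ATTRS = ("port_state", "fast_io_fail_tmo", "dev_loss_tmo")


def read_rport_timeouts(root=FC_RPORTS):
    """Return {rport_name: {attr: value}} for every remote port under root."""
    root = Path(root)
    result = {}
    if not root.is_dir():
        # No FC transport class on this host (or wrong path): nothing to report.
        return result
    for rport in sorted(root.iterdir()):
        attrs = {}
        for name in ATTRS:
            try:
                attrs[name] = (rport / name).read_text().strip()
            except OSError:
                attrs[name] = None
        result[rport.name] = attrs
    return result


if __name__ == "__main__":
    for name, attrs in read_rport_timeouts().items():
        print(name, attrs)
```

A fast_io_fail_tmo that reads "off" means the timer is not set, in which case only dev_loss_tmo bounds the wait on that path.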
> Yes, that was the idea. Which I'll get down to eventually; if only customers wouldn't have all these obnoxious issues no-one has ever seen...
The class I've been looking at is really very easy to reproduce and we've seen it at least a half dozen times at different sites with different FC switches (so it's certainly not that unusual).
To recreate it artificially you just need a target, a host, and a switch that can block RSCN propagation on a per-port basis. I've been using Brocade switches with the rscnsupr portcfg attribute.
It's important that you block a port on the switch<->target side; otherwise the host will see a link event, which short-circuits everything.
E.g. if you have one port of an array attached to port 1 on a Brocade, the following two commands will set up this scenario:
    portcfg rscnsupr 1 --enable
    portdisable 1

Regards,
Bryn.