Re: [PATCH] scsi: Allow error handling timeout to be specified

"Bryn M. Reeves" <bmr@xxxxxxxxxx> · Fri, 10 May 2013 15:31:29 +0100

On 05/10/2013 03:24 PM, Hannes Reinecke wrote:
> However, this time is only defined _on the initiator_.
> The specification does _NOT_ have any fixed timeout values for _any_
> command. As such it could in theory (and does, if you happen to run
> against certain arrays under certain conditions) take several
> minutes to return a completion.

That's my understanding too - in a multipath configuration we're 
waiting only for our own fast_io_fail_tmo (if set), which is essentially 
an arbitrary, administrator-controlled interval. You can tune it between 
extremes of rapid fault identification vs. paths twitching at every 
transient glitch.
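
For anyone wanting to check what a host is currently using, here is a minimal sketch that reads the value per remote port. It assumes the FC transport class exposes the attribute as /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo and that an unset timeout reads back as "off"; the function name and the base parameter are purely illustrative.

```python
from pathlib import Path


def read_fast_io_fail_tmo(base="/sys/class/fc_remote_ports"):
    """Map each FC remote port under `base` to its fast_io_fail_tmo.

    Values are seconds as int, or None when the attribute is not a
    plain number (assumed here to mean the timeout is unset, "off").
    """
    result = {}
    root = Path(base)
    if not root.is_dir():
        # No FC transport class on this host (or a non-Linux system).
        return result
    for attr in sorted(root.glob("rport-*/fast_io_fail_tmo")):
        raw = attr.read_text().strip()
        result[attr.parent.name] = int(raw) if raw.isdigit() else None
    return result


if __name__ == "__main__":
    for rport, tmo in read_fast_io_fail_tmo().items():
        print(rport, "off" if tmo is None else f"{tmo}s")
```

Taking the base directory as a parameter keeps the sketch testable against a scratch directory rather than a live host.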

> Yes, that was the idea.
> Which I'll get down to eventually; if only customers wouldn't have
> all these obnoxious issues no-one has ever seen...

The class I've been looking at is really very easy to reproduce and 
we've seen it at least a half dozen times at different sites with 
different FC switches (so it's certainly not that unusual).

To recreate it artificially you just need a target, a host, and a switch
that can block RSCN propagation on a per-port basis. I've been using
Brocade switches with the portcfg rscnsupr attribute.

It's important that you block a port on the switch<->target side,
otherwise the host will see a link event which short-circuits everything.

E.g. if you have one port of an array attached to port 1 on a Brocade
switch, the following two commands will set up this scenario:

portcfg rscnsupr 1 --enable
portdisable 1
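
If you want to repeat this across several ports, a small helper that emits the command lines may save some typing. This is only a sketch: the function name is made up, and the restore sequence (portenable followed by portcfg rscnsupr with --disable) is my assumption about how to undo the set-up, so check it against the switch documentation before relying on it.

```python
def rscn_suppress_commands(port, restore=False):
    """Return the switch CLI lines for one port.

    With restore=False this reproduces the two set-up commands quoted
    above. With restore=True it returns an assumed undo sequence:
    re-enable the port, then turn RSCN suppression back off.
    """
    if not isinstance(port, int) or isinstance(port, bool) or port < 0:
        raise ValueError("port must be a non-negative integer")
    if restore:
        return [
            f"portenable {port}",
            f"portcfg rscnsupr {port} --disable",
        ]
    return [
        f"portcfg rscnsupr {port} --enable",
        f"portdisable {port}",
    ]


if __name__ == "__main__":
    for line in rscn_suppress_commands(1):
        print(line)
```

The helper only builds strings; pasting or piping them to the switch is left to whatever access method you already use.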

Regards,
Bryn.
