Hello Fred,

Your feedback is very useful, but please note that in my e-mail I used
the phrase "transport layer" to refer to the code in the Linux kernel in
which the fast_io_fail_tmo functionality has been implemented. The
following commit message from 10 years ago explains why the
fast_io_fail_tmo and dev_loss_tmo mechanisms have been implemented:

---------------------------------------------------------------------------
commit 0f29b966d60e9a4f5ecff9f3832257b38aea4f13
Author: James Smart <James.Smart@xxxxxxxxxx>
Date:   Fri Aug 18 17:33:29 2006 -0400

    [SCSI] FC transport: Add dev_loss_tmo callbacks, and new fast_io_fail_tmo w/ callback

    This patch adds the following functionality to the FC transport:

    - dev_loss_tmo LLDD callback :
      Called to essentially confirm the deletion of an rport. Thus, it is
      called whenever the dev_loss_tmo fires, or when the rport is deleted
      due to other circumstances (module unload, etc). It is expected that
      the callback will initiate the termination of any outstanding i/o on
      the rport.

    - fast_io_fail_tmo and LLD callback:
      There are some cases where it may take a long while to truly determine
      device loss, but the system is in a multipathing configuration that if
      the i/o was failed quickly (faster than dev_loss_tmo), it could be
      redirected to a different path and completed sooner.

    Many thanks to Mike Reed who cleaned up the initial RFC in support
    of this post.
---------------------------------------------------------------------------

Bart.

On 04/28/2016 09:19 AM, Knight, Frederick wrote:
> There are multiple possible situations being intermixed in this discussion.
> First, I assume you're talking only about random access devices (if you try
> transport level error recovery on a sequential access device - tape or SMR
> disk - there are lots of additional complexities).
>
> Failures can occur at multiple places:
> a) Transport layer failures that the transport layer is able to detect quickly;
> b) SCSI device layer failures that the transport layer never even knows about.
>
> For (a) there are two competing goals. If a port drops off the fabric and
> comes back again, should you be able to just recover and continue? But how
> long do you wait during that drop? Some devices use this technique to "move"
> a WWPN from one place to another. The port drops from the fabric, and a
> short time later, shows up again (the WWPN moves from one physical port to a
> different physical port). There are FC driver layer timers that define the
> length of time allowed for this operation. The goal is fast failover, but
> not too fast - because too fast will break this kind of "transparent failover".
> This timer also allows for the "OH crap, I pulled the wrong cable - put it
> back in; quick" kind of stupid user bug.
>
> For (b) the transport never has a failure. A LUN (or a group of LUNs)
> has an ALUA transition from one set of ports to a different set of ports.
> Some of the LUNs on the port continue to work just fine, but others enter
> ALUA TRANSITION state so they can "move" to a different part of the hardware.
> After the move completes, you now have different sets of optimized and
> non-optimized paths (or possibly standby, or unavailable). The transport
> will never even know this happened. This kind of "failure" is handled by
> the SCSI layer drivers.
>
> There are other cases too, but these are the most common.
>
> Fred
>
> -----Original Message-----
> From: lsf-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx [mailto:lsf-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Bart Van Assche
> Sent: Thursday, April 28, 2016 11:54 AM
> To: James Bottomley; Mike Snitzer
> Cc: linux-block@xxxxxxxxxxxxxxx; lsf@xxxxxxxxxxxxxxxxxxxxxxxxxx; device-mapper development; linux-scsi
> Subject: Re: [Lsf] Notes from the four separate IO track sessions at LSF/MM
>
> On 04/28/2016 08:40 AM, James Bottomley wrote:
>> Well, the entire room, that's vendors, users and implementors
>> complained that path failover takes far too long. I think in their
>> minds this is enough substance to go on.
>
> The only complaints I heard about path failover taking too long came
> from people working on FC drivers. Aren't SCSI transport layer
> implementations expected to fail I/O after fast_io_fail_tmo expired
> instead of waiting until the SCSI error handler has finished? If so, why
> is it considered an issue that error handling for the FC protocol can
> take very long (hours)?
>
> Thanks,
>
> Bart.
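
The relationship between the two timers described in the commit message quoted at the top of this thread can be sketched as a small model. This is only an illustration of the ordering of events, not the kernel implementation: the names RportState and rport_state, the returned_at parameter, and the timeout values used in the usage note are all hypothetical, and the model assumes fast_io_fail_tmo is shorter than dev_loss_tmo.

```python
from enum import Enum
from typing import Optional


class RportState(Enum):
    """Hypothetical, simplified states of a remote port (rport)."""
    ONLINE = "online"        # port is back on the fabric
    BLOCKED = "blocked"      # port dropped; I/O is held while waiting for it
    FAST_FAIL = "fast_fail"  # fast_io_fail_tmo fired; outstanding I/O is failed
    LOST = "lost"            # dev_loss_tmo fired; the rport is deleted


def rport_state(elapsed: float,
                fast_io_fail_tmo: Optional[float],
                dev_loss_tmo: float,
                returned_at: Optional[float] = None) -> RportState:
    """Return the modeled state `elapsed` seconds (>= 0) after a port drop.

    `returned_at` is the time at which the port reappeared, or None if it
    never did. `fast_io_fail_tmo=None` models the fast-fail timer being off.
    """
    # A port that returns before dev_loss_tmo expires is simply resumed.
    if (returned_at is not None and returned_at < dev_loss_tmo
            and elapsed >= returned_at):
        return RportState.ONLINE
    # Once dev_loss_tmo has fired the rport is gone for good.
    if elapsed >= dev_loss_tmo:
        return RportState.LOST
    # Between the two timers, I/O is failed quickly so that a multipath
    # layer can redirect it to another path.
    if fast_io_fail_tmo is not None and elapsed >= fast_io_fail_tmo:
        return RportState.FAST_FAIL
    return RportState.BLOCKED
```

With illustrative values of 5 seconds for fast_io_fail_tmo and 30 seconds for dev_loss_tmo, rport_state(2, 5, 30) is BLOCKED, rport_state(5, 5, 30) is FAST_FAIL, and rport_state(30, 5, 30) is LOST. A port that reappears after 8 seconds is ONLINE again at rport_state(10, 5, 30, returned_at=8), which is the "transparent failover" window Fred describes.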