James Smart wrote:
>
> Mike Christie wrote:
>>
>> So is the fast_io_fail_tmo callback the terminate_rport_io callback?
>
> Yes.
> When fast_io_fail_tmo expires, it calls the terminate_rport_io()
> callback.
>
>> If so, are we supposed to unblock the rport/session/target from
>> fc_timeout_fail_rport_io
>
> No... don't unblock.
>
>> and call into the LLD and the LLD will set some bit (or maybe check
>> some rport/session/target/scsi_device bit) so that incoming IO and IO
>> sitting in the driver will be failed with something like DID_BUS_BUSY
>> so it goes to the upper layers?
>
> The way this is managed in the fc transport is - the LLD calls the
> transport when it establishes connectivity (an "add" call), and when
> it loses connectivity (a "delete" call). When the transport receives
> the delete call, it changes the rport state, blocks the rport, and
> starts the dev_loss timeout (and potentially the fast_io_fail_tmo if
> < dev_loss). If the LLD makes the add call prior to dev_loss
> expiring, it then updates the state and unblocks the rport. If
> dev_loss expires, it updates the state again (essentially the true
> deleted state) and tears down the target tree.
>
> To deal with requests received while blocked, etc., the LLDs use a
> helper routine (fc_remote_port_chkready()), which validates the rport
> state and, if it is not valid (e.g. blocked or removed), returns the
> appropriate status to hand back to the midlayer. If blocked, it
> returns DID_IMM_RETRY. If deleted, it returns DID_NO_CONNECT.
>
> What the above never dealt with was the i/o already in the driver.
> The driver always had the option to terminate the active i/o when the
> loss of connectivity occurred, or it could just wait for it to time
> out, etc., and be killed that way. This patch added the callback at
> dev_loss_tmo to guarantee i/o is killed, and added the
> fast_io_fail_tmo if you wanted a faster guarantee. If
> fast_io_fail_tmo expires and the callback is called - it just kills
> the outstanding i/o and does nothing to the rport's blocked state.

Haven't most drivers / board firmware generally cleaned up any
outstanding i/o at the time of (or shortly after) the
fc_remote_port_delete() call? I would think it reasonable to just
require that the driver clean up the i/o after calling
fc_remote_port_delete(). Is there a significant reason to keep the i/o
alive in the driver? The rport has just been deleted....

Would this eliminate the need for the callback? If the driver
implements this, could it just have a NULL callback routine?

Mike

>
>> I think I only see the unblock happen on success or
>> fc_starget_delete, so IO in the driver looks like it can get failed
>> upwards, but IO sitting in the queue sits there until
>> fc_rport_final_delete or success.
>
> Yeah - essentially this is correct. I hope the above read that way.
> I'm also hoping the iSER folks are reading this to get the general
> feel of what's happening with block, dev_loss, etc.
>
>> If that is correct, what about a new device state? When the fail
>> fast tmo expires we can set the device to the new state, run the
>> queue, and incoming IO or IO in the request_queue marked with
>> FAILFAST can be failed upwards by scsi-ml.
>>
>> I just woke up though :)
>
> Sounds reasonable. It is adding a new semantic to what was meant by
> fast_fail - but it's in line with our goal. The goal was to terminate
> i/o so that it could be quickly rescheduled on a different path
> rather than wait (what may be a long time) for the dev_loss
> connectivity timer to fire.
> Makes sense you would want to make new i/o requests bound by that
> same window.
>
> -- james s
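
For anyone following along, here is a rough sketch of the add/delete
flow James describes, from the LLD's point of view.
fc_remote_port_add(), fc_remote_port_delete(), and struct
fc_rport_identifiers are the real scsi_transport_fc interfaces;
everything prefixed my_ is invented for illustration:

#include <scsi/scsi_host.h>
#include <scsi/scsi_transport_fc.h>

/*
 * Link-up path: tell the transport we have connectivity to the remote
 * port.  If this arrives before dev_loss_tmo expires on a previously
 * deleted rport, the transport reuses the rport and unblocks it.
 */
static void my_handle_port_online(struct Scsi_Host *shost, u64 wwnn,
				  u64 wwpn, u32 port_id)
{
	struct fc_rport_identifiers ids;
	struct fc_rport *rport;

	ids.node_name = wwnn;
	ids.port_name = wwpn;
	ids.port_id = port_id;
	ids.roles = FC_PORT_ROLE_FCP_TARGET;

	rport = fc_remote_port_add(shost, 0, &ids);
	if (!rport)
		printk(KERN_ERR "my_lld: fc_remote_port_add failed\n");
}

/*
 * Link-down path: the transport blocks the rport and starts
 * dev_loss_tmo (and fast_io_fail_tmo, if that is shorter).
 */
static void my_handle_port_offline(struct fc_rport *rport)
{
	fc_remote_port_delete(rport);
}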
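
And the fc_remote_port_chkready() check sits in the LLD's
queuecommand, which is how new i/o gets failed while the rport is
blocked or after it is gone. fc_remote_port_chkready() and
starget_to_rport() are the real helpers; my_hw_send() is made up, and
the prototype below is the classic two-argument queuecommand of this
era:

#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_transport_fc.h>

static int my_hw_send(struct scsi_cmnd *cmd);	/* driver-specific */

static int my_queuecommand(struct scsi_cmnd *cmd,
			   void (*done)(struct scsi_cmnd *))
{
	struct fc_rport *rport = starget_to_rport(scsi_target(cmd->device));
	int rval;

	/*
	 * Returns 0 when the rport is online, DID_IMM_RETRY << 16
	 * while it is blocked (dev_loss running), and
	 * DID_NO_CONNECT << 16 once it has been deleted.
	 */
	rval = fc_remote_port_chkready(rport);
	if (rval) {
		cmd->result = rval;
		done(cmd);
		return 0;
	}

	cmd->scsi_done = done;
	return my_hw_send(cmd);		/* hand it to the hardware */
}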
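
Finally, the terminate_rport_io hook is just a member of the driver's
fc_function_template. A minimal sketch, assuming a driver-specific
my_hw_abort_all_io() helper exists to flush the hardware:

#include <scsi/scsi_transport_fc.h>

static void my_hw_abort_all_io(struct fc_rport *rport); /* driver-specific */

/*
 * Called by the transport when fast_io_fail_tmo expires, and again
 * from the dev_loss path, to guarantee outstanding i/o is killed.
 * Complete everything the hardware still holds for this rport; do
 * not touch the rport's blocked state - the transport owns that.
 */
static void my_terminate_rport_io(struct fc_rport *rport)
{
	my_hw_abort_all_io(rport);
}

static struct fc_function_template my_fc_functions = {
	.show_rport_dev_loss_tmo	= 1,
	.terminate_rport_io		= my_terminate_rport_io,
	/* other show_/get_/set_ entries elided */
};

qla2xxx and lpfc both wire up terminate_rport_io this way, if you want
a real-world reference.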