James Smart wrote:
>
> Mike Christie wrote:
>>
>> So is the fast_io_fail_tmo callback the terminate_rport_io callback?
>
> Yes.
> When fast_io_fail_tmo expires, it calls the terminate_rport_io()
> callback.
>
>> If so, are we supposed to unblock the rport/session/target from
>> fc_timeout_fail_rport_io
>
> No... don't unblock.
>
>> and call into the LLD and the LLD will set some bit (or maybe check
>> some rport/session/target/scsi_device bit) so that incoming IO and IO
>> sitting in the driver will be failed with something like DID_BUS_BUSY
>> so it goes to the upper layers?
>
> The way this is managed in the fc transport is - the LLD calls the
> transport when it establishes connectivity (an "add" call), and when
> it loses connectivity (a "delete" call). When the transport receives
> the delete call, it changes the rport state, blocks the rport, and
> starts the dev_loss timeout (and potentially the fast_io_fail_tmo if
> < dev_loss). If the LLD makes the add call prior to dev_loss
> expiring, it then updates the state and unblocks the rport. If
> dev_loss expires, it updates the state again (essentially the true
> deleted state) and tears down the target tree.
>
> To deal with requests received while blocked, etc., the LLDs use a
> helper routine (fc_remote_port_chkready()), which validates the rport
> state and, if it is not valid (e.g. blocked or removed), returns the
> appropriate status to hand back to the midlayer. If blocked, it
> returns DID_IMM_RETRY. If deleted, it returns DID_NO_CONNECT.
>
> What the above never dealt with was the i/o already in the driver.
> The driver always had the option to terminate the active i/o when the
> loss of connectivity occurred, or it could just wait for it to time
> out, etc., and be killed that way. This patch added the callback at
> dev_loss_tmo to guarantee i/o is killed, and added the
> fast_io_fail_tmo if you wanted a faster guarantee. If
> fast_io_fail_tmo expires and the callback is called - it just kills
> the outstanding i/o and does nothing to the rport's blocked state.

Haven't most drivers / board firmware generally cleaned up any
outstanding i/o at the time of (or shortly after) the
fc_remote_port_delete() call? I would think it reasonable to just
require that the driver clean up the i/o after calling
fc_remote_port_delete(). Is there a significant reason to keep the i/o
alive in the driver? The rport has just been deleted....

Would this eliminate the need for the callback? If the driver
implements this, could it just have a NULL callback routine?

Mike

>
>> I think I only see the unblock happen on success or
>> fc_starget_delete, so IO in the driver looks like it can get failed
>> upwards, but IO sitting in the queue sits there until
>> fc_rport_final_delete or success.
>
> Yeah - essentially this is correct. I hope the above read that way.
> I'm also hoping the iSER folks are reading this to get the general
> feel of what's happening with block, dev_loss, etc.
>
>> If that is correct, what about a new device state? When the fail
>> fast tmo expires we can set the device to the new state, run the
>> queue, and incoming IO or IO in the request_queue marked with
>> FAILFAST can be failed upwards by scsi-ml.
>>
>> I just woke up though :)
>
> Sounds reasonable. It is adding a new semantic to what was meant by
> fast_fail - but it's in line with our goal. The goal was to terminate
> i/o so that it could be quickly rescheduled on a different path
> rather than wait (what may be a long time) for the dev_loss
> connectivity timer to fire.
> Makes sense you would want to make new i/o requests bound by that
> same window.
>
> -- james s
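
For anyone following along, here is a rough sketch of the add/delete
flow James describes, from the LLD's point of view.
fc_remote_port_add(), fc_remote_port_delete(), and struct
fc_rport_identifiers are the real scsi_transport_fc interfaces;
everything prefixed my_ is invented for illustration:

#include <scsi/scsi_host.h>
#include <scsi/scsi_transport_fc.h>

/*
 * Link-up path: tell the transport we have connectivity to the remote
 * port.  If this arrives before dev_loss_tmo expires on a previously
 * deleted rport, the transport reuses the rport and unblocks it.
 */
static void my_handle_port_online(struct Scsi_Host *shost, u64 wwnn,
				  u64 wwpn, u32 port_id)
{
	struct fc_rport_identifiers ids;
	struct fc_rport *rport;

	ids.node_name = wwnn;
	ids.port_name = wwpn;
	ids.port_id = port_id;
	ids.roles = FC_PORT_ROLE_FCP_TARGET;

	rport = fc_remote_port_add(shost, 0, &ids);
	if (!rport)
		printk(KERN_ERR "my_lld: fc_remote_port_add failed\n");
}

/*
 * Link-down path: the transport blocks the rport and starts
 * dev_loss_tmo (and fast_io_fail_tmo, if that is shorter).
 */
static void my_handle_port_offline(struct fc_rport *rport)
{
	fc_remote_port_delete(rport);
}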
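
And the fc_remote_port_chkready() check sits in the LLD's
queuecommand, which is how new i/o gets failed while the rport is
blocked or after it is gone. fc_remote_port_chkready() and
starget_to_rport() are the real helpers; my_hw_send() is made up, and
the prototype below is the classic two-argument queuecommand of this
era:

#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_transport_fc.h>

static int my_hw_send(struct scsi_cmnd *cmd);	/* driver-specific */

static int my_queuecommand(struct scsi_cmnd *cmd,
			   void (*done)(struct scsi_cmnd *))
{
	struct fc_rport *rport = starget_to_rport(scsi_target(cmd->device));
	int rval;

	/*
	 * Returns 0 when the rport is online, DID_IMM_RETRY << 16
	 * while it is blocked (dev_loss running), and
	 * DID_NO_CONNECT << 16 once it has been deleted.
	 */
	rval = fc_remote_port_chkready(rport);
	if (rval) {
		cmd->result = rval;
		done(cmd);
		return 0;
	}

	cmd->scsi_done = done;
	return my_hw_send(cmd);		/* hand it to the hardware */
}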
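
Finally, the terminate_rport_io hook is just a member of the driver's
fc_function_template. A minimal sketch, assuming a driver-specific
my_hw_abort_all_io() helper exists to flush the hardware:

#include <scsi/scsi_transport_fc.h>

static void my_hw_abort_all_io(struct fc_rport *rport); /* driver-specific */

/*
 * Called by the transport when fast_io_fail_tmo expires, and again
 * from the dev_loss path, to guarantee outstanding i/o is killed.
 * Complete everything the hardware still holds for this rport; do
 * not touch the rport's blocked state - the transport owns that.
 */
static void my_terminate_rport_io(struct fc_rport *rport)
{
	my_hw_abort_all_io(rport);
}

static struct fc_function_template my_fc_functions = {
	.show_rport_dev_loss_tmo	= 1,
	.terminate_rport_io		= my_terminate_rport_io,
	/* other show_/get_/set_ entries elided */
};

qla2xxx and lpfc both wire up terminate_rport_io this way, if you want
a real-world reference.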