On Wed, Oct 21, 2009 at 01:11:15PM -0500, Mike Christie wrote:
> Christof Schmitt wrote:
>> If the remote_port status is not BLOCKED, this will trigger the SCSI
>> midlayer error handling which cannot do much during the interruption
>> to the hardware and will mark the SCSI devices 'offline'. In order to
>> prevent this, the rule would be: First call fc_remote_port_delete to
>> set the remote port (or in the case of an HBA interruption all remote
>> ports) to BLOCKED, and only after this step call scsi_done to pass the
>> SCSI commands back to the upper layers.
>>
>
> One other note when doing this.
>
> For problems where you are deleting the rport, it is best to use
> something like DID_TRANSPORT_DISRUPTED to fail the cmd if you are
> failing it right away.

"something like DID_TRANSPORT_DISRUPTED" would be any error code that
goes through "maybe_retry" in scsi_decide_disposition? I guess moving
to DID_TRANSPORT_DISRUPTED is nice for consistency, but DID_ERROR
triggers the same code paths as far as I can see.

> If drivers block the rport, then fail commands
> immediately with DID_TRANSPORT_DISRUPTED, then they will not actually be
> failed to the block/mpath layer until the fast io fail timeout has
> fired. This will prevent very short problems from firing the multipath
> path offlining code.

Just to get the complete picture: Blocking the rport and then returning
DID_TRANSPORT_DISRUPTED will retry the command to the LLD, which then
first calls fc_remote_port_chkready. fc_remote_port_chkready will then
keep the command between the LLD and the SCSI midlayer until the rport
state changes or the fast io fail timer fires. Is this the complete
picture or did I miss something?
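A small user-space model of that sequence, as a sketch only: struct model_rport, enum model_port_state and the model_* functions are hypothetical stand-ins, not the kernel's struct fc_rport or the real transport class code. model_chkready() mirrors only the BLOCKED/ONLINE decision of fc_remote_port_chkready() in include/scsi/scsi_transport_fc.h (the FCP target role and pending dev_loss checks are left out), and the DID_* names are reused for readability without the kernel's numeric values. Nothing the midlayer does with the retried command is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host byte names borrowed from include/scsi/scsi.h. Only the names
 * matter in this model; the enum values are NOT the kernel's codes. */
enum host_byte {
	DID_OK,				/* command may go to the hardware */
	DID_NO_CONNECT,			/* fail the command, no retry */
	DID_IMM_RETRY,			/* hand back to the midlayer for retry */
	DID_TRANSPORT_DISRUPTED,	/* transport disrupted, let it decide */
	DID_TRANSPORT_FAILFAST,		/* fast io fail timer fired, fail up */
};

/* Hypothetical, heavily reduced stand-in for struct fc_rport. */
enum model_port_state { MODEL_ONLINE, MODEL_BLOCKED, MODEL_DELETED };

struct model_rport {
	enum model_port_state state;
	bool fast_fail_timedout;	/* role of FC_RPORT_FAST_FAIL_TIMEDOUT */
};

/* Simplified model of the fc_remote_port_chkready() decision that the
 * LLD makes at the start of queuecommand. */
static enum host_byte model_chkready(const struct model_rport *rp)
{
	assert(rp != NULL);
	switch (rp->state) {
	case MODEL_ONLINE:
		return DID_OK;
	case MODEL_BLOCKED:
		return rp->fast_fail_timedout ? DID_TRANSPORT_FAILFAST
					      : DID_IMM_RETRY;
	default:
		return DID_NO_CONNECT;
	}
}

/* Hypothetical LLD reaction to a hardware interruption, following the
 * ordering rule: block the rport first, only then complete the
 * outstanding command. Returns the host byte it is completed with. */
static enum host_byte model_hw_interruption(struct model_rport *rp)
{
	assert(rp != NULL);
	rp->state = MODEL_BLOCKED;	/* stands in for fc_remote_port_delete() */
	rp->fast_fail_timedout = false;
	return DID_TRANSPORT_DISRUPTED;	/* then scsi_done() with this code */
}

/* The fast io fail timer firing; in the transport class this is also
 * when the LLD's terminate_rport_io callback is run. */
static void model_fast_io_fail_timeout(struct model_rport *rp)
{
	assert(rp != NULL);
	if (rp->state == MODEL_BLOCKED)
		rp->fast_fail_timedout = true;
}

/* The rport coming back; stands in for fc_remote_port_add(). */
static void model_rport_online(struct model_rport *rp)
{
	assert(rp != NULL);
	rp->state = MODEL_ONLINE;
	rp->fast_fail_timedout = false;
}
```

Walking through it: after model_hw_interruption() the command is completed with DID_TRANSPORT_DISRUPTED, and while the rport stays BLOCKED every retry that reaches queuecommand gets DID_IMM_RETRY from the check, so it stays between the LLD and the midlayer. Once model_fast_io_fail_timeout() has run, the same check yields DID_TRANSPORT_FAILFAST and the command is failed upwards; if model_rport_online() comes first, the check returns DID_OK and the command goes to the hardware.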
> If your driver deletes the rport and does not fail the cmd immediately
> so it can recover within the command or some other reason like the fw
> just works that way, then when the fast io fail timer fires and the
> terminate_rport_io callback is run you could actually use any error code
> since at this time when an IO is sent to the queuecommand the driver will
> call fc_remote_port_chkready and IO will be failed immediately with
> DID_TRANSPORT_FAILFAST.

And the rport state is still BLOCKED, so at this point commands failed
in the upper layers with blk_abort_request will not end up in the SCSI
error recovery, which cannot do much...

Thanks for the help, I am starting to get the complete picture...

Christof