Re: fc_remote_port_delete and returning SCSI commands from LLD

Christof Schmitt <christof.schmitt@xxxxxxxxxx> · Fri, 23 Oct 2009 09:13:24 +0200

On Wed, Oct 21, 2009 at 01:11:15PM -0500, Mike Christie wrote:
> Christof Schmitt wrote:
>> If the remote_port status is not BLOCKED, this will trigger the SCSI
>> midlayer error handling which cannot do much during the interruption
>> to the hardware and will mark the SCSI devices 'offline'. In order to
>> prevent this, the rule would be: First call fc_remote_port_delete to
>> set the remote port (or in the case of an HBA interruption all remote
>> ports) to BLOCKED, and only after this step call scsi_done to pass the
>> SCSI commands back to the upper layers.
>>
>
> One other note when doing this.
>
> For problems where you are deleting the rport, it is best to use  
> something like DID_TRANSPORT_DISRUPTED to fail the cmd if you are  
> failing it right away.

By "something like DID_TRANSPORT_DISRUPTED", do you mean any error code
that goes through "maybe_retry" in scsi_decide_disposition? I guess moving
to DID_TRANSPORT_DISRUPTED is nice for consistency, but as far as I can
see, DID_ERROR triggers the same code paths.

> If drivers block the rport, then fail commands  
> immediately with DID_TRANSPORT_DISRUPTED, then they will not actually be  
> failed to the block/mpath layer until the fast io fail timeout has  
> fired. This will prevent very short problems from firing the multipath
> path offlining code.

Just to get the complete picture: blocking the rport and then
returning DID_TRANSPORT_DISRUPTED will retry the command to the LLD,
which then first calls fc_remote_port_chkready.
fc_remote_port_chkready will then keep the command between the LLD and
the SCSI midlayer until the rport state changes or the fast io fail
timeout fires. Is this the complete picture, or did I miss something?

> If your driver deletes the rport and does not fail the cmd immediately  
> so it can recover within the command or some other reason like the fw  
> just works that way, then when the fast io fail timer fires and the  
> terminate_rport_io callback is run you could actually use any error code  
> since at this time, when an IO is sent to queuecommand, the driver will
> call fc_remote_port_chkready and the IO will be failed immediately with
> DID_TRANSPORT_FAILFAST.

And the rport state is still BLOCKED, so at this point, commands failed
in the upper layers with blk_abort_request will not end up in the SCSI
error recovery, which cannot do much during the interruption anyway...

Thanks for the help; I am starting to get the complete picture...

Christof