Re: fc_remote_port_delete and returning SCSI commands from LLD

Christof Schmitt wrote:
On Wed, Oct 21, 2009 at 01:11:15PM -0500, Mike Christie wrote:
Christof Schmitt wrote:
If the remote_port status is not BLOCKED, this will trigger the SCSI
midlayer error handling which cannot do much during the interruption
to the hardware and will mark the SCSI devices 'offline'. In order to
prevent this, the rule would be: First call fc_remote_port_delete to
set the remote port (or in the case of an HBA interruption all remote
ports) to BLOCKED, and only after this step call scsi_done to pass the
SCSI commands back to the upper layers.
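
The ordering rule above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every sim_* name is hypothetical, sim_rport_delete() merely stands in for the BLOCKED transition that fc_remote_port_delete causes, and sim_return_failed_cmd() assumes the worst case described above, where a failed command returned on a non-BLOCKED rport ends with the device marked offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FC transport objects; not kernel types. */
enum sim_rport_state { SIM_RPORT_ONLINE, SIM_RPORT_BLOCKED };

struct sim_rport {
    enum sim_rport_state state;
};

struct sim_sdev {
    struct sim_rport *rport;
    bool offline;
};

/* Models only the state change to BLOCKED caused by fc_remote_port_delete. */
static void sim_rport_delete(struct sim_rport *rport)
{
    rport->state = SIM_RPORT_BLOCKED;
}

/*
 * Models returning a failed command through scsi_done. Assumption of this
 * sketch: if the rport is not BLOCKED, midlayer error handling runs and
 * marks the device offline.
 */
static void sim_return_failed_cmd(struct sim_sdev *sdev)
{
    assert(sdev->rport != NULL);
    if (sdev->rport->state != SIM_RPORT_BLOCKED)
        sdev->offline = true;
}

/*
 * Simulates an HBA interruption affecting all rports. With block_first set,
 * the rports are blocked before any command is returned (the proposed rule).
 * Returns the number of devices that ended up offline.
 */
static size_t sim_hba_interruption(struct sim_rport *rports, size_t nr_rports,
                                   struct sim_sdev *sdevs, size_t nr_sdevs,
                                   bool block_first)
{
    size_t i, offlined = 0;

    if (block_first)
        for (i = 0; i < nr_rports; i++)
            sim_rport_delete(&rports[i]);

    for (i = 0; i < nr_sdevs; i++)
        sim_return_failed_cmd(&sdevs[i]);

    if (!block_first)
        for (i = 0; i < nr_rports; i++)
            sim_rport_delete(&rports[i]);

    for (i = 0; i < nr_sdevs; i++)
        if (sdevs[i].offline)
            offlined++;

    return offlined;
}
```

In this model, calling sim_hba_interruption() with block_first set leaves no device offline, while the reverse order offlines every device that had a command returned.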

One other note when doing this.

For problems where you are deleting the rport and failing the command right away, it is best to fail it with something like DID_TRANSPORT_DISRUPTED.

"something like DID_TRANSPORT_DISRUPTED" would be any error code that
goes through "maybe_retry" in scsi_decide_disposition? I guess moving
to DID_TRANSPORT_DISRUPTED is nice for consistency, but DID_ERROR
triggers the same code paths as far as I can see.

It could be a little different; see scsi_noretry_cmd. If you use DID_ERROR and something has set the driver failfast bit on the request, the command is failed fast instead of being retried.
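
A reduced userspace model of that difference is sketched below. It is an illustration under assumptions, not the kernel function: the sim_* names and enum values are local to the sketch, and the real scsi_noretry_cmd handles more host bytes and status conditions than the two cases discussed here.

```c
#include <stdbool.h>

/* Hypothetical host byte codes; the values are local to this sketch. */
enum sim_host_byte {
    SIM_DID_OK,
    SIM_DID_ERROR,
    SIM_DID_TRANSPORT_DISRUPTED
};

/* Hypothetical failfast flags on the request. */
enum sim_failfast {
    SIM_FAILFAST_DEV       = 1 << 0,
    SIM_FAILFAST_TRANSPORT = 1 << 1,
    SIM_FAILFAST_DRIVER    = 1 << 2
};

/*
 * Returns true if the command should be failed upward without a retry.
 * Only the distinction from this thread is modeled: DID_ERROR honors the
 * driver failfast bit, DID_TRANSPORT_DISRUPTED does not.
 */
static bool sim_noretry_cmd(enum sim_host_byte host, unsigned int failfast)
{
    switch (host) {
    case SIM_DID_ERROR:
        return (failfast & SIM_FAILFAST_DRIVER) != 0;
    case SIM_DID_OK:
    case SIM_DID_TRANSPORT_DISRUPTED:
    default:
        return false;
    }
}
```

With the driver failfast bit set, sim_noretry_cmd() reports no retry for SIM_DID_ERROR but still allows a retry for SIM_DID_TRANSPORT_DISRUPTED, which is the case that matters when a blocked rport is expected to come back shortly.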



If drivers block the rport and then fail commands immediately with DID_TRANSPORT_DISRUPTED, the commands will not actually be failed to the block/mpath layer until the fast io fail timeout has fired. This prevents very short problems from triggering the multipath path offlining code.

Just to get the complete picture: Blocking the rport and then
returning DID_TRANSPORT_DISRUPTED will retry the command to the LLD
which then first calls fc_remote_port_chkready.
fc_remote_port_chkready will then keep the command between LLD and
SCSI midlayer until the rport state changes or the fast_fail fires.
Is this the complete picture or did I miss something?

I think that is it.
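
As a rough userspace illustration of that flow, the sketch below models the check a driver makes at the top of queuecommand. All sim_* names are hypothetical. The BLOCKED handling follows what is described in this thread; the immediate-retry result before the fast io fail timeout is my recollection of the helper in scsi_transport_fc.h, and the ONLINE case is simplified (no role check).

```c
#include <stdbool.h>

/* Hypothetical port states and results; not the kernel definitions. */
enum sim_port_state { SIM_PORT_ONLINE, SIM_PORT_BLOCKED, SIM_PORT_GONE };

enum sim_result {
    SIM_OK,                 /* command may be sent to the hardware */
    SIM_IMM_RETRY,          /* midlayer retries; command stays queued */
    SIM_TRANSPORT_FAILFAST, /* fast io fail fired; fail upward now */
    SIM_NO_CONNECT          /* port is gone */
};

struct sim_rport {
    enum sim_port_state state;
    bool fast_fail_timed_out;
};

/* Models the decision made by fc_remote_port_chkready. */
static enum sim_result sim_rport_chkready(const struct sim_rport *rport)
{
    switch (rport->state) {
    case SIM_PORT_ONLINE:
        return SIM_OK;
    case SIM_PORT_BLOCKED:
        return rport->fast_fail_timed_out ? SIM_TRANSPORT_FAILFAST
                                          : SIM_IMM_RETRY;
    default:
        return SIM_NO_CONNECT;
    }
}

/*
 * Hypothetical queuecommand-style entry point: check the rport first and
 * only touch the hardware (modeled by a counter) when the port is ready.
 */
static enum sim_result sim_queuecommand(const struct sim_rport *rport,
                                        unsigned int *sent_to_hw)
{
    enum sim_result res = sim_rport_chkready(rport);

    if (res != SIM_OK)
        return res; /* complete the command with this result instead */

    (*sent_to_hw)++;
    return SIM_OK;
}
```

While the rport is BLOCKED and the timer has not fired, every attempt comes back as SIM_IMM_RETRY, so the command bounces between the driver and the midlayer without reaching error handling or the multipath layer.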


If your driver deletes the rport and does not fail the cmd immediately (so it can recover within the command, or for some other reason, like the fw just works that way), then when the fast io fail timer fires and the terminate_rport_io callback is run you could actually use any error code. At that point, when an IO is sent to queuecommand, the driver will call fc_remote_port_chkready and the IO will be failed immediately with DID_TRANSPORT_FAILFAST.

And the rport state is still BLOCKED, so at this point commands failed
in the upper layers with blk_abort_request will not end up in the SCSI
error recovery which cannot do much...

Thanks for the help, I am starting to get the complete picture...

Christof
