Re: eh_abort_handler implementations

Mike Christie <michaelc@xxxxxxxxxxx> · Sun, 23 Jun 2013 15:59:14 -0500

On 06/12/2013 05:28 AM, Hannes Reinecke wrote:
> Hi all,
> 
> as you might know, I'm trying to revamp the eh_abort_handler
> implementation by sending command aborts directly whenever
> the timeout triggers, without entering SCSI EH.
> 
> So, during testing where the remote port is disabled I've seen this:
> 
> [  864.734937] qla2xxx [0000:41:00.0]-8802:1: Aborting from RISC
> nexus=1:0:0 sp=ffff880225b0dd40 cmd=ffff8802248d76c0
> [  864.737274] qla2xxx [0000:41:00.0]-1800:1: Entered
> qla2x00_mailbox_command.
> [  864.738720] qla2xxx [0000:41:00.0]-1806:1: Prepare to issue mbox
> cmd=0x54.
> [  864.740268] qla2xxx [0000:41:00.0]-180f:1: Going to unlock irq &
> waiting for interrupts. jiffies=100022781.
> [  864.740574] qla2xxx [0000:41:00.0]-1814:1: Cmd=54 completed.
> [  864.740596] qla2xxx [0000:41:00.0]-3822:1: FCP command status:
> 0x5-0x0 (0x80000) nexus=1:0:0 portid=691400 oxid=0x38e
> cdb=28000000000000000800 len=0x1000 rsp_info=0x0 resid=0x0 fw_resid=0x0.
> [  864.740608] qla2xxx [0000:41:00.0]-1821:1: Done
> qla2x00_mailbox_command.
> [  864.740615] qla2xxx [0000:41:00.0]-8804:1: Abort command mbx
> success cmd=ffff8802248d76c0.
> [  864.740631] qla2xxx [0000:41:00.0]-801c:1: Abort command issued
> nexus=1:0:0 --  2002.
> 
> Again, the port is disabled, so the TMF _cannot_ be received by the
> remote port, let alone processed.
> But still the command abort is processed correctly and the command
> is returned to the upper layers.
> So with the current thinking the command abort was successful, and
> EH would exit, as the remote port was assumed to be working.
> But most evidently the remote port is _still_ not reachable, so the
> TMF _should_ have returned 'FAILED'.
> At least that's what we expect.
> But it looks as if this expectation is slightly skewed, as most
> likely a successful ABORT TASK TMF just means that the command was
> terminated, not that the remote port itself was working.
> 
> If _that_ should be the case it looks as if we _always_ should be
> issuing a RESET LUN TMF whenever command aborts have been processed.
> Would that be correct?
> 

I am not sure I understand the question. For the iscsi drivers, when
the port is down we return FAILED from the abort and lun reset
handlers. The eh then escalates, and in the target reset handler we
wait for a successful reconnection/relogin or for the
replacement/recovery timeout (like dev_loss or fast io fail) to fire.
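
To make the flow above concrete, here is a rough userspace sketch of
it. This is not the iscsi code: struct model_session, the model_*
functions and the attempt counters are all made up for illustration,
and the relogin wait is reduced to a simple loop. Only the 0x2002 and
0x2003 values match SUCCESS and FAILED in include/scsi/scsi.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as SUCCESS/FAILED in include/scsi/scsi.h. */
enum eh_result { EH_SUCCESS = 0x2002, EH_FAILED = 0x2003 };

/* Hypothetical stand-in for a session; not a kernel structure. */
struct model_session {
    bool port_up;              /* is the transport usable right now? */
    int  max_relogin_attempts; /* attempts before the recovery timeout fires */
    int  relogin_succeeds_at;  /* attempt on which relogin works, 0 = never */
};

/* Abort handler: with the port down the abort cannot reach the target. */
static enum eh_result model_abort(struct model_session *s)
{
    return s->port_up ? EH_SUCCESS : EH_FAILED;
}

/* LUN reset handler: same reasoning as the abort handler. */
static enum eh_result model_lun_reset(struct model_session *s)
{
    return s->port_up ? EH_SUCCESS : EH_FAILED;
}

/* Target reset handler: wait for relogin or for the timeout to fire. */
static enum eh_result model_target_reset(struct model_session *s)
{
    for (int attempt = 1; attempt <= s->max_relogin_attempts; attempt++) {
        if (attempt == s->relogin_succeeds_at) {
            s->port_up = true;
            return EH_SUCCESS;
        }
    }
    return EH_FAILED; /* the recovery timeout fired first */
}

/* Walk the ladder until one step succeeds; return how many steps ran. */
static int model_run_eh(struct model_session *s, enum eh_result *out)
{
    enum eh_result (*const steps[])(struct model_session *) = {
        model_abort, model_lun_reset, model_target_reset,
    };
    const int nsteps = (int)(sizeof(steps) / sizeof(steps[0]));
    int ran = 0;

    assert(s != NULL && out != NULL);
    *out = EH_FAILED;
    for (int i = 0; i < nsteps; i++) {
        ran++;
        *out = steps[i](s);
        if (*out == EH_SUCCESS)
            break;
    }
    return ran;
}
```

With the port down and a relogin that eventually works, all three
steps run and the last one reports success. With the port up, the
abort alone ends the escalation.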

For iscsi at least, there is no need to send a lun reset if we are doing
session level recovery (the relogin/reconnection process). It would be
nice to have an eh return code so the LLD/iscsi layer can tell the
scsi eh to skip some steps.
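
Something like the following is what I have in mind, as a userspace
sketch only. SKIP_TO_TARGET_RESET and its value are invented here and
do not exist in the kernel, and the skip_* names and the
transport_up flag are placeholders for whatever the LLD would really
look at.

```c
#include <assert.h>
#include <stddef.h>

/* Escalation levels of this toy ladder (hypothetical names). */
enum skip_level {
    SKIP_LEVEL_ABORT,
    SKIP_LEVEL_LUN_RESET,
    SKIP_LEVEL_TARGET_RESET,
    SKIP_LEVEL_COUNT
};

enum skip_code {
    SKIP_SUCCESS = 0x2002,          /* same value as SUCCESS in scsi.h */
    SKIP_FAILED = 0x2003,           /* same value as FAILED in scsi.h */
    SKIP_TO_TARGET_RESET = 0x7001   /* invented for this sketch */
};

typedef enum skip_code (*skip_handler)(int transport_up);

/* Abort: with the transport down, ask the eh to jump ahead. */
static enum skip_code skip_abort(int transport_up)
{
    return transport_up ? SKIP_SUCCESS : SKIP_TO_TARGET_RESET;
}

/* LUN reset: pointless during session level recovery. */
static enum skip_code skip_lun_reset(int transport_up)
{
    return transport_up ? SKIP_SUCCESS : SKIP_FAILED;
}

/* Target reset: assume for the sketch that the relogin succeeds. */
static enum skip_code skip_target_reset(int transport_up)
{
    (void)transport_up;
    return SKIP_SUCCESS;
}

/*
 * Run the ladder. Each level that gets called is recorded in visited[],
 * which must hold SKIP_LEVEL_COUNT entries. Returns the number of
 * levels called and stores the last handler result in *out.
 */
static int skip_run_eh(int transport_up, enum skip_level visited[],
                       enum skip_code *out)
{
    static const skip_handler ladder[SKIP_LEVEL_COUNT] = {
        skip_abort, skip_lun_reset, skip_target_reset,
    };
    int called = 0;
    int lvl = SKIP_LEVEL_ABORT;

    assert(visited != NULL && out != NULL);
    *out = SKIP_FAILED;
    while (lvl < SKIP_LEVEL_COUNT) {
        visited[called++] = (enum skip_level)lvl;
        *out = ladder[lvl](transport_up);
        if (*out == SKIP_SUCCESS)
            break;
        if (*out == SKIP_TO_TARGET_RESET && lvl < SKIP_LEVEL_TARGET_RESET)
            lvl = SKIP_LEVEL_TARGET_RESET; /* skip the steps in between */
        else
            lvl++;                         /* normal escalation */
    }
    return called;
}
```

With the transport down, the lun reset level is never called: the
abort handler's return code moves the eh straight to the target reset.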
