RE: Scsi Error handling query

Kashyap Desai <kashyap.desai@xxxxxxxxxxxxx> · Fri, 27 Mar 2015 00:13:02 +0530

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@xxxxxxx]
> Sent: Thursday, March 26, 2015 9:28 PM
> To: Kashyap Desai; linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: Scsi Error handling query
>
> On 03/26/2015 02:38 PM, Kashyap Desai wrote:
> > Hi Hannes,
> >
> > I was going through one of the slides posted at the link below.
> >
> > http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf
> >
> > Slide #59 has the data below. I was trying to correlate it with the latest
> > upstream code, but I do not understand a few things. Does Linux block
> > I/O to the device and target before it actually starts legacy EH
> > recovery?
>
> Yes. This is handled by 'scsi_eh_scmd_add()', which adds the command to the
> internal 'eh_entry' list and starts recovery once all remaining outstanding
> commands are completed.

Thanks, Hannes! scsi_eh_scmd_add() moves the shost state to recovery, which
blocks further I/O to the whole host, not just to the device/target whose
command timed out. Right?
As I understand it, the planned improvements to SCSI error handling will keep
I/O flowing to the other devices attached to the host and block only the I/O
that belongs to the affected target.
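
As a rough user-space sketch of that host-wide behaviour (not kernel code:
every toy_* name is made up for illustration, and the "busy equals failed"
start condition is only my reading of "once all remaining outstanding
commands are completed"):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_host_state { TOY_RUNNING, TOY_RECOVERY };

struct toy_host {
    enum toy_host_state state;
    int busy;    /* commands handed to the driver and not yet finished */
    int failed;  /* commands parked on the error-handling list */
};

/* A command timed out: park it and put the whole host into recovery. */
void toy_eh_add(struct toy_host *h)
{
    assert(h->failed < h->busy);
    h->state = TOY_RECOVERY;
    h->failed++;
}

/* A healthy in-flight command completed normally. */
void toy_complete(struct toy_host *h)
{
    assert(h->busy > h->failed);
    h->busy--;
}

/* New I/O is refused for every device while the host is in recovery. */
bool toy_can_dispatch(const struct toy_host *h, int device_id)
{
    (void)device_id;  /* the gate is per host, so the device is irrelevant */
    return h->state == TOY_RUNNING;
}

/* Recovery may start once every still-busy command is a failed one. */
bool toy_eh_may_start(const struct toy_host *h)
{
    return h->state == TOY_RECOVERY && h->busy == h->failed;
}
```

With three commands in flight and one timeout, toy_can_dispatch() returns
false for every device ID, and toy_eh_may_start() only becomes true after
the two healthy commands have completed.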

Also, one more thing to clarify: the presentation uses the term "task set
aborts". Does this mean a task set abort is handled by traversing the complete
list of timed-out commands and sending an individual ABORT TASK for each?
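
To make the question concrete, the emulation I have in mind would look
roughly like the toy model below. It is only a sketch of the idea being
asked about, not a description of what the stack does; the struct and both
functions are hypothetical, and the per-command abort always succeeds here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy command on a singly linked list; not a kernel structure. */
struct toy_cmd {
    int lun;
    bool timed_out;
    bool aborted;
    struct toy_cmd *next;
};

/* Hypothetical per-command abort; in this model it cannot fail. */
bool toy_abort_task(struct toy_cmd *c)
{
    c->aborted = true;
    return true;
}

/*
 * Emulate a task set abort for one LUN by walking the list and sending
 * one abort per timed-out command. Returns the number of aborts sent.
 */
int toy_abort_task_set_emulated(struct toy_cmd *head, int lun)
{
    int sent = 0;

    for (struct toy_cmd *c = head; c != NULL; c = c->next) {
        if (c->lun != lun || !c->timed_out)
            continue;  /* other LUN, or still healthy: leave it alone */
        if (toy_abort_task(c))
            sent++;
    }
    return sent;
}
```

Note that this loop touches only the timed-out commands, whereas a real
task set abort issued as a single TMF would cover every command in the
task set.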

>
> > Also, how does the Linux SCSI stack achieve a task set abort?
> >
> Currently we don't :-)
> The presentation was a roadmap about future EH updates.
>
> > Proposed SCSI EH strategy
> > • Send command aborts after timeout
> > • EH Recovery starts:
> > ‒ Block I/O to the device
> >        ‒ Issue 'Task Set Abort'
> > ‒ Block I/O to the target
> >        ‒ Issue I_T Nexus Reset
> >        ‒ Complete outstanding command on success
> > ‒ Engage current EH strategy
> >        ‒ LUN Reset, Target Reset etc
> >
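
The escalation order on that slide can be sketched as a simple ladder that
stops at the first step that succeeds. This is a user-space illustration
only: the enum, the callback type and the helpers are all hypothetical, and
the blocking of I/O to the device and target is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps in the order given on the slide; names are made up. */
enum toy_step {
    TOY_TASK_SET_ABORT,
    TOY_IT_NEXUS_RESET,
    TOY_LEGACY_EH,       /* LUN reset, target reset, etc. */
    TOY_STEP_COUNT
};

typedef bool (*toy_step_fn)(void *ctx);

/*
 * Try each step in order and stop at the first success.
 * Returns the index of the step that succeeded, or TOY_STEP_COUNT
 * if every step failed.
 */
int toy_escalate(const toy_step_fn steps[TOY_STEP_COUNT], void *ctx)
{
    for (int i = 0; i < TOY_STEP_COUNT; i++) {
        assert(steps[i] != NULL);
        if (steps[i](ctx))
            return i;
    }
    return TOY_STEP_COUNT;
}

/* Example callbacks: ctx points to an int that counts invocations. */
bool toy_count_fail(void *ctx) { (*(int *)ctx)++; return false; }
bool toy_count_ok(void *ctx)   { (*(int *)ctx)++; return true; }
```

If the task set abort fails and the I_T nexus reset succeeds, the ladder
returns TOY_IT_NEXUS_RESET and the legacy EH step is never called.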
> The current plans for EH updates are:
>
> - Convert eh_host_reset_handler() to take Scsi_Host as argument
>   - Convert EH host reset to do a host rescan after try_host_reset()
>     succeeded
>   - Terminate failed scmds prior to calling try_host_reset()
>   => with that we should be able to instantiate a quick failover
>      when running under multipathing, as then I/Os will be returned
>      prior to the host reset (which is known to take quite a long
>      time)
>
> - Convert the remaining eh_XXX_reset_handler() to take the
>   appropriate structure as argument.
>   This will require some work, as some EH handler implementations
>   re-use the command tag (or even the actual command) for sending
>   TMFs.
>
> - Implementing a 'transport reset' EH function, to be called
>   after the current EH LUN Reset
>
> - Investigating the possibility of an asynchronous 'task set abort',
>   and making the 'transport reset' EH function asynchronous, too.
>
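
The ordering in the first item (return the failed commands, then do the
slow host reset, then rescan if it succeeded) can be sketched with a small
event log. This is a toy illustration, not the patchset: the event names
and functions are hypothetical and the reset outcome is simply passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_event { TOY_EV_CMD_RETURNED, TOY_EV_HOST_RESET, TOY_EV_RESCAN };

/* Fixed-size log of what happened, in order. */
struct toy_log {
    enum toy_event ev[16];
    size_t n;
};

void toy_log_add(struct toy_log *log, enum toy_event e)
{
    assert(log->n < sizeof(log->ev) / sizeof(log->ev[0]));
    log->ev[log->n++] = e;
}

/*
 * Hand every failed command back to the upper layers first, so that a
 * multipath layer could fail over without waiting, and only then run
 * the slow host reset. A rescan follows only if the reset succeeded.
 */
void toy_host_reset_with_failover(struct toy_log *log, int nr_failed,
                                  bool reset_ok)
{
    for (int i = 0; i < nr_failed; i++)
        toy_log_add(log, TOY_EV_CMD_RETURNED);

    toy_log_add(log, TOY_EV_HOST_RESET);

    if (reset_ok)
        toy_log_add(log, TOY_EV_RESCAN);
}
```

The point of the ordering is that the TOY_EV_CMD_RETURNED entries always
precede TOY_EV_HOST_RESET in the log, however long the reset itself takes.
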
> I've got a patchset for the first step, but the others still require some
> work ...
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke		               zSeries & Storage
> hare@xxxxxxx			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)