Re: [PATCH 3/9] scsi: improved eh timeout handler

On 06/11/2013 01:24 AM, Jörn Engel wrote:
> On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote:
>>
>> I don't care too much whether we use per-command work items or a
>> single system-global thread.
> 
> Actually, I do care.  We have to abort the commands in parallel, as a
> fairly large number of aborts can queue up and individual aborts can
> take 20s on hardware I care about.
> 
> 20s for an abort is pretty bad, but given today's reality there is no
> need to make things worse by serializing.
> 
We're only serializing aborts per LUN, so this is a _big_
improvement over the original, where we would be serializing
per _host_.
Also, upon the first abort failure EH will be escalating to
LUN reset, so we won't have to wait for all aborts to time out.
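To put rough numbers on the per-LUN vs. per-host difference,
here is a minimal sketch in Python. It is a toy model, not
kernel code: the function names are made up for illustration,
and it assumes every abort succeeds after the given number of
seconds (so no escalation to LUN reset takes place).

```python
from collections import defaultdict

def per_host_wait(aborts):
    """Total wait if all aborts on the host share one queue.

    aborts: list of (lun, seconds) pairs (hypothetical input format).
    """
    return sum(seconds for _lun, seconds in aborts)

def per_lun_wait(aborts):
    """Total wait if aborts are serialized per LUN only.

    LUNs proceed independently, so the slowest LUN decides.
    """
    per_lun = defaultdict(int)
    for lun, seconds in aborts:
        per_lun[lun] += seconds
    return max(per_lun.values(), default=0)

# Four aborts of 20s each, two of them on LUN 0.
aborts = [(0, 20), (0, 20), (1, 20), (2, 20)]
print(per_host_wait(aborts))  # 80
print(per_lun_wait(aborts))   # 40
```

With 20s aborts spread over several LUNs, the per-host queue
pays for every abort in turn, while per-LUN serialization only
pays for the busiest LUN.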

More importantly, the current synchronous implementation of
command aborts does not allow for complete de-serialisation:
- There is no way to abort a running command abort, so we
  have to wait for it to complete, with the chance of running
  into a timeout.
- We will have to send command aborts in parallel, and can
  only stop sending aborts once the first returns an error.
- After we've received an error we have to wait for the
  outstanding aborts to complete.
-> So the max wait-time will be 2 times the abort timeout.
  Not much of a gain here :-)
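The "2 times the abort timeout" figure can be checked with a
small worst-case model. This is only a sketch in Python under
stated assumptions: every abort hangs until its own timeout
expires, aborts keep being issued until the first error is
seen, and the function name is hypothetical.

```python
def parallel_abort_wait(issue_times, timeout):
    """Worst case for parallel, synchronous aborts.

    issue_times: times at which aborts would be sent.
    Returns (first_error, all_done): when the first abort times
    out, and when the last outstanding abort has timed out.
    """
    if not issue_times:
        raise ValueError("need at least one abort")
    # The earliest abort is the first one to report an error.
    first_error = min(issue_times) + timeout
    # Aborts issued before that error are already in flight
    # and cannot be cancelled; later ones are never sent.
    sent = [t for t in issue_times if t < first_error]
    all_done = max(t + timeout for t in sent)
    return first_error, all_done

# 20s timeout; the abort at t=25 is never sent.
print(parallel_abort_wait([0, 5, 19, 25], 20))  # (20, 39)
```

An abort sent just before the first error arrives pushes the
total wait to almost twice the timeout.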

The _correct_ way of handling asynchronous aborts would
be to mandate that the LLDD has to send a command completion
on the original command once an abort has been issued.
Then we could just kick off the TMF and rearm the request
timer. Everything else would then be handled via normal
I/O paths.
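As a rough illustration of that flow, here is a toy state
machine in Python. It is not kernel code; the class, method
and state names are hypothetical. Escalating to LUN reset when
the rearmed timer expires is an assumption of this sketch.

```python
import enum

class State(enum.Enum):
    RUNNING = "running"
    ABORTING = "aborting"
    DONE = "done"
    ESCALATED = "escalated"

class Command:
    """One outstanding command with its request timer."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = timeout  # timer armed at submission (t=0)
        self.state = State.RUNNING
        self.log = []

    def on_timer_expired(self, now):
        if self.state is State.RUNNING:
            # Kick off the abort TMF without waiting for it,
            # then rearm the request timer.
            self.log.append("send abort TMF")
            self.state = State.ABORTING
            self.deadline = now + self.timeout
        elif self.state is State.ABORTING:
            # Assumption: no completion arrived in time.
            self.log.append("escalate to LUN reset")
            self.state = State.ESCALATED

    def on_completion(self, now):
        # The LLDD completes the original command; from here on
        # the normal I/O path handles it.
        if self.state in (State.RUNNING, State.ABORTING):
            self.log.append("complete via normal I/O path")
            self.state = State.DONE

cmd = Command(timeout=30)
cmd.on_timer_expired(30)  # abort sent, timer rearmed to t=60
cmd.on_completion(35)     # LLDD completes the original command
print(cmd.state, cmd.log)
```

The point of the design is that nothing blocks on the abort
itself; the rearmed timer and the normal completion path do
all the work.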

However, this would mean implementing new callouts in
each and every driver. And the actual gain would be
dubious, as several IHVs have indicated that a command
abort might be handled lazily, i.e. the target will return
a good status, but abort the command only at a later time.
Other vendors treat a command abort as best effort, and
rely on the LUN reset to clear things up.

So overall I doubt we'd be gaining much from a fully
asynchronous command abort. I'd rather concentrate
on getting the remaining bits like LUN reset working
correctly.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)