Re: [PATCH 3/9] scsi: improved eh timeout handler

On Tue, 11 June 2013 08:18:51 +0200, Hannes Reinecke wrote:
> On 06/11/2013 01:24 AM, Jörn Engel wrote:
> > On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote:
> >>
> >> I don't care too much whether we use per-command work items or a
> >> single system-global thread.
> > 
> > Actually, I do care.  We have to abort the commands in parallel, as a
> > fairly large number of aborts can queue up and individual aborts can
> > take 20s on hardware I care about.
> > 
> > 20s for an abort is pretty bad, but given today's reality there is no
> > need to make things worse by serializing.
> > 
> We're only serializing aborts per LUN, so this is a _big_
> improvement over the original, where we would be serializing
> per _host_.

I agree it is a big improvement.  But I also have some evidence
indicating it may not be enough.  So let me change my private patch to
do parallel aborts and see how it fares.
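As a rough userspace analogy of doing the aborts in parallel, one worker per timed-out command, here is a fan-out/fan-in sketch. It uses POSIX threads in place of kernel work items, and `struct abort_work`, `abort_worker()` and `abort_all_parallel()` are hypothetical names; the worker body is a stub marking where the real abort TMF would be sent and waited on.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARALLEL_ABORTS 64

/* Hypothetical per-command abort work item (not a kernel structure). */
struct abort_work {
    int tag;        /* identifies the timed-out command */
    bool aborted;   /* set once the abort has "returned" */
};

/* Stub for sending one abort and waiting for its response.  On the
 * hardware discussed in this thread, this step alone can take ~20s. */
static void *abort_worker(void *arg)
{
    struct abort_work *w = arg;
    /* ... issue the abort for w->tag and wait for the answer ... */
    w->aborted = true;
    return NULL;
}

/* Start every abort before waiting for any of them, then wait for all.
 * Wall-clock time is bounded by the slowest abort instead of the sum of
 * all abort times.  Returns the number of commands marked aborted. */
static size_t abort_all_parallel(struct abort_work *work, size_t n)
{
    pthread_t tid[MAX_PARALLEL_ABORTS];
    bool threaded[MAX_PARALLEL_ABORTS];
    size_t aborted = 0;

    if (n > MAX_PARALLEL_ABORTS)
        n = MAX_PARALLEL_ABORTS;   /* sketch only: items past the cap are skipped */

    /* Fan out: one thread per abort. */
    for (size_t i = 0; i < n; i++) {
        threaded[i] = pthread_create(&tid[i], NULL, abort_worker, &work[i]) == 0;
        if (!threaded[i])
            abort_worker(&work[i]);   /* fall back to a synchronous abort */
    }

    /* Fan in: wait for every outstanding abort. */
    for (size_t i = 0; i < n; i++) {
        if (threaded[i])
            pthread_join(tid[i], NULL);
        if (work[i].aborted)
            aborted++;
    }
    return aborted;
}
```

The single-thread alternative would call `abort_worker()` for each item in turn, so ten 20s aborts would take 200s instead of roughly 20s.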

> Also, upon the first abort failure EH will be escalating to
> LUN reset, so we won't have to wait for all aborts to time out.

The case I saw was a successful abort taking 20s.

> More importantly, the current synchronous implementation of
> command aborts does not allow for complete de-serialisation:
> - There is no way to abort a running command abort, so we
>   have to wait for it to complete, with the chance of running
>   into a timeout.
> - We will have to send command aborts in parallel, and can
>   only stop sending aborts once the first returns an error.
> - After we've received an error we have to wait for the
>   outstanding aborts to complete.
> -> So the max wait-time will be 2 times the abort timeout.
>   Not much of a gain here :-)

I have seen 10 commands get queued for aborts.  Assuming the 20s above
are the worst case, it will make the difference between 200s and 40s.
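The 200s versus 40s figures come from simple worst-case arithmetic, sketched below assuming every abort takes the full per-abort time t. Serialized aborts add up to n * t; the parallel case uses the 2 * t bound from the quoted reply, capped at n * t so that a single abort is not overstated (that cap is an assumption of this sketch). The function names are hypothetical.

```c
/* Worst case (seconds) for n aborts of up to t seconds each,
 * issued one after another. */
static unsigned serial_abort_worst(unsigned n, unsigned t)
{
    return n * t;
}

/* Worst case when the aborts are sent in parallel: the bound quoted
 * above is twice the abort time, independent of n.  Capped at n * t
 * (an assumption here) so one abort is not counted as two. */
static unsigned parallel_abort_worst(unsigned n, unsigned t)
{
    unsigned serial = n * t;
    unsigned bound = 2 * t;
    return serial < bound ? serial : bound;
}
```

With n = 10 and t = 20 this gives 200s serialized and 40s in parallel.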

Granted, 40s is still horrible.  The command in question has just
timed out and the kernel should inform userspace about this asap.  If
a command with a 10s timeout takes 50s to complete, userspace will
have to add another layer of timeouts.  That should never be
necessary.

> The _correct_ way of handling asynchronous aborts would
> be to mandate that the LLDD has to send a command completion
> on the original command once an abort has been issued.
> Then we could just kick off the TMF and rearm the request
> timer. Everything else would then be handled via normal
> I/O paths.
> 
> However, this would mean to implement new callouts into
> each and every driver. And the actual gain would be
> dubious, as several IHVs have indicated that a command
> abort might be handled lazily, i.e. the target will return
> a good status, but abort the command only at a later time.
> Other vendors treat a command abort as best effort, and
> rely on the LUN reset to clear up things.
> 
> So overall I doubt we'd be gaining much from a fully
> asynchronous command abort. I'd rather concentrate
> on getting the remaining bits like LUN reset working
> correctly.
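The fully asynchronous scheme quoted above (kick off the TMF, rearm the request timer, and let the LLDD complete the original command through the normal I/O path) can be modelled as a small per-command state machine. This is a hypothetical sketch, not midlayer code: the states, `cmd_timed_out()` and `cmd_complete()` are invented for illustration, and escalating on a second timeout is an assumption standing in for the LUN reset mentioned in the thread.

```c
#include <stdbool.h>

/* Hypothetical command states; these names do not exist in the SCSI midlayer. */
enum cmd_state { CMD_RUNNING, CMD_ABORT_SENT, CMD_COMPLETED };

enum timeout_action {
    TIMEOUT_REARMED,   /* abort TMF issued, request timer rearmed */
    TIMEOUT_ESCALATE,  /* no completion even after the abort: escalate */
    TIMEOUT_IGNORED,   /* command already completed, nothing to do */
};

struct cmd {
    enum cmd_state state;
    unsigned timeout_s;   /* per-request timeout */
    unsigned timer_s;     /* currently armed timer, 0 = disarmed */
    bool abort_issued;    /* stands in for "TMF sent to the LLDD" */
};

/* Request timer expired. */
static enum timeout_action cmd_timed_out(struct cmd *c)
{
    switch (c->state) {
    case CMD_RUNNING:
        /* Send the abort and rearm; the LLDD is assumed to complete the
         * original command once the abort has been processed. */
        c->abort_issued = true;
        c->state = CMD_ABORT_SENT;
        c->timer_s = c->timeout_s;
        return TIMEOUT_REARMED;
    case CMD_ABORT_SENT:
        /* Timed out again with the abort outstanding: hand over to
         * the next escalation step (e.g. LUN reset). */
        return TIMEOUT_ESCALATE;
    case CMD_COMPLETED:
    default:
        return TIMEOUT_IGNORED;
    }
}

/* Normal completion path, shared by regular and aborted commands. */
static void cmd_complete(struct cmd *c)
{
    c->state = CMD_COMPLETED;
    c->timer_s = 0;   /* disarm the request timer */
}
```

The point of the design is that no thread ever blocks waiting for an abort; the existing request timer and completion path carry all the state.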

Fair approach.  Again, let me play with it and don't let this stop you
from making other improvements.

Jörn

--
There's nothing that will change someone's moral outlook quicker
than cash in large sums.
-- Larry Flynt