Re: [PATCH] SCSI: handle HARDWARE_ERROR sense correctly

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Mon, 8 Dec 2008 10:10:40 -0500 (EST)

On Thu, 4 Dec 2008, Mike Anderson wrote:

> > > The current behaviour is to retry the error until the command timeout
> > > expires, which, I think is what was needed by the annoying arrays that
> > > have retryable hardware errors.
> > > 
> 
> Yes there are some arrays that need this behavior. The two users: 
> usb disks and the devinfo entries with BLIST_RETRY_HWERROR appear to have
> two different expected behaviors.

This leads me to ask: why aren't hardware errors always retried as a
matter of course?

Of course, in most cases it makes sense to retry only a few times.  
(In other words, don't do five retries per second for 60 seconds!)  
Tape arrays needing indefinite retries appear to be out of the
ordinary.

> > For example, does it really make sense for scsi_softirq_done
> > to multiply cmd->allowed by rq->timeout?  After all, if a command
> > aborts with a timeout instead of failing outright, what point is there
> > in retrying it?  The proper approach would have been to use a longer 
> > timeout initially.
> > 
> 
> The wait_for is used for more than retries of timeouts.

What else is it used for?  In my copy of scsi_softirq_done() it appears
in just one place, and that is a test to see whether the command should
fail with a timeout error.

> I had thought it might be a good idea to expose the wait_for value and
> then users could control the wait_for behavior if needed by using a udev
> rule to set it near the IO timeout value if so required.

Why should the wait_for value be any different from the regular I/O 
timeout?  Isn't it in fact a timeout value?

Alan Stern
