On Fri, 2008-12-05 at 16:41 +0200, Kai Makisara wrote:
> On Thu, 4 Dec 2008, James Bottomley wrote:
>
> > On Thu, 2008-12-04 at 15:49 -0500, Alan Stern wrote:
> > > This patch (as1183) fixes a bug in scsi_check_sense().  The routine is
> > > documented as returning one of SUCCESS, FAILED, or NEEDS_RETRY.  But
> > > in the HARDWARE_ERROR case it can return ADD_TO_MLQUEUE.  And since it
> > > does this without bothering to increment the retry count, it can lead
> > > to an infinite retry loop.
> > >
> > > The fix is to return NEEDS_RETRY instead.  Then the caller,
> > > scsi_decide_disposition(), will do the right thing.
> >
> > OK, but why?
> >
> > The current behaviour is to retry the error until the command timeout
> > expires, which, I think, is what was needed by the annoying arrays that
> > have retryable hardware errors.
> >
> So, a tape command returning (non-recoverable) HARDWARE_ERROR is retried
> until the timeout (default 3.8 hours if the command happens to use the
> long timeout)? And is the result returned to the upper level a timeout
> instead of sense data? Does not sound good.

No.  This is abnormal behaviour and it's conditioned on a flag in device
info.  The standards say that HARDWARE_ERROR is an immediate failure ...
we just have some stupid arrays (won't name names) that violate the
standard, and the option was either to give the user spurious I/O errors
or to allow retry.

> And another thing is that retrying an error that is not clearly retryable
> "outside" retry counting does not sound good.

It's not.  By the standard, HARDWARE_ERROR is never retryable, so we
don't retry it in the usual case.

> > What bug would this patch fix?  Because I can see it causing problems
> > with the arrays that originally reported this problem.
> >
> Is a quirk needed?

BLIST_RETRY_HWERROR

James
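
To make the two retry behaviours being argued about concrete, here is a small stand-alone toy model in C. It is not kernel code: the enum values, the `retry_hwerror` field, and both function names are hypothetical stand-ins chosen for illustration, and the "one attempt per tick" timeout is an assumption. It only sketches the claim in the thread that a requeue which never bumps the retry count is bounded by the command timeout alone, whereas a counted retry is bounded by the allowed retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the dispositions discussed in the thread. */
enum disposition {
    DISP_FAIL_NOW,     /* report the error upward immediately */
    DISP_NEEDS_RETRY,  /* retry, counted against the allowed retries */
    DISP_REQUEUE       /* requeue without touching the retry count */
};

/* Hypothetical per-device quirk flag (the "flag in device info"). */
struct model_device {
    bool retry_hwerror;
};

/* Toy version of the hardware-error decision: requeue only for quirky devices. */
enum disposition model_check_hwerror(const struct model_device *dev)
{
    return dev->retry_hwerror ? DISP_REQUEUE : DISP_FAIL_NOW;
}

/*
 * Count how many times a command that always fails would be issued.
 * Assumption: each attempt consumes one tick of the command timeout.
 */
int model_attempts(enum disposition d, int allowed, int timeout_ticks)
{
    int attempts = 0;
    int retries = 0;

    for (int tick = 0; tick < timeout_ticks; tick++) {
        attempts++;
        if (d == DISP_NEEDS_RETRY) {
            if (++retries > allowed)
                break;          /* retry budget exhausted */
        } else if (d != DISP_REQUEUE) {
            break;              /* immediate failure: one attempt only */
        }
        /* DISP_REQUEUE: nothing is counted, so only the timeout stops us */
    }
    return attempts;
}

/* Small usage example exercising the three cases. */
void model_examples(void)
{
    struct model_device quirky = { .retry_hwerror = true };
    struct model_device normal = { .retry_hwerror = false };

    assert(model_check_hwerror(&quirky) == DISP_REQUEUE);
    assert(model_check_hwerror(&normal) == DISP_FAIL_NOW);
    assert(model_attempts(DISP_NEEDS_RETRY, 5, 1000) == 6);
    assert(model_attempts(DISP_REQUEUE, 5, 1000) == 1000);
}
```

In this model a normal device fails after a single attempt, a counted retry stops after the initial attempt plus the allowed retries, and the uncounted requeue keeps going until the timeout budget is spent, which is the behaviour reserved for devices carrying the quirk.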