RE: [2.4.21] Spurious ABORTs

James Bottomley <James.Bottomley@xxxxxxxxxxxx> · Tue, 27 Sep 2005 12:00:52 -0500

On Tue, 2005-09-27 at 12:39 -0400, Bagalkote, Sreenivas wrote:
> >
> >On Tue, 2005-09-27 at 12:18 -0400, Bagalkote, Sreenivas wrote:
> >> When I return SUCCESS to the spurious ABORTs, the system keeps
> >> running. I am getting aborts for commands that I completed as early as
> >> 60+ seconds ago. Could somebody please tell me what in the SCSI layer can
> >> cause it to do this?
> >
> >Well, 2.4 is somewhat more eccentric than 2.6 as far as SCSI goes.
> >However, I can guess about this one.  If a command is 
> >completed after it times out, you still get error handling for 
> >it (this is actually still true in 2.6).  When the system 
> >becomes aware of a need for error handling it quiesces the 
> >driver (i.e. waits for all outstanding commands to time out or 
> >return) before beginning the eh thread.  So, if a bunch of 
> >commands are failing, you can complete one that has already 
> >timed out and still receive an ABORT for it ages afterwards.
> >
> >James
> 
> Thanks. But 60 seconds after the completion?! In any case, I don't have

Yes: the sd timeout is 30s, and I can certainly construct theoretical
situations where you would not get an abort until 60s after completion.
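
To make the arithmetic concrete, here is a minimal userspace sketch of one
such theoretical situation. It is not mid-layer code. It only models the rule
described above: error handling starts once every outstanding command has
either returned or timed out. The struct, the function names, and all the
numbers except the 30s sd timeout are hypothetical, including the 90s timeout
on a second device.

```c
#include <assert.h>

/* Hypothetical model of one outstanding command (times in seconds). */
struct cmd {
    int issued;   /* when the command was queued to the driver        */
    int timeout;  /* per-command timeout, e.g. 30 for sd              */
    int done;     /* when the driver completed it, or -1 for "never"  */
};

/* A command counts as failed if it never returned or returned late. */
static int timed_out(const struct cmd *c)
{
    return c->done < 0 || c->done >= c->issued + c->timeout;
}

/* The moment the command stops holding up error handling:
 * its completion if that came in time, otherwise its timeout. */
static int settle_time(const struct cmd *c)
{
    return timed_out(c) ? c->issued + c->timeout : c->done;
}

/* Error handling begins only when the last outstanding command settles. */
static int eh_start_time(const struct cmd *cmds, int n)
{
    int latest = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int t = settle_time(&cmds[i]);
        if (t > latest)
            latest = t;
    }
    return latest;
}

/* Gap between a late completion and the earliest possible ABORT for it. */
static int abort_delay(const struct cmd *c, int eh_start)
{
    assert(c->done >= 0);
    return eh_start - c->done;
}
```

With a command issued at t=0 that completes late at t=31, a second stuck
command issued at t=25, and a third stuck command issued at t=5 with the
assumed 90s timeout, error handling cannot start before t=95, so the ABORT
for the first command arrives 64s after the driver completed it.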

> an abort handler in my release driver. Only reset handler. If I see that
> I don't have any pending commands with me, I simply return SUCCESS from
> the reset handler. Is this the correct way of doing this? (Returning
> FAILED would cause the controller to be marked offline).

As long as you actually do a reset, yes.  The mid-layer's next actions
will be to try a test unit ready, and if that succeeds to retry the
command.
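
A rough userspace sketch of that sequence, under stated assumptions: every
name here (fake_hba, fake_reset_handler, EH_SUCCESS, and so on) is a
hypothetical stand-in and not the 2.4 driver interface. The point it shows is
that the handler performs the reset even when no commands are pending, and
reports success only if the reset itself worked. What happens when the test
unit ready fails is not covered in this thread, so it is modelled as a
generic ESCALATE placeholder.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the handler's return values. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical next step taken after the reset handler returns. */
enum next_step { RETRY_COMMAND, ESCALATE };

/* Hypothetical controller state kept by the driver. */
struct fake_hba {
    int  pending;      /* commands the driver still owns          */
    int  resets_done;  /* how many resets were actually performed */
    bool hw_ok;        /* whether the hardware reset will succeed */
};

/* Pretend hardware reset: always attempted, clears pending on success. */
static bool fake_hw_reset(struct fake_hba *h)
{
    h->resets_done++;
    if (h->hw_ok)
        h->pending = 0;
    return h->hw_ok;
}

/* Reset handler: do the reset regardless of the pending count. */
static enum eh_result fake_reset_handler(struct fake_hba *h)
{
    return fake_hw_reset(h) ? EH_SUCCESS : EH_FAILED;
}

/* After a successful reset, a successful test unit ready leads to a retry. */
static enum next_step after_reset(enum eh_result r, bool tur_ok)
{
    if (r == EH_SUCCESS && tur_ok)
        return RETRY_COMMAND;
    return ESCALATE; /* placeholder: not described in this thread */
}
```

Returning success without calling the reset at all is the shortcut to avoid:
in this model resets_done would stay at zero, and the retried command would
meet the same controller state that led to the timeout.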

James
