On Tue, 2013-06-11 at 16:41 -0400, Ewan Milne wrote:
> On Tue, 2013-06-11 at 18:57 +0000, James Bottomley wrote:
> > On Mon, 2013-06-10 at 01:20 -0700, Christoph Hellwig wrote:
> > > On Mon, Jun 10, 2013 at 09:40:52AM +0200, Hannes Reinecke wrote:
> > > > When a command runs into a timeout we need to send an 'ABORT TASK'
> > > > TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
> > > >
> > > > Conceptually, however, this function is a normal SCSI command, so
> > > > there is no need to enter the error handler.
> > > >
> > > > This patch implements a new scsi_abort_command() function which
> > > > invokes an asynchronous function scsi_eh_abort_handler() to
> > > > abort the commands via 'eh_abort_handler'.
> > > >
> > > > If the 'eh_abort_handler' returns SUCCESS or FAST_IO_FAIL the
> > > > command will be retried if possible. If no retries are allowed
> > > > the command will be returned immediately, as we have to assume
> > > > the TMF succeeded and the command is completed with the LLDD.
> > > > If the TMF fails the command will be pushed back onto the
> > > > list of failed commands and the SCSI EH handler will be
> > > > called immediately for all timed-out commands.
> > >
> > > Why can't we use a work item per command? Linking things into a list
> > > just to queue it up to workqueues missed half of the point of the
> > > workqueue infrastructure.
> >
> > Actually, I think we can dump the workqueue altogether. The only reason
> > we need it is because the current abort handlers wait for the command
> > and return the completion state. However, all LLDs are capable of
> > emitting TMFs at interrupt level, so if we separated the emit from the
> > wait, we could simply do this sequence:
> >
> > on timeout, fire the abort from interrupt and mark the command as having
> > an abort issued (possibly by adding a pointer to the abort task), return
> > BLK_EH_RESET_TIMER.
>
> Doesn't this cause blk_rq_timed_out to reset the timer on the req to
> the original timeout value again? It seems like this would increase
> the time before any further attempted error handling. The default
> timeout is 30 seconds for sd, but it could be much longer (e.g.
> WRITE SAME, which was 120 seconds last I looked).

It currently does, but that's fixable via a special return code.

James
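
To make the timing concern concrete, here is a rough user-space model of the sequence discussed above. It is only a sketch, not kernel code: the model_* names, the MODEL_EH_RESET_TIMER_ABORT value standing in for the "special return code", and the separate abort timeout are all assumptions made up for illustration. It compares when escalation to the full error handler would happen if the timer is rearmed with the original timeout versus a shorter abort timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; these are NOT the kernel's definitions. */
enum model_eh_return {
    MODEL_EH_NOT_HANDLED,       /* escalate to the full error handler */
    MODEL_EH_RESET_TIMER,       /* rearm with the original command timeout */
    MODEL_EH_RESET_TIMER_ABORT  /* hypothetical: rearm with a short abort timeout */
};

struct model_cmd {
    unsigned int timeout_s;        /* original command timeout, e.g. 30 for sd */
    unsigned int abort_timeout_s;  /* hypothetical short wait for the TMF */
    bool abort_issued;             /* set once an ABORT TASK has been fired */
    unsigned int deadline_s;       /* time at which the timer next expires */
};

/*
 * Timeout handler: the first expiry "fires" the abort (modelled as setting
 * a flag) and asks for the timer to be rearmed; a second expiry means the
 * abort never completed, so the command is handed to the error handler.
 */
enum model_eh_return model_timed_out(struct model_cmd *cmd, bool use_short_rearm)
{
    assert(cmd != NULL);
    if (cmd->abort_issued)
        return MODEL_EH_NOT_HANDLED;
    cmd->abort_issued = true;
    return use_short_rearm ? MODEL_EH_RESET_TIMER_ABORT : MODEL_EH_RESET_TIMER;
}

/* Apply the handler's verdict at time now_s; returns true if the timer was rearmed. */
bool model_rearm(struct model_cmd *cmd, enum model_eh_return ret, unsigned int now_s)
{
    switch (ret) {
    case MODEL_EH_RESET_TIMER:
        cmd->deadline_s = now_s + cmd->timeout_s;
        return true;
    case MODEL_EH_RESET_TIMER_ABORT:
        cmd->deadline_s = now_s + cmd->abort_timeout_s;
        return true;
    default:
        return false;
    }
}

/* Time (seconds after submission) at which escalation happens if the abort never completes. */
unsigned int model_escalation_time(unsigned int timeout_s,
                                   unsigned int abort_timeout_s,
                                   bool use_short_rearm)
{
    struct model_cmd cmd = { timeout_s, abort_timeout_s, false, timeout_s };
    unsigned int now = cmd.deadline_s;  /* first timer expiry */

    while (model_rearm(&cmd, model_timed_out(&cmd, use_short_rearm), now))
        now = cmd.deadline_s;           /* jump to the next expiry */
    return now;
}
```

With sd's 30 second default and an assumed 5 second abort wait, model_escalation_time(30, 5, false) gives 60, while model_escalation_time(30, 5, true) gives 35. For a 120 second WRITE SAME timeout the plain rearm pushes escalation out to 240 seconds.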