On 03/14/2017 06:33 PM, Benjamin Block wrote: > Hello Hannes, > > On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote: >> There hasn't been any reports for HBAs where asynchronous abort >> would not work, so we should make it mandatory and remove >> the fallback. >> >> Signed-off-by: Hannes Reinecke <hare@xxxxxxx> >> Reviewed-by: Johannes Thumshirn <jthumshirn@xxxxxxx> >> Reviewed-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx> >> --- >> Documentation/scsi/scsi_eh.txt | 18 ++++------ >> drivers/scsi/scsi_error.c | 81 ++++-------------------------------------- >> drivers/scsi/scsi_lib.c | 2 +- >> drivers/scsi/scsi_priv.h | 3 +- >> include/scsi/scsi_host.h | 5 --- >> 5 files changed, 15 insertions(+), 94 deletions(-) >> >> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt >> index 4edb9c1c..11e447b 100644 >> --- a/Documentation/scsi/scsi_eh.txt >> +++ b/Documentation/scsi/scsi_eh.txt >> @@ -70,7 +70,7 @@ with the command. >> scmd is requeued to blk queue. >> >> - otherwise >> - scsi_eh_scmd_add(scmd, 0) is invoked for the command. See >> + scsi_eh_scmd_add(scmd) is invoked for the command. See >> [1-3] for details of this function. >> >> >> @@ -103,9 +103,7 @@ function >> eh_timed_out() callback did not handle the command. >> Step #2 is taken. >> >> - 2. If the host supports asynchronous completion (as indicated by the >> - no_async_abort setting in the host template) scsi_abort_command() >> - is invoked to schedule an asynchrous abort. >> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort. >> Asynchronous abort are not invoked for commands which the >> SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command >> already had been aborted once, and this is a retry which failed), >> @@ -127,16 +125,13 @@ function >> >> scmds enter EH via scsi_eh_scmd_add(), which does the following. >> >> - 1. Turns on scmd->eh_eflags as requested. It's 0 for error >> - completions and SCSI_EH_CANCEL_CMD for timeouts. >> + 1. Links scmd->eh_entry to shost->eh_cmd_q >> >> - 2. Links scmd->eh_entry to shost->eh_cmd_q >> + 2. Sets SHOST_RECOVERY bit in shost->shost_state >> >> - 3. Sets SHOST_RECOVERY bit in shost->shost_state >> + 3. Increments shost->host_failed >> >> - 4. Increments shost->host_failed >> - >> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed >> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed >> >> As can be seen above, once any scmd is added to shost->eh_cmd_q, >> SHOST_RECOVERY shost_state bit is turned on. This prevents any new >> @@ -252,7 +247,6 @@ scmd->allowed. >> >> 1. Error completion / time out >> ACTION: scsi_eh_scmd_add() is invoked for scmd >> - - set scmd->eh_eflags >> - add scmd to shost->eh_cmd_q >> - set SHOST_RECOVERY >> - shost->host_failed++ >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index 802a081..7b70ee9 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) >> } >> } >> >> - scsi_eh_scmd_add(scmd, 0); >> + scsi_eh_scmd_add(scmd); >> } >> >> /** >> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) >> /** >> * scsi_eh_scmd_add - add scsi cmd to error handling. >> * @scmd: scmd to run eh on. >> - * @eh_flag: optional SCSI_EH flag. >> */ >> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) >> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd) >> { >> struct Scsi_Host *shost = scmd->device->host; >> unsigned long flags; >> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) >> if (shost->eh_deadline != -1 && !shost->last_reset) >> shost->last_reset = jiffies; >> >> - if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) >> - eh_flag &= ~SCSI_EH_CANCEL_CMD; >> - scmd->eh_eflags |= eh_flag; >> scsi_eh_reset(scmd); >> list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); >> shost->host_failed++; >> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) >> rtn = host->hostt->eh_timed_out(scmd); >> >> if (rtn == BLK_EH_NOT_HANDLED) { >> - if (host->hostt->no_async_abort || >> - scsi_abort_command(scmd) != SUCCESS) { >> + if (scsi_abort_command(scmd) != SUCCESS) { >> set_host_byte(scmd, DID_TIME_OUT); >> - scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD); >> + scsi_eh_scmd_add(scmd); >> } >> } >> >> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost, >> list_for_each_entry(scmd, work_q, eh_entry) { >> if (scmd->device == sdev) { >> ++total_failures; >> - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) >> + if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) >> ++cmd_cancel; >> else >> ++cmd_failed; >> @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q, >> * should not get sense. >> */ >> list_for_each_entry_safe(scmd, next, work_q, eh_entry) { >> - if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || >> - (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || >> + if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || >> SCSI_SENSE_VALID(scmd)) >> continue; >> >> @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list, >> return list_empty(work_q); >> } >> >> - >> -/** >> - * scsi_eh_abort_cmds - abort pending commands. >> - * @work_q: &list_head for pending commands. >> - * @done_q: &list_head for processed commands. >> - * >> - * Decription: >> - * Try and see whether or not it makes sense to try and abort the >> - * running command. This only works out to be the case if we have one >> - * command that has timed out. If the command simply failed, it makes >> - * no sense to try and abort the command, since as far as the shost >> - * adapter is concerned, it isn't running. >> - */ >> -static int scsi_eh_abort_cmds(struct list_head *work_q, >> - struct list_head *done_q) >> -{ >> - struct scsi_cmnd *scmd, *next; >> - LIST_HEAD(check_list); >> - int rtn; >> - struct Scsi_Host *shost; >> - >> - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { >> - if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD)) >> - continue; >> - shost = scmd->device->host; >> - if (scsi_host_eh_past_deadline(shost)) { >> - list_splice_init(&check_list, work_q); >> - SCSI_LOG_ERROR_RECOVERY(3, >> - scmd_printk(KERN_INFO, scmd, >> - "%s: skip aborting cmd, past eh deadline\n", >> - current->comm)); >> - return list_empty(work_q); >> - } >> - SCSI_LOG_ERROR_RECOVERY(3, >> - scmd_printk(KERN_INFO, scmd, >> - "%s: aborting cmd\n", current->comm)); >> - rtn = scsi_try_to_abort_cmd(shost->hostt, scmd); >> - if (rtn == FAILED) { >> - SCSI_LOG_ERROR_RECOVERY(3, >> - scmd_printk(KERN_INFO, scmd, >> - "%s: aborting cmd failed\n", >> - current->comm)); >> - list_splice_init(&check_list, work_q); >> - return list_empty(work_q); >> - } >> - scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD; >> - if (rtn == FAST_IO_FAIL) >> - scsi_eh_finish_cmd(scmd, done_q); >> - else >> - list_move_tail(&scmd->eh_entry, &check_list); >> - } >> - >> - return scsi_eh_test_devices(&check_list, work_q, done_q, 0); >> -} >> - >> /** >> * scsi_eh_try_stu - Send START_UNIT to device. >> * @scmd: &scsi_cmnd to send START_UNIT >> @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q, >> sdev_printk(KERN_INFO, scmd->device, "Device offlined - " >> "not ready after error recovery\n"); >> scsi_device_set_state(scmd->device, SDEV_OFFLINE); >> - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) { >> - /* >> - * FIXME: Handle lost cmds. >> - */ >> - } >> scsi_eh_finish_cmd(scmd, done_q); >> } >> return; >> @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost) >> SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q)); >> >> if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q)) >> - if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q)) >> - scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q); >> + scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q); >> >> spin_lock_irqsave(shost->host_lock, flags); >> if (shost->eh_deadline != -1) >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >> index 0735a46..98b2df8 100644 >> --- a/drivers/scsi/scsi_lib.c >> +++ b/drivers/scsi/scsi_lib.c >> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq) >> scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY); >> break; >> default: >> - scsi_eh_scmd_add(cmd, 0); >> + scsi_eh_scmd_add(cmd); >> break; >> } >> } >> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h >> index 5be6cbf6..e20ab10 100644 >> --- a/drivers/scsi/scsi_priv.h >> +++ b/drivers/scsi/scsi_priv.h >> @@ -18,7 +18,6 @@ >> /* >> * Scsi Error Handler Flags >> */ >> -#define SCSI_EH_CANCEL_CMD 0x0001 /* Cancel this cmd */ >> #define SCSI_EH_ABORT_SCHEDULED 0x0002 /* Abort has been scheduled */ >> >> #define SCSI_SENSE_VALID(scmd) \ >> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor, >> extern int scsi_error_handler(void *host); >> extern int scsi_decide_disposition(struct scsi_cmnd *cmd); >> extern void scsi_eh_wakeup(struct Scsi_Host *shost); >> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int); >> +extern void scsi_eh_scmd_add(struct scsi_cmnd *); >> void scsi_eh_ready_devs(struct Scsi_Host *shost, >> struct list_head *work_q, >> struct list_head *done_q); >> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h >> index 3cd8c3b..afb0481 100644 >> --- a/include/scsi/scsi_host.h >> +++ b/include/scsi/scsi_host.h >> @@ -452,11 +452,6 @@ struct scsi_host_template { >> unsigned no_write_same:1; >> >> /* >> - * True if asynchronous aborts are not supported >> - */ >> - unsigned no_async_abort:1; >> - >> - /* >> * Countdown for host blocking with no commands outstanding. >> */ >> unsigned int max_host_blocked; >> -- >> 1.8.5.6 >> > > Hmm so, I guess we compromise in terms of how granular we want to > recover? > > When say an abort for command A on LUN 1 behind Port α fails for some > reason, then we also skip all possible aborts for command B on LUN 2 > behind Port α and command C on LUN 1 behind Port β? (The host might > already be in recovery by the time command B and C fail) > No. Recovery will only be started once all commands have been completed or aborted. Hence will be be aborting all commands before entering SCSI EH. > I mean, in the end, all traffic holds on that host anyway, as long as we > get into EH - which is true as soon as one abort fails. Might as well > skip the wait time till we get there. > There still is a benefit to be had from sending individual aborts; after all, the commands are supposed to be independently, and sending an abort serves as a pointer to the LLDD to remove any reference to it. > I would like it, if we could document that fact a bit more direct. > Otherwise I think that's good. The code also looks good. > > > Reviewed-by: Benjamin Block <bblock@xxxxxxxxxxxxxxxxxx> > Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxx +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)