Hello Hannes, On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote: > There hasn't been any reports for HBAs where asynchronous abort > would not work, so we should make it mandatory and remove > the fallback. > > Signed-off-by: Hannes Reinecke <hare@xxxxxxx> > Reviewed-by: Johannes Thumshirn <jthumshirn@xxxxxxx> > Reviewed-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx> > --- > Documentation/scsi/scsi_eh.txt | 18 ++++------ > drivers/scsi/scsi_error.c | 81 ++++-------------------------------------- > drivers/scsi/scsi_lib.c | 2 +- > drivers/scsi/scsi_priv.h | 3 +- > include/scsi/scsi_host.h | 5 --- > 5 files changed, 15 insertions(+), 94 deletions(-) > > diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt > index 4edb9c1c..11e447b 100644 > --- a/Documentation/scsi/scsi_eh.txt > +++ b/Documentation/scsi/scsi_eh.txt > @@ -70,7 +70,7 @@ with the command. > scmd is requeued to blk queue. > > - otherwise > - scsi_eh_scmd_add(scmd, 0) is invoked for the command. See > + scsi_eh_scmd_add(scmd) is invoked for the command. See > [1-3] for details of this function. > > > @@ -103,9 +103,7 @@ function > eh_timed_out() callback did not handle the command. > Step #2 is taken. > > - 2. If the host supports asynchronous completion (as indicated by the > - no_async_abort setting in the host template) scsi_abort_command() > - is invoked to schedule an asynchrous abort. > + 2. scsi_abort_command() is invoked to schedule an asynchrous abort. > Asynchronous abort are not invoked for commands which the > SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command > already had been aborted once, and this is a retry which failed), > @@ -127,16 +125,13 @@ function > > scmds enter EH via scsi_eh_scmd_add(), which does the following. > > - 1. Turns on scmd->eh_eflags as requested. It's 0 for error > - completions and SCSI_EH_CANCEL_CMD for timeouts. > + 1. Links scmd->eh_entry to shost->eh_cmd_q > > - 2. Links scmd->eh_entry to shost->eh_cmd_q > + 2. Sets SHOST_RECOVERY bit in shost->shost_state > > - 3. Sets SHOST_RECOVERY bit in shost->shost_state > + 3. Increments shost->host_failed > > - 4. Increments shost->host_failed > - > - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed > + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed > > As can be seen above, once any scmd is added to shost->eh_cmd_q, > SHOST_RECOVERY shost_state bit is turned on. This prevents any new > @@ -252,7 +247,6 @@ scmd->allowed. > > 1. Error completion / time out > ACTION: scsi_eh_scmd_add() is invoked for scmd > - - set scmd->eh_eflags > - add scmd to shost->eh_cmd_q > - set SHOST_RECOVERY > - shost->host_failed++ > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 802a081..7b70ee9 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) > } > } > > - scsi_eh_scmd_add(scmd, 0); > + scsi_eh_scmd_add(scmd); > } > > /** > @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) > /** > * scsi_eh_scmd_add - add scsi cmd to error handling. > * @scmd: scmd to run eh on. > - * @eh_flag: optional SCSI_EH flag. > */ > -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) > +void scsi_eh_scmd_add(struct scsi_cmnd *scmd) > { > struct Scsi_Host *shost = scmd->device->host; > unsigned long flags; > @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) > if (shost->eh_deadline != -1 && !shost->last_reset) > shost->last_reset = jiffies; > > - if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) > - eh_flag &= ~SCSI_EH_CANCEL_CMD; > - scmd->eh_eflags |= eh_flag; > scsi_eh_reset(scmd); > list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); > shost->host_failed++; > @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) > rtn = host->hostt->eh_timed_out(scmd); > > if (rtn == BLK_EH_NOT_HANDLED) { > - if (host->hostt->no_async_abort || > - scsi_abort_command(scmd) != SUCCESS) { > + if (scsi_abort_command(scmd) != SUCCESS) { > set_host_byte(scmd, DID_TIME_OUT); > - scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD); > + scsi_eh_scmd_add(scmd); > } > } > > @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost, > list_for_each_entry(scmd, work_q, eh_entry) { > if (scmd->device == sdev) { > ++total_failures; > - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) > + if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) > ++cmd_cancel; > else > ++cmd_failed; > @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q, > * should not get sense. > */ > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > - if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || > - (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || > + if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || > SCSI_SENSE_VALID(scmd)) > continue; > > @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list, > return list_empty(work_q); > } > > - > -/** > - * scsi_eh_abort_cmds - abort pending commands. > - * @work_q: &list_head for pending commands. > - * @done_q: &list_head for processed commands. > - * > - * Decription: > - * Try and see whether or not it makes sense to try and abort the > - * running command. This only works out to be the case if we have one > - * command that has timed out. If the command simply failed, it makes > - * no sense to try and abort the command, since as far as the shost > - * adapter is concerned, it isn't running. > - */ > -static int scsi_eh_abort_cmds(struct list_head *work_q, > - struct list_head *done_q) > -{ > - struct scsi_cmnd *scmd, *next; > - LIST_HEAD(check_list); > - int rtn; > - struct Scsi_Host *shost; > - > - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > - if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD)) > - continue; > - shost = scmd->device->host; > - if (scsi_host_eh_past_deadline(shost)) { > - list_splice_init(&check_list, work_q); > - SCSI_LOG_ERROR_RECOVERY(3, > - scmd_printk(KERN_INFO, scmd, > - "%s: skip aborting cmd, past eh deadline\n", > - current->comm)); > - return list_empty(work_q); > - } > - SCSI_LOG_ERROR_RECOVERY(3, > - scmd_printk(KERN_INFO, scmd, > - "%s: aborting cmd\n", current->comm)); > - rtn = scsi_try_to_abort_cmd(shost->hostt, scmd); > - if (rtn == FAILED) { > - SCSI_LOG_ERROR_RECOVERY(3, > - scmd_printk(KERN_INFO, scmd, > - "%s: aborting cmd failed\n", > - current->comm)); > - list_splice_init(&check_list, work_q); > - return list_empty(work_q); > - } > - scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD; > - if (rtn == FAST_IO_FAIL) > - scsi_eh_finish_cmd(scmd, done_q); > - else > - list_move_tail(&scmd->eh_entry, &check_list); > - } > - > - return scsi_eh_test_devices(&check_list, work_q, done_q, 0); > -} > - > /** > * scsi_eh_try_stu - Send START_UNIT to device. > * @scmd: &scsi_cmnd to send START_UNIT > @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q, > sdev_printk(KERN_INFO, scmd->device, "Device offlined - " > "not ready after error recovery\n"); > scsi_device_set_state(scmd->device, SDEV_OFFLINE); > - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) { > - /* > - * FIXME: Handle lost cmds. > - */ > - } > scsi_eh_finish_cmd(scmd, done_q); > } > return; > @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost) > SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q)); > > if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q)) > - if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q)) > - scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q); > + scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q); > > spin_lock_irqsave(shost->host_lock, flags); > if (shost->eh_deadline != -1) > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 0735a46..98b2df8 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq) > scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY); > break; > default: > - scsi_eh_scmd_add(cmd, 0); > + scsi_eh_scmd_add(cmd); > break; > } > } > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h > index 5be6cbf6..e20ab10 100644 > --- a/drivers/scsi/scsi_priv.h > +++ b/drivers/scsi/scsi_priv.h > @@ -18,7 +18,6 @@ > /* > * Scsi Error Handler Flags > */ > -#define SCSI_EH_CANCEL_CMD 0x0001 /* Cancel this cmd */ > #define SCSI_EH_ABORT_SCHEDULED 0x0002 /* Abort has been scheduled */ > > #define SCSI_SENSE_VALID(scmd) \ > @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor, > extern int scsi_error_handler(void *host); > extern int scsi_decide_disposition(struct scsi_cmnd *cmd); > extern void scsi_eh_wakeup(struct Scsi_Host *shost); > -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int); > +extern void scsi_eh_scmd_add(struct scsi_cmnd *); > void scsi_eh_ready_devs(struct Scsi_Host *shost, > struct list_head *work_q, > struct list_head *done_q); > diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h > index 3cd8c3b..afb0481 100644 > --- a/include/scsi/scsi_host.h > +++ b/include/scsi/scsi_host.h > @@ -452,11 +452,6 @@ struct scsi_host_template { > unsigned no_write_same:1; > > /* > - * True if asynchronous aborts are not supported > - */ > - unsigned no_async_abort:1; > - > - /* > * Countdown for host blocking with no commands outstanding. > */ > unsigned int max_host_blocked; > -- > 1.8.5.6 > Hmm so, I guess we compromise in terms of how granular we want to recover? When say an abort for command A on LUN 1 behind Port α fails for some reason, then we also skip all possible aborts for command B on LUN 2 behind Port α and command C on LUN 1 behind Port β? (The host might already be in recovery by the time command B and C fail) I mean, in the end, all traffic holds on that host anyway, as long as we get into EH - which is true as soon as one abort fails. Might as well skip the wait time till we get there. I would like it, if we could document that fact a bit more direct. Otherwise I think that's good. The code also looks good. Reviewed-by: Benjamin Block <bblock@xxxxxxxxxxxxxxxxxx> Beste Grüße / Best regards, - Benjamin Block -- Linux on z Systems Development / IBM Systems & Technology Group IBM Deutschland Research & Development GmbH Vorsitz. AufsR.: Martina Koederitz / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294