On Wed, 2013-10-23 at 11:25 +0200, Hannes Reinecke wrote: > On 10/16/2013 09:22 PM, James Bottomley wrote: > > On Mon, 2013-07-01 at 08:50 +0200, Hannes Reinecke wrote: > >> This patchs adds an 'eh_deadline' sysfs attribute to the scsi > >> host which limits the overall runtime of the SCSI EH. > >> The 'eh_deadline' value is stored in the now obsolete field > >> 'resetting'. > >> When a command is failed the start time of the EH is stored > >> in 'last_reset'. If the overall runtime of the SCSI EH is longer > >> than last_reset + eh_deadline, the EH is short-circuited and > >> falls through to issue a host reset only. > > > > OK, so the specific problem with this one is that potentially it will > > spend all its time mucking about with aborts (which most often time out > > on non FC kit because of the issue problems) and then proceed to host > > reset, which mostly does nothing for failing devices. > > > > If you want to impose a deadline, then we need to spend only 50% of the > > time attempting aborts and the rest of the time escalating the resets. > > > > [...] > >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > >> index f43de1e..84369f2 100644 > >> --- a/drivers/scsi/scsi_error.c > >> +++ b/drivers/scsi/scsi_error.c > >> @@ -89,6 +89,18 @@ void scsi_schedule_eh(struct Scsi_Host *shost) > >> } > >> EXPORT_SYMBOL_GPL(scsi_schedule_eh); > >> > >> +static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) > >> +{ > >> + if (!shost->last_reset || !shost->eh_deadline) > >> + return 0; > >> + > >> + if (time_before(jiffies, > >> + shost->last_reset + shost->eh_deadline)) > >> + return 0; > >> + > >> + return 1; > >> +} > >> + > > > > What about instead: > > > > static int scsi_host_eh_past_deadline(struct Scsi_Host *shost, int percent) { > > if (!shost->last_reset || !shost->eh_deadline) > > return 0; > > > > if (time_before(jiffies, > > shost->last_reset + shost->eh_deadline * percent/100)) > > return 0; > > > > return 1; > > } > > > > which allows us to have > > > > if (scsi_host_eh_past_deadline(shost, 50)) { > > > > in scsi_eh_abort_cmds() > > > > if (scsi_host_eh_past_deadline(shost, 66) { > > > > in scsi_eh_bus_device_reset() > > > > say 83 in target reset, and 100 in bus reset. > > > > Thus ensuring we at least get a crack at the reset chain? > > > Alternatively we could just escalate directly to LUN reset once the > first abort fails. That looks like a cleaner solution here. I'm OK with that ... if you want this in the current merge window, you have about a week to code it up ... James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html