Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2013-10-23 at 11:25 +0200, Hannes Reinecke wrote:
> On 10/16/2013 09:22 PM, James Bottomley wrote:
> > On Mon, 2013-07-01 at 08:50 +0200, Hannes Reinecke wrote:
> >> This patchs adds an 'eh_deadline' sysfs attribute to the scsi
> >> host which limits the overall runtime of the SCSI EH.
> >> The 'eh_deadline' value is stored in the now obsolete field
> >> 'resetting'.
> >> When a command is failed the start time of the EH is stored
> >> in 'last_reset'. If the overall runtime of the SCSI EH is longer
> >> than last_reset + eh_deadline, the EH is short-circuited and
> >> falls through to issue a host reset only.
> >
> > OK, so the specific problem with this one is that potentially it will
> > spend all its time mucking about with aborts (which most often time out
> > on non FC kit because of the issue problems) and then proceed to host
> > reset, which mostly does nothing for failing devices.
> >
> > If you want to impose a deadline, then we need to spend only 50% of the
> > time attempting aborts and the rest of the time escalating the resets.
> >
> > [...]
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index f43de1e..84369f2 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -89,6 +89,18 @@ void scsi_schedule_eh(struct Scsi_Host *shost)
> >>   }
> >>   EXPORT_SYMBOL_GPL(scsi_schedule_eh);
> >>
> >> +static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
> >> +{
> >> +	if (!shost->last_reset || !shost->eh_deadline)
> >> +		return 0;
> >> +
> >> +	if (time_before(jiffies,
> >> +			shost->last_reset + shost->eh_deadline))
> >> +		return 0;
> >> +
> >> +	return 1;
> >> +}
> >> +
> >
> > What about instead:
> >
> > static int scsi_host_eh_past_deadline(struct Scsi_Host *shost, int percent) {
> > 	if (!shost->last_reset || !shost->eh_deadline)
> > 		return 0;
> >
> > 	if (time_before(jiffies,
> > 			shost->last_reset + shost->eh_deadline * percent/100))
> > 		return 0;
> >
> > 	return 1;
> > }
> >
> > which allows us to have
> >
> > if (scsi_host_eh_past_deadline(shost, 50)) {
> >
> > in scsi_eh_abort_cmds()
> >
> > if (scsi_host_eh_past_deadline(shost, 66) {
> >
> > in scsi_eh_bus_device_reset()
> >
> > say 83 in target reset, and 100 in bus reset.
> >
> > Thus ensuring we at least get a crack at the reset chain?
> >
> Alternatively we could just escalate directly to LUN reset once the 
> first abort fails. That looks like a cleaner solution here.

I'm OK with that ... if you want this in the current merge window, you
have about a week to code it up ...

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux