Re: [PATCH] Make scsi error recovery play nice with devices blocked by transport

On Mon, 2006-01-09 at 10:01 -0500, James Smart wrote:
> > I think letting the harder resets happen is a good thing (or at least
> > not a bad thing) as long as recovery waits for the driver to report that
> > the drive is gone (offline).
> 
> Well, in thinking through this further after my initial reply...
> 
> I think we really do want to leave scsi_eh_ready_devs() logic with the bigger
> hammer steps alone. Ultimately, they are trying to regain the resources for an
> i/o that is trying to be killed but the LLDD (or device) isn't cooperating.
> I still believe in not resetting everyone just because a device is temporarily
> blocked. However, we need to intercept it at an earlier point... Ultimately,
> to reach this path, it starts with an i/o timing out, and the eh_abort handler
> failing. In Emulex's case, we are planning on never failing the eh_abort
> handler if we're in this temporarily blocked state, even at the expense of a
> long wait. This is actually too much to ask of an LLDD - and is hokey. The
> logic really should be to intercept the timeout handler, note that the device
> is blocked, and delay the abort request until the device has been given a
> chance to return (e.g. just restart the i/o abort timer for the amount of 
> devloss_tmo that remains). Otherwise, we're always guaranteeing a failure from
> the abort handlers (for i/o and device) as there's no device to talk to.
> 
> This should remove the need for your if-blocked test in scsi_error.c,
> replacing it with the logic in the i/o timeout handler.

Actually, there is another thing you can do even earlier:  implement
scsi_eh_timer_return() in the host template (probably with a generic
routine from the fc class).  This would allow you to hold off the
timeout at least for the length of the user-specified timeout and all
the retries.  Probably the routine would simply check to see if the
device is in a devloss timeout and, if it is, return EH_RESET_TIMER;
otherwise return EH_NOT_HANDLED.
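
As a rough illustration of that check, here is a minimal user-space
sketch, not kernel code.  The names EH_RESET_TIMER and EH_NOT_HANDLED
come from the text above; struct model_rport, its fields, and
model_timed_out() are hypothetical stand-ins for whatever state the fc
class would actually consult.  In kernels of this era the host template
hook is, as far as I know, called eh_timed_out and returns
enum scsi_eh_timer_return; treat that as an assumption to check against
the tree you are working on.

```c
#include <stdbool.h>

/* Return codes named in the mail; the ordering here is illustrative. */
enum scsi_eh_timer_return {
	EH_NOT_HANDLED,
	EH_HANDLED,
	EH_RESET_TIMER,
};

/* Hypothetical model of the transport's view of a remote port. */
struct model_rport {
	bool blocked;               /* transport has blocked the device */
	unsigned int devloss_left;  /* seconds of devloss_tmo remaining */
};

/*
 * Called when a command timer fires.  While the device is blocked and
 * the devloss timer is still running, ask for the command timer to be
 * rearmed instead of starting error recovery against a device that
 * cannot answer.  Otherwise, let normal timeout handling proceed.
 */
enum scsi_eh_timer_return model_timed_out(const struct model_rport *rport)
{
	if (rport->blocked && rport->devloss_left > 0)
		return EH_RESET_TIMER;

	return EH_NOT_HANDLED;
}
```

Once the devloss timeout expires, or the device comes back and is
unblocked, the sketch returns EH_NOT_HANDLED and the usual abort and
reset escalation runs as before.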

James

