Multipath failover handling (Was: Re: 2.6.24-rc3-mm1)

Hannes Reinecke <hare@xxxxxxx> · Mon, 07 Jan 2008 15:05:43 +0100



James Bottomley wrote:
> On Fri, 2007-12-14 at 10:00 +0100, Hannes Reinecke wrote:
>> James Bottomley wrote:
>>> On Mon, 2007-11-26 at 22:15 -0800, Andrew Morton wrote:
>>>> OK, thanks.  I'll assume that James and Hannes have this in hand (or will
>>>> have, by mid-week) and I won't do anything here.
>>> Just to confirm what I think I'm going to be doing:  rebasing the
>>> scsi-misc tree to remove this commit:
>>>
>>> commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
>>> Author: Hannes Reinecke <hare@xxxxxxx>
>>> Date:   Tue Nov 6 09:23:40 2007 +0100
>>>
>>>     [SCSI] Do not requeue requests if REQ_FAILFAST is set
>>>
>>> And its allied fix ups:
>>>
>>> commit 983289045faa96fba8841d3c51b98bb8623d9504
>>> Author: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>>> Date:   Sat Nov 24 19:47:25 2007 +0200
>>>
>>>     [SCSI] fix up REQ_FASTFAIL not to fail when state is QUIESCE
>>>
>>> commit 9dd15a13b332e9f5c8ee752b1ccd9b84cb5bdf17
>>> Author: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>>> Date:   Sat Nov 24 19:55:53 2007 +0200
>>>
>>>     [SCSI] fix domain validation to work again
>>>
>>> James
>>>
>>>
>> Or just apply my latest patch (cf Undo __scsi_kill_request).
>> The main point is that we shouldn't retry requests
>> with FAILFAST set when the queue is blocked. AFAICS
>> only FC and iSCSI transports set the queue to blocked,
>> and use this to indicate a loss of connection. So any
>> retry with queue blocked is futile.
> 
> I still don't think this is the right approach.
> 
> For link up/down events, those are direct pathing events and should be
> signalled along a kernel notifier, not by mucking with the SCSI state
> machine.
Of course they will be signalled. And eventually we should patch up
mutltipath-tools to read the exising events from the uevent socket.
But even with that patch there is a quite largish window during
which IOs will be sent to the blocked device, and hence will be
stuck in the request queue until the timer expires.

> However, there's still devloss_tmo to consider ... even in
> multipath, I don't think you want to signal path failure until
> devloss_tmo has fired otherwise you'll get too many transient up/down
> events which damage performance if the array has an expensive failover
> model.
> 
Yes. But currently we have a very high failover latency as we always have
to wait for the requeued commands to time-out.
Hence we're damaging performance on arrays with inexpensive failover.

> The other problem is what to do with in-flight commands at the time the
> link went down.  With your current patch, they're still stuck until they
> time out ... surely there needs to be some type of recovery mechanism
> for these?
> 
Well, the in-flight commands are owned by the HBA driver, which should
have the proper code to terminate / return those commands with the
appriopriate codes. They will then be rescheduled and will be caught
like 'normal' IO requests.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html