Re: [RFC] [Last Rites] fc transport: extensions for fast fail and dev loss

Michael Reed <mdr@xxxxxxx> · Tue, 08 Aug 2006 17:15:04 -0500

Michael Reed wrote:
> 
> James Smart wrote:
>> Closing Statements:
>>
>> I've attached the original RFC below. See
>>   http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
>>
>> I've updated it with what I perceive to be the position and resolution
>> based on comments.  Keep in mind that we're trying to lay the groundwork
>> for common behavior and tunables between the transports.
>>
>> Please let me know if I've mis-represented anything, or if there is
>> a dissention in the resolution. I'd like to close on this.
>>
>> James Smart wrote:
>>> Folks,
>>>
>>> The following addresses some long standing todo items I've had in the
>>> FC transport. They primarily arise when considering multipathing, or
>>> trying to marry driver internal state to transport state. It is intended
>>> that this same type of functionality would be usable in other transports
>>> as well.
>>>
>>> Here's what is contained:
>>>
>>> - dev_loss_tmo LLDD callback :
>>>   Currently, there is no notification to the LLDD of when the transport
>>>   gives up on the device returning and starts to return DID_NO_CONNECT
>>>   in the queuecommand helper function. This callback notifies the LLDD
>>>   that the transport has now given up on the rport, thereby acknowledging
>>>   the prior fc_remote_port_delete() call. The callback also expects the
>>>   LLDD to initiate the termination of any outstanding i/o on the rport.
>> I believe there is no dissention on this change.
>> Please note: this is essentially a confirmation from the transport to the
>>   LLD that the rport is fully deleted. Thus, the LLD must expect to see
>>   these callbacks as a normal part of the rport being terminated (even if
>>   it is not blocked).
>>
>> I'll move forward with this.
> 
> Concur.
> 
>>> - fast_io_fail_tmo and LLD callback:
>>>   There are some cases where it may take a long while to truly determine
>>>   device loss, but the system is in a multipathing configuration that if
>>>   the i/o was failed quickly (faster than dev_loss_tmo), it could be
>>>   redirected to a different path and completed sooner (assuming the  
>>> multipath thing knew that the sdev was blocked).
>>>     iSCSI is one of the transports that may vary dev_loss_tmo values
>>>   per session, and you would like fast io failure.
>>
>> The current transport implementation did not specify what happened to
>>   active i/o (given to the driver, in the adapter, but not yet completed
>>   back to the midlayer) when a device was blocked, nor during the
>>   block-to->dev_loss transition period. It was up to the driver.  Many
>>   assumed active i/o was immediately terminated, which is semi-consistent
>>   with the behavior of most drivers for most "connectivity loss" scenarios.
>>
>> The conversations then started to jump around, considering what i/o's you
>>   may want to have fail quickly, etc.
>>
>> Here's my opinion:
>>   We have the following points in time to look at:
>>    (a) the device is blocked by the transport
>>    (b) there is a time T, usually in a multipathing environment, where it
>>        would be useful to error the i/o early rather than wait for dev_loss
>>        It is assumed that any such i/o request would be marked REQ_FASTFAIL
>>    (c) the dev_loss_tmo fires - we're to assume the device is gone
>>   and at any time post (a), the device may return, unblock and never
>>   encounter points (b) and (c).
> 
> REQ_FAILFAST is stored in the request structure.  Are there "issues"
> with using scsi_cmnd.request in the lldd?
> 
>>   As for what happens to active i/o :
>>
>>   always: the driver can fail an i/o at any point in time if it deems
>>           it appropriate.
>>
>>   at (a): There are scenarios where a short link perturbation may occur,
>>           which may not disrupt the i/o. Therefore, we should not force
>>           io to be terminated.
>>
>>   at (b): Minimally, we should terminate all active i/o requests marked
>>           as type REQ_FASTFAIL. From an api perspective, driver support
>>           for this is optional. And we must also assume that there will
>>           be implementations which have to abort all i/o in order to
>>           terminate those marked REQ_FASTFAIL. Is this acceptable ?
>>           (it meets the "always" condition above)
>>
>>           Q: so far we've limited the io to those w/ REQ_FASTFAIL.
>>             Would we ever want to allow a user to fast fail all i/o
>>             regardless of the request flags ? (in case they flags
>>             weren't getting set on all the i/o the user wanted to
>>             see fail ?)
> 
> REQ_FAILFAST appears to influence retries during error recovery so
> there may be unexpected side effects of doing this.  But, that said,
> I'd say yes.  From my perspective, I'd make this the default behavior.
> 
> In talking with our volume manager people, the question raised was
> "Why would you want some i/o to fail quickly and some not?"
> They even considered non-i/o scsi commands.
> 
> I think that if the default behavior is to have fast_io_fail_tmo
> enabled, then it should be controlled by REQ_FAILFAST in the
> request.  If the default is to have the timer disabled, i.e.,
> an admin has to enable it (or define when it's enabled) then
> when enabled it should apply to every i/o.  In examining the
> patch, it appears to be disabled by default, so our conclusion
> is that all i/o should fast fail when enabled.  We also concur
> with having fast_io_fail_tmo disabled by default.

I think this also implies that our volume manager guys would be
just as happy setting dev_loss_tmo to a small value and not use
fast_io_fail_tmo.

Mike

> 
> I guess this implies leave REQ_FAILFAST to error recovery.  :)
> 
> Mike
> 
> 
> 
>>           There's a desire to address pending i/o (those on the block
>>           request queue or new requests going there) so that if we've
>>           crossed point (b) that we also fail them.  The proposal is
>>           to add a new state (device ? or queue ?), which would occur
>>           as of point (b). All REQ_FASTFAIL io on the queue, as well
>>           as on a new io, will be failed with a new i/o status if in
>>           this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
>>           on the request queue until dev_loss_tmo fires.
>>
>>   at (c): per the dev_loss_tmo callback, all i/o should be terminated.
>>           Their completions do not have to be synchronous to the return
>>           from the callback - they can occur afterward.
>>
>>
>> Comments ?
>>
>> Assuming that folks agree, I'd like to do this in 2 patches:
>>  - one that puts in the transport fast_io_fail_tmo and LLD callback
>>  - another that adds the new state, io completion status, and does the
>>    handling of the request queue REQ_FASTFAIL i/o.
>>
>>> - fast_loss_time recommendation:
>>>   In discussing how a admin should set dev_loss_tmo in a multipathing
>>>   environment, it became apparent that we expected the admin to know
>>>   a lot. They had to know the transport type, what the minimum setting
>>>   can be that still survives normal link bouncing, and they may even
>>>   have to know about device specifics.  For iSCSI, the proper loss time
>>>   may vary widely from session to session.
>>>
>>>   This attribute is an exported "recommendation" by the LLDD and
>>> transport
>>>   on what the lowest setting for dev_loss_tmo should be for a
>>> multipathing
>>>   environment. Thus, the admin only needs to cat this attribute to obtain
>>>   the value to echo into dev_loss_tmo.
>> The only objection was from Christoph - wanting a utility to get/set this
>> stuff. However, the counter was this attribute was still meaningful, as it
>> was the conduit to obtain a recommendation from the transport/LLD.
>>
>> So - I assume this proceeds as is - with a change in it's description.
>>
>>>  
>>> I have one criticism of these changes. The callbacks are calling into
>>> the LLDD with an rport post the driver's rport_delete call. What it means
>>> is that we are essentially extending the lifetime of an rport until the
>>> dev_loss_tmo call occurs.
>> It's ok - and adding the appropriate comments are fine.
>>
>>
>> Thanks.
>>
>> -- james s
>>
>> -
>> : send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> -
> : send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
-
: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html