Closing Statements:
I've attached the original RFC below. See
http://marc.theaimsgroup.com/?l=linux-scsi&m=115082917628466&w=2
I've updated it with what I perceive to be the position and resolution
based on comments. Keep in mind that we're trying to lay the groundwork
for common behavior and tunables between the transports.
Please let me know if I've misrepresented anything, or if there is
any dissent on the resolution. I'd like to close on this.
James Smart wrote:
Folks,
The following addresses some long standing todo items I've had in the
FC transport. They primarily arise when considering multipathing, or
trying to marry driver internal state to transport state. It is intended
that this same type of functionality would be usable in other transports
as well.
Here's what is contained:
- dev_loss_tmo LLDD callback:
Currently, there is no notification to the LLDD when the transport
gives up on the device returning and starts returning DID_NO_CONNECT
via the queuecommand helper function. This callback notifies the LLDD
that the transport has now given up on the rport, thereby acknowledging
the prior fc_remote_port_delete() call. The callback also expects the
LLDD to initiate the termination of any outstanding i/o on the rport.
I believe there is no dissent on this change.
Please note: this is essentially a confirmation from the transport to the
LLD that the rport is fully deleted. Thus, the LLD must expect to see
these callbacks as a normal part of the rport being terminated (even if
it is not blocked).
I'll move forward with this.
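As a rough illustration only (the hook name, its signature, and the
driver-side helpers below are assumptions, not a settled API), an
LLDD-side implementation might look something like this:

#include <linux/list.h>
#include <scsi/scsi_transport_fc.h>

/* placeholder for whatever per-rport state the driver keeps */
struct lld_rport_priv {
        struct list_head active_cmds;
};

/*
 * Driver-specific: abort every command still outstanding on this rport.
 * Per the discussion above, completions need not be synchronous with the
 * callback's return - they may arrive afterward.
 */
static void lld_terminate_rport_io(struct lld_rport_priv *rdata)
{
        /* walk rdata->active_cmds and issue aborts ... */
}

/*
 * Invoked by the transport once dev_loss_tmo fires, i.e. the transport
 * has given up on the rport and is acknowledging the earlier
 * fc_remote_port_delete().
 */
static void lld_dev_loss_tmo_callbk(struct fc_rport *rport)
{
        struct lld_rport_priv *rdata = rport->dd_data;

        lld_terminate_rport_io(rdata);
}

static struct fc_function_template lld_transport_functions = {
        /* ... existing show_* entries and handlers ... */
        .dev_loss_tmo_callbk = lld_dev_loss_tmo_callbk,
};

The key point is that the transport owns the timing; the driver only
reacts once it is told the rport is truly gone.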
- fast_io_fail_tmo and LLD callback:
There are some cases where it may take a long while to truly determine
device loss, but the system is in a multipathing configuration such that,
if the i/o were failed quickly (faster than dev_loss_tmo), it could be
redirected to a different path and completed sooner (assuming the
multipath layer knew that the sdev was blocked).
iSCSI is one of the transports that may vary dev_loss_tmo values
per session, and where fast i/o failure is desirable.
The current transport implementation did not specify what happened to
active i/o (given to the driver, in the adapter, but not yet completed
back to the midlayer) when a device was blocked, nor during the
block-to-dev_loss transition period. It was up to the driver. Many
assumed active i/o was immediately terminated, which is semi-consistent
with the behavior of most drivers for most "connectivity loss" scenarios.
The conversation then started to jump around, considering which i/os you
may want to have fail quickly, etc.
Here's my opinion:
We have the following points in time to look at:
(a) the device is blocked by the transport
(b) there is a time T, usually in a multipathing environment, where it
would be useful to fail the i/o early rather than wait for dev_loss.
It is assumed that any such i/o request would be marked REQ_FASTFAIL.
(c) the dev_loss_tmo fires - we're to assume the device is gone.
At any time after (a), the device may return, unblock, and never
encounter points (b) and (c).
As for what happens to active i/o:
always: the driver can fail an i/o at any point in time if it deems
it appropriate.
at (a): There are scenarios where a short link perturbation may occur
which may not disrupt the i/o. Therefore, we should not force
i/o to be terminated.
at (b): Minimally, we should terminate all active i/o requests marked
as type REQ_FASTFAIL. From an API perspective, driver support
for this is optional. We must also assume that there will
be implementations which have to abort all i/o in order to
terminate those marked REQ_FASTFAIL. Is this acceptable?
(it meets the "always" condition above)
Q: so far we've limited the i/o to those marked REQ_FASTFAIL.
Would we ever want to allow a user to fast fail all i/o,
regardless of the request flags? (in case the flags
weren't getting set on all the i/o the user wanted to
see fail?)
There's a desire to address pending i/o (those on the block
request queue, or new requests going there) so that if we've
crossed point (b), we also fail them. The proposal is
to add a new state (device? or queue?), which would take effect
as of point (b). All REQ_FASTFAIL i/o on the queue, as well
as any new i/o, would be failed with a new i/o status while in
this state. Non-REQ_FASTFAIL i/o would continue to enter/sit
on the request queue until dev_loss_tmo fires. (A rough sketch
of this check appears just after this list.)
at (c): per the dev_loss_tmo callback, all i/o should be terminated.
Their completions do not have to be synchronous with the return
from the callback - they can occur afterward.
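To make the behavior at (b) a bit more concrete, here is a rough sketch
of the kind of per-request check the queuecommand helper could make once
the new state exists. The status name DID_TRANSPORT_FASTFAIL, its value,
and the extra parameters are placeholders for whatever the eventual patch
defines - this is not a proposed interface:

#include <linux/types.h>
#include <scsi/scsi.h>
#include <scsi/scsi_transport_fc.h>

/* placeholder for the proposed new i/o completion status (value illustrative) */
#define DID_TRANSPORT_FASTFAIL  0x1f

/*
 * fastfail_req:   the request was marked REQ_FASTFAIL
 * past_fast_fail: fast_io_fail_tmo has expired for this rport (point (b))
 */
static int rport_io_disposition(struct fc_rport *rport,
                                bool fastfail_req, bool past_fast_fail)
{
        switch (rport->port_state) {
        case FC_PORTSTATE_ONLINE:
                return 0;                       /* issue the i/o normally */
        case FC_PORTSTATE_BLOCKED:
                if (fastfail_req && past_fast_fail)
                        return DID_TRANSPORT_FASTFAIL << 16;  /* fail fast */
                return DID_IMM_RETRY << 16;     /* stay queued until dev_loss */
        default:
                return DID_NO_CONNECT << 16;    /* dev_loss_tmo has fired */
        }
}

Non-REQ_FASTFAIL i/o simply keeps waiting (DID_IMM_RETRY here) until
dev_loss_tmo fires, matching the proposal above.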
Comments?
Assuming that folks agree, I'd like to do this in two patches:
- one that puts in the transport fast_io_fail_tmo and the LLD callback
(a rough sketch of the timer side follows this list)
- another that adds the new state and i/o completion status, and handles
the REQ_FASTFAIL i/o on the request queue.
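For the first patch, the transport side might boil down to arming a
per-rport timer when the rport is blocked. The structure, field names,
and the lld_fast_io_fail() hook below are purely illustrative
assumptions:

#include <linux/jiffies.h>
#include <linux/types.h>
#include <linux/workqueue.h>
#include <scsi/scsi_transport_fc.h>

/* placeholder per-rport bookkeeping the transport might keep */
struct rport_fastfail {
        struct fc_rport         *rport;
        struct delayed_work     work;
        unsigned int            fast_io_fail_tmo;       /* seconds */
        bool                    timed_out;              /* point (b) reached */
};

/* placeholder for the proposed LLD callback: terminate REQ_FASTFAIL i/o */
static void lld_fast_io_fail(struct fc_rport *rport)
{
}

static void fast_io_fail_fn(struct work_struct *work)
{
        struct rport_fastfail *ff =
                container_of(to_delayed_work(work), struct rport_fastfail, work);

        /* mark point (b): queued/new REQ_FASTFAIL i/o now fails fast */
        ff->timed_out = true;

        /* ask the LLD to terminate active REQ_FASTFAIL i/o on the rport */
        lld_fast_io_fail(ff->rport);
}

/* called when the transport blocks the rport (point (a)) */
static void rport_start_fast_fail_timer(struct rport_fastfail *ff)
{
        INIT_DELAYED_WORK(&ff->work, fast_io_fail_fn);
        schedule_delayed_work(&ff->work, ff->fast_io_fail_tmo * HZ);
}

If the rport returns before the timer fires, the work item would simply
be cancelled and no i/o is disturbed, in line with the note under (a)
above.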
- fast_loss_time recommendation:
In discussing how an admin should set dev_loss_tmo in a multipathing
environment, it became apparent that we expected the admin to know
a lot. They had to know the transport type, what the minimum setting
can be that still survives normal link bouncing, and they may even
have to know about device specifics. For iSCSI, the proper loss time
may vary widely from session to session.
This attribute is a "recommendation", exported by the LLDD and transport,
of what the lowest dev_loss_tmo setting should be for a multipathing
environment. Thus, the admin only needs to cat this attribute to obtain
the value to echo into dev_loss_tmo.
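Purely as an illustration of the shape of such an attribute (the name
fast_loss_time, the helper, the stand-in value of 30, and where the
attribute lives are all assumptions), a read-only sysfs entry could be
as simple as:

#include <linux/device.h>
#include <linux/stat.h>

/* placeholder: the LLDD's per-device recommendation, in seconds */
static unsigned int lld_recommended_fast_loss_time(struct device *dev)
{
        return 30;      /* stand-in value */
}

/*
 * Read-only "recommendation" attribute: the admin cats this value and
 * echoes it into dev_loss_tmo.  No knowledge of the transport type or
 * link-bounce behavior is required on the admin's part.
 */
static ssize_t fast_loss_time_show(struct device *dev,
                                   struct device_attribute *attr, char *buf)
{
        return sprintf(buf, "%u\n", lld_recommended_fast_loss_time(dev));
}
static DEVICE_ATTR(fast_loss_time, S_IRUGO, fast_loss_time_show, NULL);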
The only objection was from Christoph, who wanted a utility to get/set this
stuff. However, the counter was that this attribute is still meaningful, as it
is the conduit for obtaining a recommendation from the transport/LLD.
So - I assume this proceeds as is, with a change in its description.
I have one criticism of these changes. The callbacks are calling into
the LLDD with an rport after the driver's rport delete call. What this means
is that we are essentially extending the lifetime of an rport until the
dev_loss_tmo callback occurs.
It's ok - and adding the appropriate comments is fine.
Thanks.
-- james s