Re: [PATCH 13/14] ib_srp: Implement transport layer ping

David Dillow <dillowda@xxxxxxxx> · Mon, 19 Dec 2011 17:32:11 -0500

On Mon, 2011-12-19 at 05:16 -0500, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 12:50 AM, David Dillow <dillowda@xxxxxxxx> wrote:
> > On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote:
> >> Add a time-based transport layer test such that fail-over in a multipath
> >> setup can happen quickly.
> >
> > Why should this be done in the kernel? multipathd already verifies all
> > paths to a SCSI device are up and that the device is reachable.
> 
> I'm afraid it's impossible to make a transport layer check work
> reliably from user space. As an example, srp_reset_host() blocks the
> SCSI host before reconnecting. Before starting to attempt to
> reconnect, that action does block the SCSI host and hence also all
> transport layer checks issued from user space. I doubt it's possible
> to fix the resulting race between a transport layer reconnect issued
> from srp_reset_host() and a transport layer reconnect triggered from
> user space.

Part of the problem is introduced by allowing for permanent connections
rather than using the familiar dev_loss_tmo and fast_io_fail_tmo
parameters from other SCSI transports. For instance, in the FC
transport, rports are allowed to disappear for up to dev_loss_tmo
seconds before being removed from the SCSI device tree. Until they have
been gone for fast_io_fail_tmo seconds, they are blocked (as is error
handling to prevent offlining devices). Once they have been gone longer
than fast_io_fail_tmo, they become unblocked and new IO will be failed.
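
To make that timeline concrete, here is a rough sketch in Python. It is
not the FC transport's code: the enum, the function name, and the exact
handling of the boundary values are my own assumptions, made up purely
for illustration.

```python
from enum import Enum


class RportState(Enum):
    ONLINE = "online"        # port is present, IO flows normally
    BLOCKED = "blocked"      # port is gone, IO and error handling are held
    FAST_FAIL = "fast-fail"  # port is gone, unblocked, new IO is failed
    REMOVED = "removed"      # port is deleted from the SCSI device tree


def rport_state(seconds_gone, fast_io_fail_tmo, dev_loss_tmo):
    """Classify a remote port by how long it has been missing.

    seconds_gone is None while the port is present. Treating the
    boundaries as inclusive (>=) is an assumption of this sketch.
    """
    if seconds_gone is None:
        return RportState.ONLINE
    if seconds_gone >= dev_loss_tmo:
        return RportState.REMOVED
    if seconds_gone >= fast_io_fail_tmo:
        return RportState.FAST_FAIL
    return RportState.BLOCKED


if __name__ == "__main__":
    # Hypothetical timeout values, chosen only for the example.
    for gone in (None, 2, 10, 90):
        print(gone, rport_state(gone, fast_io_fail_tmo=5, dev_loss_tmo=60))
```

The point is that fast_io_fail_tmo bounds how long IO can sit blocked,
while dev_loss_tmo bounds how long the port itself is kept around.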

Now, the FC transport is probably a bit more complex than we want right
now, but following its lead (and the SAS transport's) should keep us in
sync with where the rest of the SCSI stack is headed.

As for reliability from user space, multipathd checks that the SCSI
initiator is not blocked before checking the liveness of the path;
blocked paths are assumed to be down. It still seems to default to a 30
second timeout for the TUR/directIO/etc. path check, and that doesn't
currently look to be configurable (but that is fixable). These timeouts are
independent of the SCSI layer, and will mark a path down for new traffic
without waiting for the lower level timeout.
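
Roughly, the ordering I am describing looks like the sketch below. This
is Python, not multipathd's actual code: path_is_up() and the
run_checker callable are hypothetical names, and the 30 second default
is just the value I mentioned above.

```python
def path_is_up(initiator_blocked, run_checker, checker_timeout=30.0):
    """Decide whether a path should be used for new traffic.

    initiator_blocked: True if the SCSI initiator is currently blocked.
    run_checker: hypothetical callable standing in for a TUR/directIO
        style check; it takes a timeout in seconds and returns True
        when the device answered in time.
    """
    # A blocked path is assumed to be down; no check is issued at all,
    # so the checker never gets stuck behind a blocked host.
    if initiator_blocked:
        return False
    # Otherwise the checker's own timeout decides, independently of any
    # lower-level SCSI timeout.
    return bool(run_checker(checker_timeout))
```

Because the blocked test comes first, a host that is blocked during a
reconnect is reported as down right away rather than after the checker
timeout expires.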

The transport layer reconnect from user space (I'm assuming you were
thinking ioctl or using the sg device) would be coordinated by the
SCSI-midlayer, using calls to srp_reset_host(), so I think we avoid any
race condition. Or did you mean a manual reconnect attempt from
manipulating srp_host/X/reconnect_tmo via sysfs -- in which case it is
in our code, so we can certainly avoid race conditions.

I still think this is already solved in user space, but the new
reconnect model you've implemented doesn't match up with the expected
semantics. It'd be better to match the rest of the SCSI stack for this.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
