On Mon, Feb 8, 2010 at 8:18 PM, Daniel Stodden <daniel.stodden@xxxxxxxxxx> wrote:
>
> Hi.
>
> I've recently been spending some time tracing path checks on iSCSI
> targets.
>
> Samples described here were taken with the directio checker on a netapp
> lun, but I believe the target kind doesn't matter here, since most of
> what I find is rather driven by the initiator side.
>
> So what I see is:
>
> 1. The directio checker issues its aio read on sector0.
>
> 2. The request obviously will block until iscsi gives up on it.
>    This typically does not happen before the target pings (noop-out
>    ops) issued internally by the initiator time out. It looks like:
>
>    iscsid: Nop-out timedout after 15 seconds on connection 1:0
>    state (3). Dropping session.
>
>    (period and timeouts depend on the configuration at hand).
>
> 3. Session failure still won't unblock the read. This is because the
>    iscsi session will enter recovery mode, to avoid failing the
>    data path right away. The device will enter blocked state during
>    that period.
>
>    Since I'm provoking a complete failure, this will time out as well,
>    but only later:
>
>    iscsi: session recovery timed out after 15 secs
>
>    (again, timeouts are iscsid.conf-dependent)
>
> 4. This will finally unblock the directio check with EIO,
>    triggering the path failure.
>
>
> My main issue is that a device sitting on a software iscsi initiator
>
> a) performs its own path failure detection and
> b) defers data path operations to mask failures,
>    which obviously counteracts a checker based on
>    data path operations.
>
> Kernels somewhere during the 2.6.2x series apparently started to move
> part of the session checks into the kernel (apparently including the
> noop-out itself, but I don't). One side effect of that is that session
> state can be queried via sysfs.
>
> So right now I'm mainly wondering if a multipath failure driven
> by polling session state rather than a data read wouldn't be more
> effective?
>
> I've only been browsing part of the iscsi code so far, but I don't see
> how data path failures wouldn't relate to session state.
>
> There's some code attached below to demonstrate that. It presently jumps
> through some extra hoops to reverse-map the fd back to the block device
> node, but the basic thing was relatively straightforward to implement.
>
> Thanks in advance for any input on that matter.
>
> Cheers,
> Daniel
>

You might look at the multipath-tools patch included in a fairly recent
dm-devel mail titled "[PATCH] Update path_offline() to return device
status".

The committed patch is available here:
http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=88c75172cf56e

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
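
For illustration only, the idea raised in the quoted message (deciding path
health by polling iSCSI session state in sysfs instead of waiting for a
blocked read to fail) might be sketched roughly as follows. This is not the
code attached to the original mail and not the patch linked above. It
assumes the session state is readable at
/sys/class/iscsi_session/session<N>/state and that a healthy session reports
"LOGGED_IN"; the function names and the up/down/unknown verdicts are
hypothetical.

```python
import os

# Assumed sysfs location of iSCSI session objects (not taken from the thread).
SYSFS_ISCSI_SESSIONS = "/sys/class/iscsi_session"

# Hypothetical checker verdicts, used only for this sketch.
PATH_UP = "up"
PATH_DOWN = "down"
PATH_UNKNOWN = "unknown"


def read_session_state(session, root=SYSFS_ISCSI_SESSIONS):
    """Return the session's state string, or None if it cannot be read."""
    try:
        with open(os.path.join(root, session, "state")) as f:
            return f.read().strip()
    except OSError:
        return None


def check_session(session, root=SYSFS_ISCSI_SESSIONS):
    """Map a session state to a verdict without issuing any data-path I/O."""
    state = read_session_state(session, root)
    if state is None:
        # No such session, or sysfs layout differs from the assumption.
        return PATH_UNKNOWN
    if state == "LOGGED_IN":
        return PATH_UP
    # Any other state (for example a session in recovery) counts as down
    # here, so the verdict does not wait for the recovery timeout.
    return PATH_DOWN


if __name__ == "__main__":
    # Print a verdict for every session currently present, if any.
    if os.path.isdir(SYSFS_ISCSI_SESSIONS):
        for name in sorted(os.listdir(SYSFS_ISCSI_SESSIONS)):
            print(name, check_session(name))
```

Treating every state other than "LOGGED_IN" as down is a deliberate choice
for the sketch: it reports the failure as soon as the session drops rather
than after the recovery timeout. A more conservative checker could report a
session in recovery as pending instead.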