Hi Mike,

Mike Christie wrote:
> Hey,
>
> For this topic:
>
> -----------------------
> Next-Gen Multipathing
> ---------------------
> Dr. Hannes Reinecke
>
> ......
>
> Should path checkers use sd->state to check for errors or availability?
> ----------------------
>
> What was decided?
>
> Could this problem be fixed or helped if multipath tools always sets the
> fast io fail tmo for FC or the replacement_timeout for iscsi?
>
No, I already do this for FC (should be checking the
replacement_timeout, too ...)

> If those are set then IO in the blocked queue and in the driver will get
> failed after fast io fail tmo/replacement_timeout seconds (driver has to
> implement a terminate rport IO callback and only mptfc does not now). So
> at this time, do we want to fail the path?
>
> Or are people thinking that we want to fail the path when the problem is
> initially detected like when the LLD deletes the rport for fc for example?
>
Well, the idea is the following:

The primary purpose of the path checkers is to check the availability
of the paths (my, that was easy :-). And the main problem we have with
the path checkers is that they use actual SCSI commands to determine
this, and so pick up unrelated errors (disk errors, responses delayed
by blocked-path behaviour or error handling, etc).

So we have to invest quite a bit of logic to separate the 'true' path
condition from unrelated errors, simply because we're checking at the
wrong level; the path state is maintained by the transport layer, not
by the SCSI layer.

So the suggestion here is to check the transport layer for the path
states and do away with the existing path_checker SG_IO mechanism.
The secondary use of the path checkers (determining inactive paths)
will have to be delegated to the priority callouts, which then have to
arrange the paths correctly.

FC Transport already maintains an attribute for the path state, and
even sends netlink events if and when this attribute changes.
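
To make the transport-level check concrete, here is a minimal sketch
(in Python, for brevity) that reads the FC remote port state from sysfs
instead of issuing SG_IO. It assumes the attribute lives at
/sys/class/fc_remote_ports/rport-*/port_state; the function names and
the mapping of port states to checker verdicts are hypothetical and
only for illustration, not what multipath-tools does today.

```python
import os

# Assumed location of the FC transport class in sysfs.
FC_RPORT_DIR = "/sys/class/fc_remote_ports"


def read_port_state(rport, base=FC_RPORT_DIR):
    """Return the port_state string of one remote port, e.g. 'Online'."""
    with open(os.path.join(base, rport, "port_state")) as f:
        return f.read().strip()


def classify(port_state):
    """Hypothetical mapping from a transport state to a checker verdict."""
    if port_state == "Online":
        return "up"
    if port_state == "Blocked":
        # The rport is blocked and a timeout is running; defer the verdict.
        return "pending"
    # Everything else is treated as unavailable in this sketch.
    return "down"


def check_all(base=FC_RPORT_DIR):
    """Classify every rport-* entry found below 'base'."""
    result = {}
    for name in sorted(os.listdir(base)):
        if name.startswith("rport-"):
            result[name] = classify(read_port_state(name, base))
    return result
```

No SCSI command is sent here, so disk errors and error handling on the
device cannot influence the result; only the transport's own view of
the port does.
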
For iSCSI I have to defer to your superior knowledge; of course it would
be easiest if iSCSI could send out the very same message FC does.

>
>
> Also for this one:
> -----------------------
> How to communication device went away:
> 1) send event to udev (uses netlink)
> -----------------------
>
> Is this an event when dev_loss_tmo fires or when the LLD first detects
> something like a link down (or any event it might block the rport for),
> or would it be for when the fast fail io tmo fires (when the fc class is
> going to fail running IO and incoming IO), or would we have events for
> all of them?
>
Currently the event is sent when the device itself is removed from
sysfs. Only then can we actually update the path maps and (possibly)
switch to another path. We cannot do anything while the path is blocked
(ie while dev_loss_tmo is running), as we need this interval to absorb
jitter on the line.

So we have this state diagram:

sdev state:  RUNNING <-> BLOCKED -> CANCEL
mpath state: path up <-> <stall> -> path down / remove from map

Notice the '<stall>' here; we cannot check the path state while the
sdev is blocked, as all I/O will be queued. Also note that we now lump
two different multipath path states together; a path down is basically
always followed immediately by a path remove event.

However, when all paths are down (and queue_if_no_path is active) we
might run into a deadlock when a path comes back, as we might not have
enough memory to actually create the required structures.

The idea was to modify the state machine so that fast_io_fail_tmo is
made mandatory; when it fires, the sdev moves into an intermediate
state 'DISABLED' and a netlink message is sent out.

sdev state:  RUNNING <-> BLOCKED <-> DISABLED  -> CANCEL
mpath state: path up <-> <stall> <-> path down -> remove from map

This will allow us to switch paths early, ie as soon as the sdev moves
into the 'DISABLED' state.
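
The proposed state machine can be written down as a small transition
table. This is only a sketch of the second diagram above, following its
arrows literally; 'DISABLED' is the proposed state, not an existing
sdev state, and the names TRANSITIONS, MPATH_STATE, step and replay are
made up for illustration.

```python
# Allowed sdev transitions, taken literally from the proposed diagram:
# RUNNING <-> BLOCKED <-> DISABLED -> CANCEL
TRANSITIONS = {
    "RUNNING": {"BLOCKED"},
    "BLOCKED": {"RUNNING", "DISABLED"},
    "DISABLED": {"BLOCKED", "CANCEL"},
    "CANCEL": set(),  # terminal: the path is removed from the map
}

# Multipath path state corresponding to each sdev state.
MPATH_STATE = {
    "RUNNING": "path up",
    "BLOCKED": "stall",
    "DISABLED": "path down",
    "CANCEL": "remove from map",
}


def step(state, new_state):
    """Return new_state if the diagram allows the move, else raise."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError("illegal transition %s -> %s" % (state, new_state))
    return new_state


def replay(new_states, start="RUNNING"):
    """Apply a sequence of sdev states; return the mpath state after each."""
    state = start
    seen = []
    for new_state in new_states:
        state = step(state, new_state)
        seen.append(MPATH_STATE[state])
    return seen
```

For example, replay(["BLOCKED", "DISABLED", "BLOCKED", "RUNNING"])
walks a path that is failed over early and later comes back without
ever being removed from the map.
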
But the path structures themselves are still alive, so when a path
comes back between 'DISABLED' and 'CANCEL' we won't have an issue
reconnecting it. And we could even allow setting dev_loss_tmo to
infinity, thereby simulating the 'old' behaviour.

However, this proposal didn't go through. Instead it was proposed to do
away with the unlimited queue_if_no_path setting and _always_ have a
timeout there, so that the machine is able to recover after a certain
period of time.

I still like my original proposal, though. Maybe we can do the EU
referendum thing and just ask again and again until everyone becomes
tired of it and just says 'yes' to get rid of this issue ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)