Hi Mike,

Mike Christie wrote:
> Hannes Reinecke wrote:
>>
>> FC Transport already maintains an attribute for the path state, and even
>> sends netlink events if and when this attribute changes. For iSCSI I have
>
> Are you referring to fc_host_post_event? Is this the same thing we talked
> about last year, where you wanted events? Is this in multipath tools now
> or just in the SLES ones?
>
Yep, that's the thing.

> For something like FCH_EVT_LINKDOWN, are you going to fail the path at
> that time or when would the multipath path be marked failed?
>
This is just a notification that the path has gone down. Fast fail /
dev_loss_tmo still applies, ie the path won't get switched at that point.

>
>> to defer to your superior knowledge; of course it would be easiest if
>> iSCSI could send out the very same message FC does.
>
> We can do something like fc_host_event_code for iscsi.
>
Oh, that'll be grand.

> Question on what you are needing:
>
> Do you mean you want to make fc_host_event_code more generic (there are
> some FC-specific ones like lip_reset)? Put them in scsi-ml and send them
> from a new netlink group that just sends these events?
>
> Or do you just want something similar from iscsi? iscsi would hook into
> the iscsi netlink code using scsi_netlink.c and then send
> ISCSIH_EVT_LINKUP, ISCSIH_EVT_LINKDOWN, etc.
>
Well, actually, I don't care. It's just that if we go with the proposal
we'll have to fix up all transports to present the path state to
userspace; preferably with both netlink events and sysfs attributes.
The actual implementation might well be transport-specific.

> What do the FCH_EVT_PORT_* ones mean?
>
FC stuff, methinks. James S. should know better.

>
>>
>> Idea was to modify the state machine so that fast_io_fail_tmo is
>> made mandatory, which transitions the sdev into an intermediate
>> state 'DISABLED' and sends out a netlink message.
>
>
> Above when you said, "No, I already do this for FC (should be checking
> the replacement_timeout, too ...)", did you mean that you have multipath
> tools always setting fast io fail now?
>
Yes, quite so. Look at branch 'sles11' of
git://git.kernel.org/pub/scm/linux/kernel/git/hare/multipath-tools
for details.

> For iscsi the replacement_timeout is always set already. If from
> multipath tools you are going to add some code so multipath sets this I
> can make iscsi allow the replacement_timeout to be set from sysfs like
> is done for FC's fast io fail.
>
Oh, that would be awesome. Currently I think we have a mismatch / race
condition between iSCSI and multipathing, where ERL in iSCSI actually
counteracts multipathing. But I'll be investigating that one shortly.

>
>>
>> sdev state:  RUNNING <-> BLOCKED <-> DISABLED  -> CANCEL
>> mpath state: path up <-> <stall> <-> path down -> remove from map
>>
>> This will allow us to switch paths early, ie when the sdev moves into
>> the 'DISABLED' state. But the path structures themselves are still
>> alive, so when a path comes back between 'DISABLED' and 'CANCEL' we
>> won't have an issue reconnecting it. And we could even allow setting
>> dev_loss_tmo to infinity, thereby simulating the 'old' behaviour.
>>
>> However, this proposal didn't go through.
>
> You got my hopes up for a solution in the long explanation, then you
> destroyed them :)
>
Yes, same here. I really thought this to be a sensible proposal, but
then the discussion veered off into queue_if_no_path handling.

>
> Was the reason people did not like this because of the scsi device
> lifetime issue?
>
> I think we still want someone to set the fast io fail tmo for users when
> multipath is being used, because we want IO out of the queues and
> drivers and sent to the multipath layer before dev_loss_tmo, if
> dev_loss_tmo is still going to be a lot longer.
> fast io fail tmo is
> usually less than 5 or 10 seconds, and for dev_loss_tmo it seems we
> still have users setting that to minutes.
>
Exactly. The point here is that with the current implementation we
basically _cannot_ return 'path down' anymore, as the path is either
blocked (during which time all I/O is stalled) or has failed completely
(ie is in state 'CANCEL'). Which is a bit of a detriment, and we
actually run into quite some contention when the path is removed, as we
have to kill all I/O, fail over paths, remove stale paths, update
device-mapper tables etc.
When decoupling this by having the midlayer always return
'DID_TRANSPORT_DISRUPTED' after fast_io_fail_tmo we would be able to
kill all I/O and switch paths gracefully. Path removal and the
device-mapper table update would then be done later, when dev_loss_tmo
triggers.

>
> Can't the transport layers just send two events?
> 1. On the initial link down when the port/session is blocked.
> 2. When the fast io fail tmo fires.
>
Yes, that would be a good start.

> Today, instead of #2, the Red Hat multipath tools guy and I were talking
> about doing a probe with SG_IO. For example we would send down a path
> tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
>
No, this is exactly what you cannot do. SG_IO will be stalled while the
sdev is BLOCKED and will only return a result _after_ the sdev
transitions _out_ of the BLOCKED state. Translated to FC this means that
whenever dev_loss_tmo is _active_ (!) no I/O will be sent out, nor will
any I/O result be returned to userland. Hence using SG_IO as a path
checker is a bad idea here. Hence my proposal.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html