Re: LSF: Multipathing and path checking question

Hannes Reinecke wrote:

FC Transport already maintains an attribute for the path state, and even
sends netlink events if and when this attribute changes. For iSCSI I have

Are you referring to fc_host_post_event? Is this the same thing we talked about last year, where you wanted events? Is this in multipath tools now or just in the SLES ones?

For something like FCH_EVT_LINKDOWN, are you going to fail the path at that time, or at what point would the multipath path be marked failed?



to defer to your superior knowledge; of course it would be easiest if
iSCSI could send out the very same message FC does.

We can do something like fc_host_event_code for iscsi.

Question on what you are needing:

Do you mean you want to make fc_host_event_code more generic (there are some FC specific ones like lip_reset)? Put them in scsi-ml and send from a new netlink group that just sends these events?

Or do you just want something similar from iscsi? iscsi will hook into the iscsi netlink code using scsi_netlink.c and then send an ISCSIH_EVT_LINKUP, ISCSIH_EVT_LINKDOWN, etc.

What do the FCH_EVT_PORT_* ones mean?




Idea was to modify the state machine so that fast_fail_io_tmo is
being made mandatory, which transitions the sdev into an intermediate
state 'DISABLED' and sends out a netlink message.


Above, when you said, "No, I already do this for FC (should be checking the replacement_timeout, too ...)", did you mean that you have multipath tools always setting fast io fail now?

For iscsi the replacement_timeout is always set already. If you are going to add some code to multipath tools so that multipath sets this, I can make iscsi allow the replacement_timeout to be set from sysfs, like is done for FC's fast io fail.
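
For reference, setting FC's fast io fail from userspace amounts to writing the fast_io_fail_tmo attribute of each remote port under /sys/class/fc_remote_ports. A minimal Python sketch of that, assuming a hypothetical helper name and a sysfs_root parameter that exists only so the function can be pointed at a scratch directory:

```python
from pathlib import Path

# Location of the FC remote port class in sysfs.
FC_RPORT_CLASS = Path("/sys/class/fc_remote_ports")


def set_fast_io_fail_tmo(seconds, sysfs_root=FC_RPORT_CLASS):
    """Write `seconds` to fast_io_fail_tmo of every rport under sysfs_root.

    Hypothetical helper for illustration; this is not multipath tools code.
    Returns the names of the rports that were updated.
    """
    updated = []
    for attr in sorted(Path(sysfs_root).glob("rport-*/fast_io_fail_tmo")):
        attr.write_text(f"{seconds}\n")
        updated.append(attr.parent.name)
    return updated
```

Calling set_fast_io_fail_tmo(5) as root would apply a 5 second timeout to every rport. An iscsi equivalent would need the sysfs attribute mentioned above, which does not exist yet.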




sdev state:   RUNNING <-> BLOCKED <-> DISABLED -> CANCEL
mpath state:  path up <-> <stall> <-> path down -> remove from map

This will allow us to switch paths early, ie when it moves into
'DISABLED' state. But the path structure themselves are still alive,
so when a path comes back between 'DISABLED' and 'CANCEL' we won't
have an issue reconnecting it. And we could even allow to set a
dev_loss_tmo to infinity thereby simulating the 'old' behaviour.
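
As a rough illustration of the mapping in the diagram, here is a Python sketch of the proposed state machine. It is only a model: the DISABLED state is the hypothetical one from the proposal, and the class and function names are made up for this example rather than taken from the kernel or multipath tools.

```python
from enum import Enum


class SdevState(Enum):
    RUNNING = "RUNNING"
    BLOCKED = "BLOCKED"
    DISABLED = "DISABLED"  # hypothetical intermediate state from the proposal
    CANCEL = "CANCEL"


class PathState(Enum):
    UP = "path up"
    STALLED = "stall"
    DOWN = "path down"
    REMOVED = "remove from map"


# One multipath path state per sdev state, as in the diagram.
SDEV_TO_PATH = {
    SdevState.RUNNING: PathState.UP,
    SdevState.BLOCKED: PathState.STALLED,
    SdevState.DISABLED: PathState.DOWN,
    SdevState.CANCEL: PathState.REMOVED,
}

# RUNNING <-> BLOCKED <-> DISABLED -> CANCEL
ALLOWED = {
    SdevState.RUNNING: {SdevState.BLOCKED},
    SdevState.BLOCKED: {SdevState.RUNNING, SdevState.DISABLED},
    SdevState.DISABLED: {SdevState.BLOCKED, SdevState.CANCEL},
    SdevState.CANCEL: set(),
}


def transition(current, new):
    """Return (new sdev state, matching path state) or raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"{current.name} -> {new.name} is not allowed")
    return new, SDEV_TO_PATH[new]
```

A path that comes back while in DISABLED goes DISABLED -> BLOCKED -> RUNNING in this model, so the path structure is reused instead of being torn down; only CANCEL is final.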

However, this proposal didn't go through.

You got my hopes up for a solution in the long explanation, then you destroyed them :)


Was the reason people did not like this because of the scsi device lifetime issue?


I think we still want someone to set the fast io fail tmo for users when multipath is being used, because we want IO out of the queues and drivers and sent to the multipath layer before dev_loss_tmo fires, if dev_loss_tmo is still going to be a lot longer. fast io fail tmo is usually less than 10 or 5 seconds, and for dev_loss_tmo it seems like we still have users setting that to minutes.


Can't the transport layers just send two events?
1. On the initial link down when the port/session is blocked.
2. When their fast io fail tmos fire.
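
To show what I mean on the multipath side, here is a small Python sketch of a path tracker consuming those two events. The event names and the PathTracker class are made up for this example; no such events exist today.

```python
# Hypothetical event names for the two events listed above, plus link up.
EVT_LINK_DOWN_BLOCKED = "link_down_blocked"    # event 1: port/session blocked
EVT_FAST_IO_FAIL_FIRED = "fast_io_fail_fired"  # event 2: fast io fail tmo fired
EVT_LINK_UP = "link_up"


class PathTracker:
    """Tracks one path as 'up', 'stalled' or 'failed' from transport events."""

    def __init__(self):
        self.state = "up"

    def on_event(self, event):
        if event == EVT_LINK_DOWN_BLOCKED:
            # IO is only queued at this point, so hold off on failing the path.
            if self.state == "up":
                self.state = "stalled"
        elif event == EVT_FAST_IO_FAIL_FIRED:
            # IO is now being failed back quickly, so switch paths.
            self.state = "failed"
        elif event == EVT_LINK_UP:
            self.state = "up"
        return self.state
```

The point of keeping the two events separate is that multipath only fails the path on the second one, while the first just tells it the path is stalled.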

Today, instead of #2, the Red Hat multipath tools guy and I were talking about doing a probe with SG_IO. For example we would send down a path tester IO and then wait for it to be failed with DID_TRANSPORT_FAILFAST.
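
A sketch of how the result of such a probe could be interpreted, in Python. It assumes the caller has already issued the SG_IO ioctl and pulled host_status out of the returned sg_io_hdr; the classify_probe name and the returned strings are made up for this example. The two DID_* values are the ones from the kernel's SCSI headers.

```python
# Host byte values from the kernel SCSI headers.
DID_OK = 0x00
DID_TRANSPORT_FAILFAST = 0x0F  # transport class fast-failed the IO


def classify_probe(host_status):
    """Map the host_status of a completed path tester IO to a path verdict.

    Only host_status is inspected here; a real checker would also look at
    the SCSI status, driver_status and sense data.
    """
    if host_status == DID_TRANSPORT_FAILFAST:
        return "failed"
    if host_status == DID_OK:
        return "up"
    return "unknown"
```

While the port/session is still blocked the probe would simply not complete, so the checker only gets an answer once the fast io fail tmo has fired or the link has come back.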

Or for #2 if we cannot have a new event, can we send a transport level bsg request? For iscsi this would be a nop. For FC, I am not sure what it would be?
