On Wed, Apr 27, 2016 at 04:39:49PM -0700, James Bottomley wrote:
> Multipath - Mike Snitzer
> ------------------------
>
> Mike began with a request for feedback, which quickly lead to the
> complaint that recovery time (and how you recover) was one of the
> biggest issues in device mapper multipath (dmmp) for those in the room.
> This is primarily caused by having to wait for the pending I/O to be
> released by the failing path. Christoph Hellwig said that NVMe would
> soon do path failover internally (without any need for dmmp) and asked
> if people would be interested in a more general implementation of this.
> Martin Petersen said he would look at implementing this in SCSI as
> well. The discussion noted that internal path failover only works in
> the case where the transport is the same across all the paths and
> supports some type of path down notification. In any cases where this
> isn't true (such as failover from fibre channel to iSCSI) you still
> have to use dmmp. Other benefits of internal path failover are that
> the transport level code is much better qualified to recognise when the
> same device appears over multiple paths, so it should make a lot of the
> configuration seamless.

Given the variety of sensible configurations that I've seen for people's
multipath setups, there will definitely be a chunk of configuration that
will never be seamless. Just in the past few weeks, we've added code to
make it easier for people to manually configure devices in situations
where none of our automated heuristics do what the user needs. Even for
the easy cases, like ALUA, we've been adding options to let users do
things like specify what they want to happen when they set the TPGS Pref
bit.

Recognizing which paths go together is simple. That part has always been
seamless from the user's point of view. Configuring how IO is balanced
and failed over between the paths is where the complexity is.

> The consequence for end users would be that
> now SCSI devices would become handles for end devices rather than
> handles for paths to end devices.

This will have a lot of repercussions for applications that use SCSI
devices. A significant number of tools expect a SCSI device to map to a
connection between an initiator port and a target port. Listing the
topology of these new SCSI devices, and getting the IO stats down the
various paths to them, will involve writing new tools or rewriting
existing ones. Things like persistent reservations will work differently
(albeit probably more intuitively).

I'm not saying that this can't be made to work nicely for a significant
subset of cases (as has been pointed out with the multiple-transport
case, it won't work for all cases). I just think that it's not a small
amount of work, and not necessarily the only way to speed up failover.

-Ben

> James