On 05/18/2013 01:54 PM, Viktor Larionov wrote:
> Hi everybody!
>
> First of all, thanks for all the hard work you guys have been doing
> developing dm. It's an amazing piece of work!
>
> While working with dm-multipath we bumped into some limitations
> that we felt a bit uncomfortable with, and it seems we managed to
> change them. I thought I'd share the experience with others, in the
> hope that it helps somebody.
>
> Long story short: our servers are connected to our SAN with both FC
> and iSCSI links (the same targets, with the same WWIDs, are exported
> through both FC and iSCSI).
>
> Pretty much a standard installation: two independent controllers on
> the storage side (each with FC and iSCSI), dual-port FC controllers
> on the server side, plus iSCSI.
>
> All this leaves us with approximately 6 paths per device (2 FC and
> 4 iSCSI; 1 FC and 2 iSCSI per storage controller).
>
> Now if we use ALUA, which is standard for our infra (IBM Storwize
> V3700), the picture looks pretty much like this:
>
> alessandra viktor.larionov # multipath -ll www-2-mysql
> www-2-mysql (360050763008080581000000000000029) dm-37 IBM,2145
> size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
> |-+- policy='round-robin 0' prio=50 status=active
> | |- 2:0:0:9 sdak 66:64 active ready running
> | |- 3:0:0:9 sdcf 69:48 active ready running
> | `- 4:0:0:9 sdcy 70:96 active ready running
> `-+- policy='round-robin 0' prio=10 status=enabled
>   |- 1:0:0:9 sdl 8:176 active ready running
>   |- 5:0:0:9 sdcb 68:240 active ready running
>   `- 6:0:0:9 sdct 70:16 active ready running
>
> Here sdak and sdl are the fiber links and the rest are iSCSI. The
> priorities come from ALUA and correspond to the SAN controller
> preference at this particular moment.
>
> What we don't like about this setup is that FC and iSCSI links end
> up with the same priority in the same group.
> The idea behind having iSCSI links on machines that have FC at all
> is redundancy against FC failures. But we surely don't want to use
> the iSCSI links while either the primary or the backup FC link is
> fully operational.
>
> So this led us to the idea of somehow telling the prioritizer to be
> more granular and to separate FC and iSCSI controller priorities.
> After several hours of googling, I found out that we are not the
> only ones with such a story, and that there has been no solution to
> the point (take this one for example:
> http://www.redhat.com/archives/dm-devel/2008-August/msg00083.html).
> In fact, prio_callout, which could possibly solve this kind of
> thing, is deprecated.
>
> It's true that there's no easy or trivial way to determine whether a
> path behind an sg device is fiber or iSCSI (or something else). But
> thinking about this issue, we realized we could be satisfied if we
> could just assign a custom priority based on the SCSI ID
> (host:channel:target:lun) of the device. The idea behind it is
> simple: in our case we have an IBM ServeRAID controller, which is
> SCSI host 0, and an Emulex LightPulse, which is SCSI hosts 1 and 2
> (one per port), and all the rest is iSCSI. So if we could assign
> static priorities based on this information, that could do the
> trick.
>
> So we poked around the code a bit and wrote a custom prioritizer
> called sg_id (patch for the latest multipath-tools available here:
> http://viktor.ee/multipath-tools-patches/sg_id_prio.patch).
>
> Usage is very simple: in /etc/multipath.conf, set prio "sg_id" and
> pass the priorities through prio_args as regexes. For example, a
> prio_args of
>
>   prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=40 prio_sg_id(^5:[2-3]:)=30
>
> will give prio 40 to everything on SCSI hosts 0, 1 and 2, channel 0;
> 30 to SCSI host 5, channels 2 and 3; and 0 to everything else.
>
> Using sg_id in the example above, we would have sdl and sdak in the
> first group, and all the other paths in the second.
> Which is OK, but not quite.
>
> The problem with this approach for us is that ALUA gives us valuable
> information about our storage priorities (which controller is
> primary and which is secondary for that particular LUN at that
> particular moment), and we're not quite ready to sacrifice this
> information, even for sg_id priorities. If only there were a way to
> use multiple prioritizers...
>
> And so we spent another couple of hours with the multipath-tools
> code, allowing it to accept multiple prioritizers in the prio
> configuration (patch here:
> http://viktor.ee/multipath-tools-patches/multiprio.patch).
>
> In this case, prioritizers should be separated by a comma, semicolon
> or space, and the final priority is the sum of the priorities given
> by all of the specified prioritizers (a single prioritizer is of
> course still accepted).
>
> As an example:
>
>   prio "sg_id, alua"
>   prio_args "prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=100"
>
> Combining the two with the same example, we get:
>
> alessandra multipath-tools-0.4.9 # multipath -r www-2-mysql
> reload: www-2-mysql (360050763008080581000000000000029) undef IBM,2145
> size=10G features='1 queue_if_no_path' hwhandler='0' wp=undef
> |-+- policy='round-robin 0' prio=150 status=undef
> | `- 2:0:0:9 sdak 66:64 active ready running
> |-+- policy='round-robin 0' prio=110 status=undef
> | `- 1:0:0:9 sdl 8:176 active ready running
> |-+- policy='round-robin 0' prio=50 status=undef
> | |- 3:0:0:9 sdcf 69:48 active ready running
> | `- 4:0:0:9 sdcy 70:96 active ready running
> `-+- policy='round-robin 0' prio=10 status=undef
>   |- 5:0:0:9 sdcb 68:240 active ready running
>   `- 6:0:0:9 sdct 70:16 active ready running
>
> Exactly what we needed: the primary FC link with 150, the secondary
> with 110, followed by the primary and secondary iSCSI links with 50
> and 10 respectively.
>
> All in all, this seems to have solved our problem, and maybe it can
> help somebody else too.
>
Actually, I like the idea with the stackable prioritizers.
Not sure about the 'sg_id' thing; that's still too much to configure.
We should be identifying the transport and basing the priorities on
the transport.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel