multipath-tools: scsi_id based path priorities and multiple prioritizers


 



Hi everybody!

 

First of all, thanks for all the hard work you guys have been doing developing dm. It’s an amazing piece of work you have done!

While working with dm-multipath we bumped into some limitations that we were a bit uncomfortable with, and it seems we have managed to work around them. I thought I’d share the experience with others, in the hope that it helps somebody.

 

Long story short: our servers are connected to our SAN with both FC and iSCSI links (the same targets and the same WWIDs are exported through both FC and iSCSI).

Pretty much a standard installation: two independent controllers on the storage side (FC and iSCSI on each), dual-port FC controllers on the server side, plus iSCSI.

All this leaves us with about 6 paths per device (2 FC and 4 iSCSI: 1 FC and 2 iSCSI per storage controller).

 

Now if we use ALUA, which is standard for our infrastructure (IBM Storwize V3700), the picture looks pretty much like this:

 

alessandra viktor.larionov # multipath -ll www-2-mysql
www-2-mysql (360050763008080581000000000000029) dm-37 IBM,2145
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 2:0:0:9  sdak 66:64   active ready running
| |- 3:0:0:9  sdcf 69:48   active ready running
| `- 4:0:0:9  sdcy 70:96   active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:9  sdl  8:176   active ready running
  |- 5:0:0:9  sdcb 68:240  active ready running
  `- 6:0:0:9  sdct 70:16   active ready running

 

Here sdak and sdl are the Fibre Channel links and the rest are iSCSI. The priorities come from ALUA and correspond to the SAN controller preference at this particular moment.

What we don’t like about this setup is that FC and iSCSI links end up with the same priority in the same group. The whole point of having iSCSI links on machines that already have FC is redundancy against FC failures.

But we certainly don’t want to use the iSCSI links while either the primary or the backup FC link is fully operational.

 

This led us to the idea of somehow telling the prioritizer to be more granular and to separate FC and iSCSI controller priorities. After several hours of googling, I found out that we are not the only ones with this problem, and that there has been no real solution so far (take this one for example: http://www.redhat.com/archives/dm-devel/2008-August/msg00083.html). In fact prio_callout, which could possibly have solved this kind of thing, is deprecated.

 

It’s true that there’s no easy or trivial way to determine whether a path behind an sg device is Fibre Channel or iSCSI (or something else). But thinking about it, we realized we would actually be satisfied if we could just assign a custom priority based on the SCSI ID of the device (its host:channel:target:lun address). The idea behind it is simple: in our case we have an IBM ServeRAID controller, which is SCSI host 0, an Emulex LightPulse, which is SCSI hosts 1 and 2 (one per port), and everything else is iSCSI. So if we could give static priorities based on this information, that would do the trick.

 

So we poked around in the code a bit and wrote a custom prioritizer called sg_id. (A patch against the latest multipath-tools is available here: http://viktor.ee/multipath-tools-patches/sg_id_prio.patch)

Usage is very simple: in /etc/multipath.conf set prio "sg_id", and pass the priorities through prio_args as regexes. For example, a prio_args of

prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=40 prio_sg_id(^5:[2-3]:)=30

will give priority 40 to everything on SCSI hosts 0, 1 and 2, channel 0; 30 to SCSI host 5, channels 2 and 3; and 0 to everything else.
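To make the rule matching concrete, here is a rough Python sketch of how such a prio_args string could be evaluated against a path's host:channel:target:lun string. This is not the patch code (which is C); the parsing regex, the function names and the first-match-wins order are assumptions made for illustration only.

```python
import re

# Hypothetical re-implementation of the rule format described above;
# the real sg_id prioritizer is C code in the patch and may differ in detail.
RULE_RE = re.compile(r"prio_sg_id\((\S+?)\)=(\d+)")

def parse_prio_args(prio_args):
    """Return (default_priority, [(compiled_regex, priority), ...])."""
    default = 0
    rules = []
    for pattern, value in RULE_RE.findall(prio_args):
        if pattern == "default":
            default = int(value)
        else:
            rules.append((re.compile(pattern), int(value)))
    return default, rules

def sg_id_prio(hctl, prio_args):
    """Priority of the first rule whose regex matches hctl (assumed order)."""
    default, rules = parse_prio_args(prio_args)
    for regex, prio in rules:
        if regex.search(hctl):
            return prio
    return default

ARGS = "prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=40 prio_sg_id(^5:[2-3]:)=30"
for hctl in ("1:0:0:9", "5:2:0:9", "5:0:0:9", "3:0:0:9"):
    print(hctl, sg_id_prio(hctl, ARGS))
```

With the rules above, 1:0:0:9 gets 40, 5:2:0:9 gets 30, and the other two fall through to the default of 0.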

 

Using sg_id on the example above, we would have sdl and sdak in the first group and everything else in the second. Which is OK, but not quite what we want.

The problem with this approach for us is that ALUA gives us valuable information on our storage priorities (which controller is primary and which is secondary for that particular LUN at this particular moment), and we’re not ready to sacrifice this information even for sg_id priorities. If only there were a way to use multiple prioritizers.

So we spent another couple of hours on the multipath-tools code, making it accept multiple prioritizers in the prio setting. (Patch here: http://viktor.ee/multipath-tools-patches/multiprio.patch)

In this case, prioritizers should be separated by a comma, semicolon or space, and the final priority is the sum of the priorities given by all of the specified prioritizers. (A single prioritizer value is still accepted, of course.)

As an example:

        prio                  "sg_id, alua"
        prio_args             "prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=100"
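The splitting and summing could be sketched roughly like this in Python. Again, this is only an illustration and not the patch itself: the function names are made up, and the two prioritizers are simple stand-ins (the "alua" one just returns a value assumed to have been reported already).

```python
import re

def split_prioritizers(prio):
    """Split a prio setting on commas, semicolons, or whitespace."""
    return [name for name in re.split(r"[,;\s]+", prio.strip()) if name]

def combined_prio(prio, path, prioritizers):
    """Sum the values returned by every prioritizer named in the prio setting."""
    return sum(prioritizers[name](path) for name in split_prioritizers(prio))

# Stand-in prioritizers for illustration only; the real ALUA and sg_id logic is C.
PRIORITIZERS = {
    # mimics prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=100
    "sg_id": lambda path: 100 if re.search(r"^[0-2]:0", path["hctl"]) else 0,
    # pretends ALUA already reported a value for this path
    "alua": lambda path: path["alua"],
}

path = {"hctl": "2:0:0:9", "alua": 50}  # sdak, with its ALUA priority of 50
print(combined_prio("sg_id, alua", path, PRIORITIZERS))  # both prioritizers summed
print(combined_prio("alua", path, PRIORITIZERS))         # single prioritizer still works
```

For this path the first call yields 150 (100 from the sg_id rule plus 50 from ALUA) and the second yields 50.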

 

So, combining the two patches on the same example, we get:

 

alessandra multipath-tools-0.4.9 # multipath -r www-2-mysql
reload: www-2-mysql (360050763008080581000000000000029) undef IBM,2145
size=10G features='1 queue_if_no_path' hwhandler='0' wp=undef
|-+- policy='round-robin 0' prio=150 status=undef
| `- 2:0:0:9  sdak 66:64   active ready running
|-+- policy='round-robin 0' prio=110 status=undef
| `- 1:0:0:9  sdl  8:176   active ready running
|-+- policy='round-robin 0' prio=50 status=undef
| |- 3:0:0:9  sdcf 69:48   active ready running
| `- 4:0:0:9  sdcy 70:96   active ready running
`-+- policy='round-robin 0' prio=10 status=undef
  |- 5:0:0:9  sdcb 68:240  active ready running
  `- 6:0:0:9  sdct 70:16   active ready running

 

Exactly what we needed: the primary FC link with 150, the secondary with 110, followed by the primary and secondary iSCSI links with 50 and 10 respectively.
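The arithmetic behind those four groups can be checked with a small Python sketch. It takes the ALUA values (50 and 10) from the first listing, adds 100 for paths matching ^[0-2]:0 as in the example prio_args, and assumes that paths with equal total priority end up in the same group, as the listings suggest. The names here are illustrative only.

```python
from collections import defaultdict
import re

# (H:C:T:L, device, ALUA priority) as shown in the first multipath -ll listing.
PATHS = [
    ("2:0:0:9", "sdak", 50),
    ("3:0:0:9", "sdcf", 50),
    ("4:0:0:9", "sdcy", 50),
    ("1:0:0:9", "sdl", 10),
    ("5:0:0:9", "sdcb", 10),
    ("6:0:0:9", "sdct", 10),
]

FC_RULE = re.compile(r"^[0-2]:0")  # rule from the example prio_args, worth 100

def group_by_total_prio(paths):
    """Group devices by sg_id + ALUA priority, highest priority first."""
    groups = defaultdict(list)
    for hctl, dev, alua in paths:
        sg_id = 100 if FC_RULE.search(hctl) else 0
        groups[sg_id + alua].append(dev)
    return sorted(groups.items(), reverse=True)

for prio, devs in group_by_total_prio(PATHS):
    print(f"prio={prio}: {', '.join(devs)}")
```

This gives sdak at 150, sdl at 110, sdcf and sdcy at 50, and sdcb and sdct at 10, matching the reload output.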

All in all, this seems to have solved our problem, and maybe it can help somebody else too.

 

All comments are kindly welcome!

 

Cheers,

Viktor


Viktor Larionov
Head of IT Department
IT Department
Salva Kindlustuse AS
Tel: (+372) 683 0630 | GSM: (+372) 566 86811 | Viktor.Larionov@xxxxxxxx | www.salva.ee

 

