Re: Designing a new prio_callout

Definitely, that would be ideal -- having code on our end that tracked who was writing to which path. It's a matter of development effort at this point, and so I'm exploring other options.

You can imagine our system as a number of different machines -- each with separate network addresses -- that all provide access to the same LUs. Let's say you have a single target on our system called "mytarget." Users could log into that target via any one of a number of network addresses (even via DNS name, I suppose).

So the response from SendTargets is along the lines of:
10.53.152.22:3260,1 iqn.2001-07.com.company:qaiscsi2:mytarget
10.53.152.23:3260,2 iqn.2001-07.com.company:qaiscsi2:mytarget
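
For reference, that's the sort of listing you get back from a SendTargets discovery, e.g.:

iscsiadm -m discovery -t sendtargets -p 10.53.152.22:3260

(which portal you point the discovery at is arbitrary -- any of the addresses should return the same list).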

So they might initiate two logins to two separate IPs:
iscsiadm -m node --portal 10.53.152.22 --target iqn.2001-07.com.company:qaiscsi2:mytarget --login;
iscsiadm -m node --portal 10.53.152.23 --target iqn.2001-07.com.company:qaiscsi2:mytarget --login;
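
Once both sessions are logged in, you can see what multipath makes of them with:

multipath -ll

which is how the output quoted at the bottom of this message was produced.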

Now what happens? If mytarget has multiple LUs associated with it, the multipath output will look like it does below when failover grouping is used -- two paths for each of two devices. The problem for us is that, by default, multipath just uses the first path it sees, which means that for every device in mytarget, all data is read and written across just the first path -- 10.53.152.22, in this case.

We need a way to load balance I/O across all available connections.

There are several ways that I can see to do this. Ideally, we would implement ALUA on our end and advise people to use mpath_prio_alua as their callout, but that has a development cost. We could also implement a custom system, as you suggest, but that also has a development cost.
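
(For what it's worth, the client-side change for the ALUA route is small -- something like the following in the device section of multipath.conf, if I have the callout syntax right:

prio_callout            "/sbin/mpath_prio_alua /dev/%n"
path_grouping_policy    group_by_prio
failback                immediate

The real cost is implementing ALUA on the target side.)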

If we could advise users to manually set priorities on the client side, that would be acceptable, but this is impossible with the current version of multipath.

As such, the best we can do is to set path priorities randomly, using mpath_prio_random. This works, but there is a significant resource cost on our system when the active path changes frequently, especially when users have thousands of clients connected to our system and paths are switching constantly. So we need to limit how often the active path switches, which the rr_min_io setting seems to do quite nicely.
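
Concretely, the sort of device entry I have in mind is something like this (just a sketch -- the vendor/product strings are taken from our multipath output, the rr_min_io value is arbitrary, and rr_min_io only takes effect when more than one path lands in the same priority group):

devices {
        device {
                vendor                  "company"
                product                 "iSCSI target"
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_random /dev/%n"
                rr_min_io               2000000000
        }
}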

Not sure if that makes any more sense? I'm trying to be thorough for the sake of the next guy. The information available on the web about all this is pretty minimal, and getting up to speed has been a painful experience.


On a related note, I've read the reports of people seeing better performance with lower rr_min_io settings, but it seems to me that as rr_min_io gets smaller, the system behaves less like active/passive MPIO and more like active/active MPIO, so users seeing that improvement would probably be better off using group_by_serial, so that all paths are usable simultaneously.
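
In multipath.conf terms that would mean, roughly, swapping the prio-based grouping above for something like:

path_grouping_policy    group_by_serial
rr_min_io               100

so that both paths sit in one path group and I/O round-robins between them.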

On 8/15/07, Stefan Bader <Stefan.Bader@xxxxxxxxxx> wrote:
Hi Ethan,

I might not understand the problem completely, but I do not see the
benefit of changing rr_min_io. As far as I can tell from your
multipath output, both devices consist of two path groups with one
path each. This means that, as long as there is no path failure, I/O
will never be sent to the inactive group.
I guess the only thing you need is a script that finds out, for a
given scsi device (like sdc), whether this would be the preferred path
and then prints a number representing the priority (the lower the
number, the higher the priority). Then use this as the priority callout
and group by priority with failback set to immediate.
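
A rough sketch of what I mean (the preferred-portal file and the sysfs
layout are assumptions, and I have not checked which priority value your
multipath version treats as preferred, so verify with multipath -ll):

#!/bin/sh
# Print a path priority for the SCSI device given as $1 (e.g. "sdc" or
# "/dev/sdc"), depending on whether its iSCSI session uses the portal
# address this client is supposed to prefer.
DEV="${1#/dev/}"
PREFERRED=$(cat /etc/iscsi/preferred_portal 2>/dev/null)   # assumed config file

# Find the iSCSI session behind the block device and read the portal
# address of its connection from sysfs.
SESSION=$(readlink -f "/sys/block/$DEV/device" | grep -o 'session[0-9]*')
SID=${SESSION#session}
PORTAL=$(cat "/sys/class/iscsi_connection/connection$SID:0/address" 2>/dev/null)

if [ "$PORTAL" = "$PREFERRED" ]; then
        echo 2          # preferred path
else
        echo 1          # any other path
fi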

Regards,
Stefan

2007/8/14, Ethan John <ethan.john@xxxxxxxxx>:
> For the record, setting rr_min_io to something extremely large (we're using
> 2 billion now, since I'm assuming it's a C integer) solves the immediate
> problem that we're having (overhead in path switching causing poor

> > mpath45 (20002c9020020001a00151b6b46bb57b0) dm-1 company,iSCSI target
> > [size=15G][features=0][hwhandler=0]
> > \_ round-robin 0 [prio=1][active]
> >  \_ 22:0:0:1 sdc 8:32  [active][ready]
> > \_ round-robin 0 [prio=1][enabled]
> >  \_ 23:0:0:1 sde 8:64  [active][ready]
> > mpath44 (20002c9020020001200151b6b46bb57ae) dm-0 company,iSCSI target
> > [size=15G][features=0][hwhandler=0]
> > \_ round-robin 0 [prio=1][enabled]
> >  \_ 22:0:0:0 sdb 8:16  [active][ready]
> > \_ round-robin 0 [prio=1][enabled]
> >  \_ 23:0:0:0 sdd 8:48  [active][ready]
> >




--
Ethan John
http://www.flickr.com/photos/thaen/
(206) 841.4157
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
