Re: RAID performance

On 2/13/2013 2:20 PM, Adam Goryachev wrote:
> Well, it's 7am, and I'm still here.... It all didn't go as well as I had
> planned....

Never does...

> I initially could ping perfectly from either of the two IP's on the xen
> box to any of the 8 IP's on the san1, even ping -f worked perfectly.
> Whatever I did, I couldn't get a iscsiadm .... discover to work... I
> could see the packets being sent from the san box (tcpdump) but never
> received by the xen box.

I'm pretty sure I know what most, if not all, of the problem is here.
For this iSCSI/multipath setup to work with all the ethernet ports
(clients and server) on a single subnet, you have to configure source
routing.  Otherwise the Linux kernel is going to use a single interface
for all outbound IP packets destined for the subnet.  So, you have two
options:

1.  Keep a single subnet and configure source routing
2.  Switch to using 8 unique subnets, one per server port

With more than two iSCSI target IPs/ports on the server, using unique
subnets on each port will be a PITA to configure on the Xen client
machines, as you'll have to bind 8 different addresses to each ethernet
port.  And keeping track of how you've setup 8 different subnets will be
a PITA.  So assuming you already have all the interfaces on a single
subnet, source routing is probably much easier.  I believe this is how
we do it.

I don't know your port or IP info, so I'm using fictitious values in
this example how-to, with subnet 192.168.101.0/24.
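Before adding any policy routing, you can confirm the kernel's default
interface choice with `ip route get` (the addresses here are the
fictitious ones from this example); without source routing it reports
the same device no matter which local address you source from:

```shell
# Which interface would carry a packet to another host on the subnet?
ip route get 192.168.101.50

# Same question, but sourcing from a specific local address -- before
# the policy rules below are in place, the answer doesn't change.
ip route get 192.168.101.50 from 192.168.101.3
```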

Let's start with the iSCSI target server, san1.  First, you probably
need to revert the arp changes you made back to their original values.
The changes you made earlier, according to your email, were:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
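Assuming those sysctls were at the kernel defaults (0 for both) before
your change, the revert would be:

```shell
# Restore the kernel defaults (assumption: these were 0 beforehand)
echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
```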


Next enable arp_filter on all 8 SAN ports:

~$ echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter
......
~$ echo 1 > /proc/sys/net/ipv4/conf/eth7/arp_filter


Then create 8 routing table entries with names, port_0 through port_7
(table IDs 100 through 107):

~$ echo 100 port_0 >> /etc/iproute2/rt_tables
......
~$ echo 107 port_7 >> /etc/iproute2/rt_tables


Next add a route to each table, one per interface.  Here I'm assuming
eth0 through eth7 are bound to 192.168.101.1 through 192.168.101.8
(.0 is the network address, so it can't be used as a host IP).

~$ ip route add 192.168.101.0/24 dev eth0 src 192.168.101.1 table port_0
......
~$ ip route add 192.168.101.0/24 dev eth7 src 192.168.101.8 table port_7


Now create the source policy rules:

~$ ip rule add from 192.168.101.1 table port_0
......
~$ ip rule add from 192.168.101.8 table port_7


Now we flush the routing table cache to make the new policy active:

~$ ip route flush cache

If I have this right, now all packets from a given IP address will be
sent out the interface to which the IP is bound.
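All of the above can be done in one loop.  This is just a sketch, and
it assumes eth0 through eth7 are bound to 192.168.101.1 through
192.168.101.8; adjust the interface names and addresses to your real
setup:

```shell
#!/bin/sh
# Sketch: per-port source routing for 8 interfaces on one subnet.
# Assumes eth0..eth7 hold 192.168.101.1..192.168.101.8.
for i in 0 1 2 3 4 5 6 7; do
    ip_addr="192.168.101.$((i + 1))"

    # Only answer ARP on the interface that owns the address
    echo 1 > /proc/sys/net/ipv4/conf/eth$i/arp_filter

    # One routing table per port (IDs 100-107), added once
    grep -q "port_$i" /etc/iproute2/rt_tables || \
        echo "$((100 + i)) port_$i" >> /etc/iproute2/rt_tables

    # Route the subnet out this port when sourcing from its IP
    ip route add 192.168.101.0/24 dev eth$i src $ip_addr table port_$i
    ip rule add from $ip_addr table port_$i
done

# Activate the new policy
ip route flush cache
```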

Now you need to make these same changes for the two SAN ports on each
Xen box.  Obviously start with one box and then test it before doing the
others.

This should get iscsiadm working and seeing all of the LUNs on all 8
ports on san1, and dm-multipath should work.  If it turns out that
dm-multipath doesn't fan across all 8 remote interfaces, you'll need to
manually set each Xen box to hit a specific pair of ports on san1, two
Xen boxen per pair of san1 ports.  Set it up so the Xen pairs have one
port on each quad port NIC, for redundancy.  It doesn't really make a
difference whether dm-multipath fans out over all 8 paths, because you
have only 200MB/s per Xen client anyway.  That's 1.6GB/s client bandwidth and
800MB/s server.  So as long as you have port and path redundancy, two
LUN connections per client is as good as 8.  I've actually never seen a
SAN setup with clients logging into more than two head ports.
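Whether dm-multipath fans I/O across all paths is governed by its path
grouping policy.  A minimal /etc/multipath.conf sketch (an assumption
on my part, not your actual config) that spreads I/O over every
available path would be:

```
# Sketch only -- verify against your distro's multipath defaults
defaults {
    user_friendly_names yes
    # "multibus" puts all paths in one group so I/O fans across them
    path_grouping_policy multibus
}
```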

Most configurations such as this use multiple switches.  So the switch
may still give us problems.  If so we'll have to figure out an
appropriate multiple VLAN setup.  And do all of the above with the
standard frame size.  If/when it's working, try a larger MTU.
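Dropping back to standard frames and later re-enabling jumbo frames is
just a per-interface MTU change (interface names here are the
fictitious ones from above, and the switch ports must also be
configured to pass the larger frames):

```shell
# Revert to the standard MTU while debugging...
ip link set dev eth0 mtu 1500

# ...then, once discovery works through the switch, try jumbo frames
ip link set dev eth0 mtu 9000
```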

> Eventually I pulled the disabled all except one ethernet device on both
> machines, still no luck. 

After so much reconfiguration it's hard to tell what all was going wrong
at this point.

> Finally, out of desperation I pulled the cables
> from both machines, dropped in a direct cable (ie, bypass the nice shiny
> new switch), and discover worked immediately. So I tried with the old
> switch, but same problem, so I've now connected each xen box direct to
> san1 ethernet port, so they now all get a dedicated 1 Gbps port each.

If the source routing config above doesn't immediately work, or if you
get full bandwidth out to the Xen hosts but only half into san1, you
may need to create 2 isolated VLANs, put two ports of each quad NIC in
each, and one port of each Xen box in each VLAN.

> I think the problem with the switch is that I didn't configure it
> properly to support the 9000 MTU, or something like that, which now
> makes more sense that lots of small packets are fine (not faulty cables,
> network cards, switches, etc) but big packets fail (like the response to
> a DiscoveryAll packet).

You may have simply confused it with all the link plugging and
unplugging.  In the past I've seen odd things like switches holding
onto a MAC on
port1 ten minutes after I pulled the server from port1 and plugged it
into port10, forcing me to reboot or power cycle the switch to clear the
MAC table.  Other switches handle this with aplomb.  It's been many
years since I've seen that though, and it was a low end model.

> Anyway, all systems are online, and I think I will leave things as is
> for now.

The fact that it's working well enough (and far better than previously),
even if not yet perfected, is the most important part. :)  The client
isn't screaming anymore.

Worth noting: a direct connection eliminates the switch latency,
slightly increasing throughput.  Though you need to get this all working through
a switch, with both links for redundancy, and so you can expand with
more Xen hosts if needed.  Right now you're out of server ports.  And
you're probably close to exhausting the PCIe slots in san1.

> What I have accomplished:
> 1) All systems should be using dedicated 1Gbps for iSCSI and 1Gbps for
> everything else
> 2) All hardware is physically installed

> What I think I need next time
> 1) 10 x colour coded 2m cables (management/user LAN ports), probably
> blue to match all the rest of the user cabling
> 2) 8 cables in green (port 1 xen)
> 3) 8 cables in yellow (port 2 xen)
> 4) 8 cables in white (4 each for san1/san2 on 1st card)
> 5) 8 cables in grey (4 each for san1/san2 on 2nd card)
> 6) Lotsa cable ties to keep each bundle together

There's your problem.  No orange. ;)

(most LC multimode fiber SAN cables are orange)

> Don't really know what colour cables are available, or even sure if it
> is such a good idea to use so many different colours.... Another option
> would be to stick with two colours, one for the iSCSI SAN network, and
> the second colour for the user LAN. Just makes it hard trying to work
> out which port/machine the other end of this random cable is connected
> to.....

Two colors for the SAN: one for Xen boxen, one for servers.  Label each
cable end with its respective switch or host port assignment.  One inch
printer labels work well, as they stick to the cable and to themselves
so well that you have to cut them off.  I think somebody sells
something fancier, but why bother, as long as you can read your own
handwriting.  Label the Intel NIC ports if they aren't numbered.
That's how I normally do it.

> Anyway, monitoring systems say everything is ok, testing says it's
> working, so I'm off home. No pictures yet, so messy it's embarrassing,

*ALL* racks/closets are messy.  It's only environments where folks are
underworked and overpaid that everything is tidy: govt, uni, big corp.
Nobody else has time.  And if you're a VAR/consultant paid by the hour,
clients don't give a crap about looks, as long as it works.  They
don't, or rarely, go into the server room, closet, etc. anyway.

> and it isn't even working properly. Hopefully when I'm finished it will
> be worth a picture or two :)

You'll get there before long.  The final configuration may not be
exactly what you envisioned, but I guarantee your overall goals will be
met soon.  You've doubled your bandwidth by isolating user/SAN traffic,
so you're halfway there already.

-- 
Stan

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

