If it's OK I'm going to snip a bunch of this and get to the meat of it,
so hopefully it's less confusing.

On 2/10/2013 10:16 AM, Adam Goryachev wrote:
... ...

> The problem is that (from my understanding) LACP will balance the
> traffic based on the destination MAC address, by default. So the
> bandwidth between any two machines is limited to a single 1Gbps link.
> So regardless of the number of ethernet ports on the DC box, it will
> only ever use a max of 1Gbps to talk to the iSCSI server.

> However, if I configure Linux to use xmit_hash_policy=1 it will use
> the IP address and port (layer 3+4) to decide which trunk to use. It
> will still only use 1Gbps to talk to that IP:port combination.

That is correct. Long story short, the last time I messed with a
configuration such as this I was using a Cisco that fanned over 802.3ad
groups based on L3/4 info. Stock 802.3ad won't do this. I apologize for
the confusion, and for the delay in responding ('twas a weekend after
all). I just finished reading the relevant section of your GS716T-200
(GST716-v2) manual, and it does not appear to have this capability.

All is not lost. I've done a considerable amount of analysis of all the
information you've provided. In fact I've spent way too much time on
this. But it's an intriguing problem involving interesting systems
assembled from channel parts, i.e. "DIY", and I couldn't put it down. I
was hoping to come up with a long term solution that didn't require any
more hardware than a NIC and HBA, but that's just not really feasible.

So, my conclusions and recommendations, based on all the information I
have to date:

1. Channel bonding via a single switch using standard link aggregation
   protocols cannot scale iSCSI throughput between two hosts. The
   various Linux packet fanning modes don't work well here either for
   scaling both transmit and receive traffic.

2. To scale iSCSI throughput using a single switch will require
   multiple host ports and MPIO, but no LAG for these ports.

3. Given the facts above, an extra port could be added to each TS Xen
   box. A separate subnet would be created for the iSCSI SAN traffic,
   and each port given an IP in the subnet. Both ports would carry MPIO
   iSCSI packets, but only one port would carry user traffic.

4. Given the fact that there will almost certainly be TS users on the
   target box when the DC VM gets migrated due to some kind of failure
   or maintenance, adding the load of file sharing may not prove
   desirable. And you'd need another switch.

Thus, I'd recommend:

A. Dedicate the DC Xen box as a file server and dedicate a non-TS Xen
   box as its failover partner. Each machine will receive a quad port
   NIC. Two ports on each host will be connected to the current 16 port
   switch. The two ports will be configured for balance-alb using the
   current user network IP address. All switch ports will be
   reconfigured to standard mode, no LAGs, as they are not needed for
   Linux balance-alb. Disconnect the 8111 mobo ports on these two boxes
   from the switch as they're no longer needed. Prioritize RDP in the
   switch, leave all other protocols alone.

B. We remove 4 links each from the iSCSI servers, the primary and the
   DRBD backup server, from the switch. This frees up 8 ports for
   connecting the file servers' 4 ports, and connecting a motherboard
   ethernet port from each iSCSI server to the switch for management.
   If my math is correct this should leave two ports free.

C. MPIO is designed specifically for IO scaling, and works well. So
   it's a better fit, and you save the cost of the additional
   switch(es) that would be required to do perfect balance-rr bonding
   between iSCSI hosts (which can be done easily with each host
   ethernet port connected to a different dedicated SAN switch; in this
   case it would require 4 additional switches).

   Instead what we'll do here is connect the remaining 2 ports from
   each Xen file server box, the primary and the backup, and all 4
   ports on each iSCSI server, the primary and the backup, to a new
   12-16 port switch. It can be any cheap unmanaged GbE switch of 12 or
   more ports. We'll assign an IP address in the new SAN subnet to each
   physical port on these 4 boxes and configure MPIO accordingly.

So what we end up with is decent session based scaling of user CIFS
traffic between the TS hosts and the DC Xen servers, with no single TS
host bogging everyone down, and no desktop lag if both links are full
due to two greedy users. We end up with nearly perfect ~200MB/s iSCSI
scaling in both directions between the DC Xen box (and/or backup) and
the iSCSI servers, and we end up with nearly perfect ~400MB/s each way
between the two iSCSI servers via DRBD, allowing you to easily do
mirroring in real-time. All for the cost of two quad port NICs and an
inexpensive switch, and possibly a new high performance SAS HBA.

I analyzed many possible paths to a solution, and I think this one is
probably close to ideal.

You can pull off the same basic concept buying just the quad port NIC
for the current DC Xen box, removing 2 links between each iSCSI server
and the switch and direct connecting these 4 NIC ports via 2 crossover
cables, and using yet another IP subnet for these, with MPIO. You'd
have no failover for the DC, and the bandwidth between the iSCSI
servers for DRBD would be cut in half. But it only costs one quad port
NIC. A dedicated 200MB/s is probably more than plenty for live DRBD,
but again you have no DC failover.

However, given that you've designed this system with "redundancy
everywhere" in mind, I'm guessing the additional redundancy justifies
the capital outlay for an unmanaged switch and a 2nd quad port NIC.
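
For anyone who wants to double check the port counts and bandwidth
figures in the recommendations above, here's a rough sketch in Python.
It's only arithmetic, not a configuration. The variable names are made
up for illustration, and PER_LINK_MB_S is an assumption: roughly
100MB/s usable per GbE link, which is what the ~200MB/s (2 links) and
~400MB/s (4 links) estimates imply.

```python
# Back-of-the-envelope check of the port and throughput numbers.
# ASSUMPTION: ~100 MB/s usable per GbE link, implied by the ~200 MB/s
# (2 links) and ~400 MB/s (4 links) estimates. Not a measured value.
PER_LINK_MB_S = 100

# Existing 16 port switch, counted relative to the ports freed in B.
freed_ports = 4 * 2        # 4 links pulled from each of 2 iSCSI servers
file_server_ports = 2 * 2  # 2 balance-alb ports on each of 2 Xen file servers
mgmt_ports = 1 * 2         # 1 motherboard port per iSCSI server
spare_ports = freed_ports - file_server_ports - mgmt_ports

# New dedicated SAN switch from C.
san_ports = 2 * 2 + 4 * 2  # 2 per Xen file server, 4 per iSCSI server

# MPIO scales with the number of paths on the smaller end.
xen_to_iscsi_mb_s = 2 * PER_LINK_MB_S  # Xen file server <-> iSCSI server
drbd_mb_s = 4 * PER_LINK_MB_S          # iSCSI primary <-> DRBD backup

print("spare ports on existing switch:", spare_ports)
print("ports needed on SAN switch:", san_ports)
print("Xen <-> iSCSI estimate (MB/s):", xen_to_iscsi_mb_s)
print("DRBD estimate (MB/s):", drbd_mb_s)
```

The SAN switch count is why any unmanaged GbE switch of 12 or more
ports will do.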

<BIG snip>

> So, given the above, would you still suggest only adding a 4 port
> ethernet to the DC box configured with LACP, or should I really look
> at something else.

I think LACP is out, regardless of transmit hash mode. If one of those
test boxes could be permanently deployed as the failover host for the
DC VM, I think the dedicated iSCSI switch architecture makes the most
sense long term. If the cost of the switch and another 4 port NIC isn't
in the cards right now, you can go the other route with just one new
NIC. And given that you'll be doing no ethernet channel bonding on the
iSCSI network, but IP based MPIO instead, it's a snap to convert to the
redundant architecture with a new switch later. All you'll be doing is
swapping cables to the new switch and changing IP address bindings on
the NICs as needed.

Again, apologies for the false start with the 802.3ad confusion on my
part. I think you'll find all (or at least most) of the ducks in a row
in the recommendations above.

--
Stan