Re: Network design issues


 



On 2/15/21 5:38 PM, Frank Schilder wrote:
Hi Stefan,

I think you gave me the right pointers.

Last summer I looked into exactly this: how Dell switches hash connections onto the members of a LAG. What I found was that the only option was hashing by MAC address. I ran a test with iperf using several connections between the same two servers, and from one server to many. The test confirmed what I had found in the documentation: all connections between two servers shared a single 10G member, while one-to-many connections were distributed over multiple members. Back then I thought that was all there was to it and didn't look into it further.
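The iperf result can be illustrated with a minimal sketch. It assumes a generic hash-then-modulo member selection; the function name pick_member, the addresses and the use of SHA-256 are hypothetical stand-ins, not what the Dell firmware actually does.

```python
import hashlib

def pick_member(fields, n_members):
    """Map a tuple of header fields to a LAG member index (illustrative hash only)."""
    key = "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members

N_MEMBERS = 4
SRC_MAC, DST_MAC = "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"

# Eight parallel connections between the same two servers, differing only in source port.
flows = [(SRC_MAC, DST_MAC, "10.0.0.1", "10.0.0.2", 40000 + i, 5201) for i in range(8)]

# MAC-only hashing sees identical input for every flow, so they all share one member.
mac_only = {pick_member(f[:2], N_MEMBERS) for f in flows}
# Hashing on MAC + IP + port sees different input per flow and can spread them out.
all_fields = {pick_member(f, N_MEMBERS) for f in flows}

print("members used, MAC-only hash:   ", sorted(mac_only))
print("members used, MAC+IP+port hash:", sorted(all_fields))
```

With MAC-only input the set always contains exactly one member, which matches the single saturated 10G link seen in the test. How well the wider hash spreads depends on the hash function and on the traffic.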

Now, after your hints, I went back to the manual and found that the switches actually do support more advanced hash functions, at least after enabling ECMP, which is disabled by default. I'm not sure whether I was reading a manual for the wrong switch family; I have no idea where I found the "MAC only" statement. I got in touch with Dell support to help me here, as the manual's section on load balancing is not exactly great.

I can use MACs, IP addresses, ports, VLAN ID and a few other packet fields for hashing, hopefully not only for layer 3 routing. In particular, including the VLAN ID should help spread client and replication traffic out a bit better. Dell also supports defining salts to avoid polarisation, which I believe is hurting us as well at the moment.
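As a rough illustration of polarisation and of why a salt helps: if two switch tiers apply the same hash to the same fields, the flows that tier 1 placed on one member all pick the same member again at tier 2. The sketch below assumes a generic salted hash; pick_member, the salt value and the addresses are hypothetical and do not reflect the actual Dell implementation.

```python
import hashlib

def pick_member(fields, n_members, salt=b""):
    """Map header fields to a LAG member index; the salt perturbs the hash per switch."""
    key = salt + "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members

N_MEMBERS = 2
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 6800) for i in range(64)]

# Tier 1: follow only the flows that land on the same member as the first flow.
first = pick_member(flows[0], N_MEMBERS)
subset = [f for f in flows if pick_member(f, N_MEMBERS) == first]

# Tier 2 with the identical hash: the whole subset collapses onto one member again.
unsalted = {pick_member(f, N_MEMBERS) for f in subset}
# Tier 2 with its own salt: the choice no longer depends on the tier 1 decision.
salted = {pick_member(f, N_MEMBERS, salt=b"tier2") for f in subset}

print("tier 2 members without salt:", sorted(unsalted))
print("tier 2 members with salt:   ", sorted(salted))
```

The unsalted case always reuses a single member. The salted case normally uses both, although that is a statistical property of the hash, not a guarantee.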

I have one last question. The Dell manual states that one can enable monitoring of load balancing, and that it will check every 15 seconds for imbalance across the members of a LAG. You wrote "... and with OVS you can balance the load between the LACP links (by default it evaluates every 10 seconds if it should move flows around)." How is this done? The hash function doesn't change, so how can port mappings be re-arranged in a predictable way? The Dell switches will only create log events, nothing more. The Dell manual uses the term "dynamic load balancing", but generating log messages is not really the same. Am I missing something?

When the workload is perfectly static, nothing changes. But that will hardly ever be the case. Here is the info for OVS on this:

"Every 10 seconds, vswitchd rebalances the bond members (see bond_rebalance()). To rebalance, vswitchd examines the statistics for the number of bytes transmitted by each member over approximately the past minute, with data sent more recently weighted more heavily than data sent less recently. It considers each of the members in order from most-loaded to least-loaded. If highly loaded member H is significantly more heavily loaded than the least-loaded member L, and member H carries at least two hashes, then vswitchd shifts one of H’s hashes to L. However, vswitchd will only shift a hash from H to L if it will decrease the ratio of the load between H and L by at least 0.1.

Currently, “significantly more loaded” means that H must carry at least 1 Mbps more traffic, and that traffic must be at least 3% greater than L’s."
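The rule quoted above can be sketched roughly as follows. This is a simplified reading of the description, not the actual bond_rebalance() code: the function name rebalance_step, the data layout, the choice of the hash that gives the best ratio, and measuring the ratio as larger load over smaller load (so that overshooting does not count as an improvement) are all my assumptions. Loads are in Mbps.

```python
def rebalance_step(members):
    """Move at most one hash from a heavily loaded member to the least-loaded one.

    members: {member_name: {hash_id: load_mbps}}
    Returns (hash_id, source, destination), or None if nothing was moved.
    """
    def load(name):
        return sum(members[name].values())

    least = min(members, key=load)
    # Consider members from most-loaded to least-loaded.
    for heavy in sorted(members, key=load, reverse=True):
        if heavy == least:
            continue
        h_load, l_load = load(heavy), load(least)
        # "Significantly more loaded": at least 1 Mbps more and at least 3% greater.
        significant = h_load - l_load >= 1.0 and h_load >= 1.03 * l_load
        if not significant or len(members[heavy]) < 2:
            continue
        old_ratio = h_load / l_load if l_load > 0 else float("inf")
        best = None
        for hash_id, hash_load in members[heavy].items():
            new_h, new_l = h_load - hash_load, l_load + hash_load
            if min(new_h, new_l) == 0:
                continue
            new_ratio = max(new_h, new_l) / min(new_h, new_l)
            # Only shift if the load ratio drops by at least 0.1.
            if old_ratio - new_ratio >= 0.1 and (best is None or new_ratio < best[0]):
                best = (new_ratio, hash_id)
        if best is not None:
            hash_id = best[1]
            members[least][hash_id] = members[heavy].pop(hash_id)
            return hash_id, heavy, least
    return None

bond = {"eth0": {"h1": 600.0, "h2": 300.0, "h3": 50.0}, "eth1": {"h4": 100.0}}
print(rebalance_step(bond))  # first pass moves h2 from eth0 to eth1
print(bond)
```

The point is that the hash function itself stays fixed; what changes is the table that maps each hash bucket to a member. Calling the function repeatedly corresponds to the periodic 10-second evaluation.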

So if it makes sense to move one or more flows to other links, it will do so.

I guess the Dell switches will do something similar.


For us, I think slightly smarter hashing and, maybe, higher priority for the replication VLAN will do. As far as I can see, our cluster is essentially running on 10G internally, so anything better than that should do and be easy to achieve.

Thanks for putting me on the right track.

Good to hear, I hope you manage to solve it.

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



