Andrew,
I agree that the choice of hash function is important for LACP. My
thinking has always been to stay down in layers 2 and 3. With enough
hosts it seems likely that traffic would be split close to evenly.
Heads or tails - with a two-port LAG each flow is essentially a coin
flip, so with enough flows it should average out. Hashing on TCP ports
should also split nearly evenly, though well-known listening ports could
introduce some asymmetry.
What I'm concerned about is the next level up: With the client network
and the cluster network (Marc's terms are more descriptive) on the same
NICs/switch ports, with or without LACP and LAGs, it seems possible that
at times the bandwidth consumed by cluster traffic could overwhelm and
starve the client traffic. Or the other way around, which would be worse
if the cluster nodes can't communicate on their 'private' network to
keep the cluster consistent. These overloads could happen in the
packet queues in the NIC drivers, or maybe in the switch fabric.
Maybe these starvation scenarios aren't that likely in clusters with
10 Gb networking. Maybe it's hard to fill up a 10 Gb pipe, much less
two. But it could happen with 1 Gb NICs, even in LAGs of 4 or 6 ports,
and with faster NVMe drives it will eventually be easy to fill a 10 Gb
pipe as well.
So, what could we do with some of the 'exotic' queuing mechanisms
available in Linux to keep the balance - to ensure that the lesser
category can still transmit proportionally? (And is 'proportional' the
right answer, or should one side get a slight advantage?)
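For the sake of discussion, the kind of thing I'm imagining is an HTB
tree on the bond: each class gets a guaranteed half of the link but can
borrow up to the full rate when the other side is idle. A rough,
untested sketch - the device name, cluster subnet, fwmark value, and
rates are all placeholders:

  # Root HTB qdisc on the bond; unclassified (client) traffic -> 1:10
  tc qdisc add dev bond0 root handle 1: htb default 10
  # Parent class capped at the aggregate LAG rate
  tc class add dev bond0 parent 1: classid 1:1 htb rate 20gbit ceil 20gbit
  # Client (front-side): guaranteed 10gbit, may borrow up to 20gbit
  tc class add dev bond0 parent 1:1 classid 1:10 htb rate 10gbit ceil 20gbit
  # Cluster (back-side): same guarantee, same ceiling
  tc class add dev bond0 parent 1:1 classid 1:20 htb rate 10gbit ceil 20gbit
  # Mark traffic destined for the cluster subnet (placeholder subnet)...
  iptables -t mangle -A POSTROUTING -d 192.168.100.0/24 -j MARK --set-mark 20
  # ...and steer marked packets into the cluster class
  tc filter add dev bond0 parent 1: protocol all prio 1 handle 20 fw flowid 1:20

Tilting the two guaranteed rates (say 12gbit/8gbit) would be the knob
for giving one side that slight advantage.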
-Dave
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
On 3/15/2021 12:48 PM, Andrew Walker-Brown wrote:
Dave
That’s the way our cluster is set up. It’s relatively small: 5 hosts, 12 OSDs.
Each host has 2x10Gb with LACP to the switches. We’ve VLAN’d the public/private networks.
Making the best use of the LACP LAG will largely come down to choosing the right hashing policy. At the moment we’re using layer3+4 in both the Linux bonding config and the switch config. We’re monitoring link utilisation to make sure the balancing stays as close to equal as possible.
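For reference, on the Linux side that's just the bond's mode and
transmit hash policy. With iproute2 it looks roughly like this
(interface names here are only illustrative, not our exact config):

  # LACP (802.3ad) bond, hashing on L3 addresses + L4 ports
  ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4 miimon 100
  ip link set ens1f0 down && ip link set ens1f0 master bond0
  ip link set ens1f1 down && ip link set ens1f1 master bond0
  ip link set bond0 up
  # Check the negotiated LAG and per-slave traffic counters
  cat /proc/net/bonding/bond0

The switch applies its own hash for traffic towards the hosts, hence
setting the policy on both ends.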
Hope this helps
A
Sent from my iPhone
On 15 Mar 2021, at 16:39, Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
I have client and cluster network on one 10gbit port (with different vlans).
I think many smaller clusters do this ;)
I've been thinking about ways to squeeze as much performance as possible
from the NICs on a Ceph OSD node. The nodes in our cluster (6 x OSD, 3
x MGR/MON/MDS/RGW) currently have 2 x 10 Gb ports. One port is
assigned to the front-side network, and one to the back-side network.
However, there are times when the traffic on one side or the
other is more intense and might benefit from a bit more bandwidth.
The idea I had was to bond the two ports together and run the
back-side network in a tagged VLAN on the combined 20 Gb LACP port. To
keep the balance and prevent starvation from either side, it would be
necessary to apply some sort of weighted fair queuing mechanism via the
'tc' command. That way, if the client side isn't using the full
10 Gb/node and there is a burst of re-balancing activity, the bandwidth
consumed by the back-side traffic could swell to 15 Gb or more. Or vice
versa.
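Concretely, I picture something along these lines, assuming bond0 is
already the 2 x 10 Gb LACP LAG (the VLAN ID and addresses are made up
for illustration):

  # Front-side (client) network untagged on the bond (placeholder address)
  ip addr add 10.0.0.11/24 dev bond0
  # Back-side (cluster) network in a tagged VLAN, e.g. ID 100
  ip link add link bond0 name bond0.100 type vlan id 100
  ip addr add 192.168.100.11/24 dev bond0.100
  ip link set bond0.100 up

The weighted fair queuing piece would then sit on top of bond0.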
From what I have read and studied, these algorithms are fairly
responsive to changes in load and would thus adjust rapidly if the
demand from either side suddenly changed.
Maybe this is a crazy idea, or maybe it's really cool. Your thoughts?
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx