To be clear: I have stacked switches and this is my configuration.

Bonding cluster (hash 3+4):
  Cluster nic1 (10GbE) -> Switch A
  Cluster nic2 (10GbE) -> Switch B

Bonding public (hash 3+4):
  Public nic1 (10GbE) -> Switch A
  Public nic2 (10GbE) -> Switch B

Data distribution wasn't good at the beginning due to layer2 bonding.
With hash 3+4 it's better now. But when I test the network with
"iperf --parallel 2" and "ad_select=stable", sometimes it uses both
NICs, sometimes it uses only one NIC. After that I changed to
"ad_select=bandwidth" and data distribution looked better. Every iperf
test was successful, and when one port had some traffic going on, the
next request always used the free port. That's why I'm digging into it.
If it doesn't have any downside or overhead, then the winner in my
tests is "bandwidth". I will share the test results in my next mail.

PS: How should I test latency? I'm not a network expert; I'm just
trying to understand the concept. My switch is a layer2+3 TOR switch.
I use standard active-active port-channel settings. I wonder, if I
don't change the switch side to 3+4, what the effect will be on the
rest. I think TX will be shared across both NICs but RX will always
use one NIC, because the switch's hash algorithm differs -- but that's
just a guess.

On Mon, 28 Jun 2021 at 21:38, Andrew Walker-Brown <
andrew_jbrown@xxxxxxxxxxx> wrote:

> Hi,
>
> I think ad_select is only relevant in the scenario below, i.e. where you
> have more than one port-channel being presented to the Linux bond. So
> below, you have 2 port-channels, one from each switch, but at the Linux
> side all the ports involved are slaves in the same bond. In your scenario
> it sounds like you just have one switch with one port-channel to one bond
> on Linux. So in the case of ad_select, I doubt it has any impact. The
> main thing will be the xmit-hash-policy on both the switches and Linux.
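The "sometimes both NICs, sometimes one" behaviour with "iperf --parallel 2" follows directly from how a layer3+4 policy maps flows to slaves. Below is a minimal Python sketch of the idea; the hash function here is purely illustrative (the real kernel hash is different), and all names in it are made up for this example.

```python
# Illustrative sketch (NOT the kernel's actual hash): a layer3+4 policy
# hashes source/destination IP plus TCP/UDP ports, then picks a slave as
# hash % num_slaves. Two flows between the same host pair can still land
# on the same NIC if their port numbers happen to hash to the same slave.
import hashlib

def pick_slave(src_ip: str, dst_ip: str, sport: int, dport: int,
               num_slaves: int = 2) -> int:
    """Map one flow to a slave index, deterministically."""
    key = f"{src_ip}-{dst_ip}-{sport}-{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_slaves

# Two parallel iperf streams differ only in source port; whether they
# spread across both NICs depends entirely on the hash outcome.
flows = [pick_slave("10.0.0.1", "10.0.0.2", sport, 5001)
         for sport in (40001, 40002)]
print(flows)  # either [0, 1] (both NICs) or two equal values (one NIC)
```

With layer2 hashing the key would be the MAC pair only, so every flow between the same two nodes maps to one slave, which matches the poor distribution seen before switching to 3+4.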
> FWIW, I use layer3+4 on Linux and something very close to that on my S
> series switches, and both 10G links get used pretty well. (The below was
> lifted from a Stack Exchange thread.)
>
>     .-----------.   .-----------.
>     | Switch1   |   | Switch2   |
>     '-=-------=-'   '-=-------=-'
>       |       |       |       |
>       |       |       |       |
>     .-=----.--=---.---=--.----=-.
>     | eth0 | eth1 | eth2 | eth3 |
>     |---------------------------|
>     |           bond0           |
>     '---------------------------'
>
> Where each switch has its two ports configured in a PortChannel, the
> Linux end with the LACP bond will negotiate two Aggregator IDs:
>
> Aggregator ID 1
>   - eth0 and eth1
>
> Aggregator ID 2
>   - eth2 and eth3
>
> And the switches will have a view completely separate from each other.
>
> Switch 1 will think:
>
>   Switch 1
>     PortChannel 1
>       - port X
>       - port Y
>
> Switch 2 will think:
>
>   Switch 2
>     PortChannel 1
>       - port X
>       - port Y
>
> From the Linux system with the bond, only one Aggregator will be used at a
> given time, and will fail over depending on ad_select.
>
> So assuming Aggregator ID 1 is in use, and you pull eth0's cable out, the
> default behaviour is to stay on Aggregator ID 1.
>
> However, Aggregator ID 1 only has 1 cable, and there's a spare Aggregator
> ID 2 with 2 cables -- twice the bandwidth!
>
> If you use ad_select=count or ad_select=bandwidth, the active Aggregator
> fails over to the Aggregator with the most ports or the most bandwidth.
>
> Note that LACP mandates an Aggregator's ports must all be the same speed
> and duplex, so I believe you could configure one Aggregator with 1Gbps
> ports and one Aggregator with 10Gbps ports, and have intelligent selection
> depending on whether you have 20/10/2/1Gbps available.
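The failover behaviour described in the quote can be sketched as a small selection function. This is an illustrative model only, not the kernel's implementation; the `Aggregator` type and all names are made up for the example.

```python
# Hedged sketch of the ad_select policies: stable sticks with the current
# aggregator while it has any working port; count/bandwidth always pick
# the aggregator with the most ports / highest total link speed.
from dataclasses import dataclass

@dataclass
class Aggregator:
    agg_id: int
    link_speeds_mbps: list  # one entry per working port

def select_aggregator(aggs, policy: str, current_id: int) -> int:
    """Pick the active aggregator per a simplified ad_select policy."""
    live = [a for a in aggs if a.link_speeds_mbps]
    if policy == "stable":
        current = next(a for a in aggs if a.agg_id == current_id)
        if current.link_speeds_mbps:
            return current.agg_id  # stay put until fully dead
        return max(live, key=lambda a: sum(a.link_speeds_mbps)).agg_id
    if policy == "count":
        return max(live, key=lambda a: len(a.link_speeds_mbps)).agg_id
    if policy == "bandwidth":
        return max(live, key=lambda a: sum(a.link_speeds_mbps)).agg_id
    raise ValueError(policy)

# eth0's cable pulled: aggregator 1 has one 10G port left, aggregator 2
# still has two -- exactly the scenario in the quoted explanation.
aggs = [Aggregator(1, [10_000]), Aggregator(2, [10_000, 10_000])]
print(select_aggregator(aggs, "stable", current_id=1))     # 1: stays put
print(select_aggregator(aggs, "bandwidth", current_id=1))  # 2: more bandwidth
```

Note that in the single-switch, single-port-channel setup from the top of the thread there is only one aggregator, so all three policies collapse to the same choice, which supports the point that ad_select should have no impact there.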
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
> Windows 10
>
> *From:* mhnx <morphinwithyou@xxxxxxxxx>
> *Sent:* 28 June 2021 18:46
> *To:* Marc 'risson' Schmitt <risson@xxxxxxxxxxxx>
> *Cc:* Ceph Users <ceph-users@xxxxxxx>
> *Subject:* Re: Nic bonding (lacp) settings for ceph
>
> Thanks for the answer.
> I'm interested in ad_select=bandwidth because we use the OSD nodes as
> RGW gateways, VMs and different applications.
>
> I have separate cluster (10+10GbE) and public (10+10GbE) networks.
> I tested stable, bandwidth and count. Results are clearly good with
> bandwidth. Count is the worst option.
> But I wonder whether the bandwidth calculation has any effect on
> network delay? If it does, then I will return to stable. I don't know
> yet, but when I think about it: if the bonding driver has to calculate
> bandwidth and decide every time, that should add some CPU load and
> delay. If it has no such effect, then bandwidth improves distribution
> better.
>
> Now I know that I have to use 3+4, but I still couldn't decide on
> ad_select. Bandwidth or stable? Can we discuss it, please?
>
> On Mon, 28 Jun 2021 at 20:15, Marc 'risson' Schmitt
> <risson@xxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > On Sat, 26 Jun 2021 16:47:19 +0300
> > mhnx <morphinwithyou@xxxxxxxxx> wrote:
> > > I've changed ad_select to bandwidth and both NICs are in use now,
> > > but the layer2 hash prevents dual-NIC usage between two nodes
> > > (because layer2 uses only the MAC).
> >
> > As I understand it, setting ad_select to bandwidth is only going to be
> > useful if you have several link aggregates in the same bond, like when
> > you are connected in LACP to multiple (non-stacked) switches.
> >
> > > People advise using layer2+3 for best performance, but it has no
> > > effect on OSDs because the MAC and IP are the same.
> > > I've tried layer3+4 to split by ports instead of MAC, and it works.
> > > But I don't know what the effect will be, and also my switch is
> > > layer2.
> >
> > We are setting layer3+4 on both our servers and our switches.
> >
> > Regards,
> >
> > --
> > Marc 'risson' Schmitt
> > CRI - EPITA
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
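On the latency question raised earlier in the thread: a common approach is to time many small round-trips, which is what tools like ping or qperf do. The Python sketch below measures TCP round-trip time the same way; the echo-server thread and loopback address are assumptions for a self-contained demo, and for a real test you would point `measure_rtt` at a listener on the peer node instead.

```python
# Latency test sketch: send a tiny message, wait for the echo, and time
# the round trip many times; report the median to suppress outliers.
import socket
import threading
import time

def echo_server(sock: socket.socket) -> None:
    """Accept one connection and echo everything back (demo helper)."""
    conn, _ = sock.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)

def measure_rtt(host: str, port: int, samples: int = 100) -> float:
    """Return the median round-trip time in microseconds."""
    rtts = []
    with socket.create_connection((host, port)) as s:
        # Disable Nagle so each 1-byte probe is sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(samples):
            start = time.perf_counter()
            s.sendall(b"x")
            s.recv(64)
            rtts.append((time.perf_counter() - start) * 1e6)
    return sorted(rtts)[len(rtts) // 2]

# Loopback demo only: this measures stack overhead, not the wire.
srv = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
print(f"median RTT: {measure_rtt('127.0.0.1', srv.getsockname()[1]):.1f} us")
```

Run against each bonded path under load to see whether the ad_select choice changes the round-trip numbers; single-flow latency should be dominated by the NIC and switch, not the aggregator-selection policy, but measuring confirms it.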