Re: Nic bonding (lacp) settings for ceph

Andrew Walker-Brown <andrew_jbrown@xxxxxxxxxxx> · Mon, 28 Jun 2021 18:37:54 +0000

Hi,

I think ad_select is only relevant in the scenario below, i.e. where you have more than one port-channel presented to the Linux bond. Below, you have two port-channels, one from each switch, but on the Linux side all the ports involved are slaves in the same bond. In your scenario it sounds like you have just one switch with one port-channel to one bond on Linux, so I doubt ad_select has any impact. The main thing will be the xmit-hash-policy on both the switches and Linux. FWIW, I use layer3+4 on Linux and something very close to that on my S series switches, and both 10G links get used pretty well. (The example below was lifted from a StackExchange thread.)

.-----------.   .-----------.
|  Switch1  |   |  Switch2  |
'-=-------=-'   '-=-------=-'
  |       |       |       |
  |       |       |       |
.-=----.--=---.---=--.----=-.
| eth0 | eth1 | eth2 | eth3 |
|---------------------------|
|           bond0           |
'---------------------------'

Where each switch has its two ports configured in a PortChannel, the Linux end with the LACP bond will negotiate two Aggregator IDs:

Aggregator ID 1
 - eth0 and eth1

Aggregator ID 2
 - eth2 and eth3

And each switch will have a view completely separate from the other's.

Switch 1 will think:

Switch 1
 PortChannel 1
 - port X
 - port Y

Switch 2 will think:

Switch 2
 PortChannel 1
 - port X
 - port Y

From the Linux system with the bond, only one Aggregator will be used at a given time, and it will fail over depending on ad_select.
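To see which Aggregator is active and which slaves joined each one, the bond's status file lists both. Below is a minimal Python sketch that pulls them out, assuming the 802.3ad layout of /proc/net/bonding/bond0 I've seen: an "Active Aggregator Info:" block, then one "Slave Interface:" block per port, each with an "Aggregator ID:" line. parse_bonding_status is a hypothetical helper name, and the file format may differ between kernel versions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def parse_bonding_status(text: str) -> Tuple[Optional[int], Dict[int, List[str]]]:
    """Return (active aggregator ID, {aggregator ID: [slave names]}).

    Assumes the 802.3ad layout of /proc/net/bonding/<bond>; not an official API.
    """
    active: Optional[int] = None
    members: Dict[int, List[str]] = defaultdict(list)
    slave: Optional[str] = None      # slave whose block we are currently reading
    in_active_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Active Aggregator Info:"):
            in_active_block = True
        elif line.startswith("Slave Interface:"):
            slave = line.split(":", 1)[1].strip()
            in_active_block = False
        else:
            m = re.match(r"Aggregator ID:\s*(\d+)", line)
            if m is None:
                continue
            agg = int(m.group(1))
            if in_active_block:
                active = agg                 # the bond-wide active Aggregator
            elif slave is not None:
                members[agg].append(slave)   # the Aggregator this slave joined
    return active, dict(members)
```

On the host it would be called as parse_bonding_status(open("/proc/net/bonding/bond0").read()); in the two-switch setup above it should report two Aggregator IDs with two slaves each, only one of them active.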

So assuming Aggregator ID 1 is in use, and you pull eth0's cable out, the default behaviour (ad_select=stable) is to stay on Aggregator ID 1.

However, Aggregator ID 1 only has 1 cable, and there's a spare Aggregator ID 2 with 2 cables - twice the bandwidth!

If you use ad_select=count or ad_select=bandwidth, the active Aggregator ID fails over to the Aggregator with the most cables (count) or the most bandwidth (bandwidth).

Note that LACP mandates that all ports in an Aggregator have the same speed and duplex, so I believe you could configure one Aggregator with 1Gbps ports and another with 10Gbps ports, and get intelligent selection depending on whether you have 20/10/2/1Gbps available.
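The selection rules can be modelled roughly as follows. This is a simplified Python sketch of the three ad_select policies as I read the kernel bonding documentation (stable: keep the current Aggregator until it has no ports left; bandwidth/count: re-pick by total speed or by number of ports), not the driver's actual code; the Aggregator class and select_aggregator function are made up for illustration. Per that documentation, reselection only happens on events such as a slave being added or removed, a link or LACP state change, or the bond coming up, not per packet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Aggregator:
    """Hypothetical model of one LACP Aggregator on the Linux bond."""
    agg_id: int
    port_speeds_mbps: List[int] = field(default_factory=list)  # one entry per port that is up

    @property
    def bandwidth(self) -> int:
        return sum(self.port_speeds_mbps)

    @property
    def count(self) -> int:
        return len(self.port_speeds_mbps)


def select_aggregator(aggregators: List[Aggregator], policy: str,
                      current_id: Optional[int] = None) -> Optional[int]:
    """Rough model of ad_select, evaluated at a reselection event."""
    live = [a for a in aggregators if a.count > 0]
    if not live:
        return None
    if policy == "stable":
        # stable: only move away once the active Aggregator has no ports left
        if any(a.agg_id == current_id for a in live):
            return current_id
        key = lambda a: a.bandwidth   # initial/forced choice is by bandwidth
    elif policy == "bandwidth":
        key = lambda a: a.bandwidth
    elif policy == "count":
        key = lambda a: a.count
    else:
        raise ValueError(f"unknown ad_select policy: {policy}")
    # max() keeps the first of equal candidates; the kernel's tie-breaking is not modelled
    return max(live, key=key).agg_id
```

With eth0 pulled, Aggregator 1 has one 10G port and Aggregator 2 has two: in this model stable stays on 1, while bandwidth and count move to 2. With mixed speeds (two 1G ports against one surviving 10G port), count would pick the 1G pair while bandwidth picks the 10G link.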


From: mhnx<mailto:morphinwithyou@xxxxxxxxx>
Sent: 28 June 2021 18:46
To: Marc 'risson' Schmitt<mailto:risson@xxxxxxxxxxxx>
Cc: Ceph Users<mailto:ceph-users@xxxxxxx>
Subject:  Re: Nic bonding (lacp) settings for ceph

Thanks for the answer.
I'm interested in ad_select=bandwidth because our OSD nodes also act as
RGW gateways and run VMs and other applications.

I have separate cluster (10+10GbE) and public (10+10GbE) networks.
I tested stable, bandwidth and count. Results are clearly better with
bandwidth; count is the worst option.
But I wonder whether the bandwidth calculation has any effect on network
latency. If it does, I will go back to stable. I don't know yet, but when
I think about it, if the bonding driver has to calculate bandwidth and
decide every time, that should cost some CPU and add some delay. If it
has no effect, then bandwidth will distribute traffic better.

Now I know that I have to use layer3+4, but I still can't decide on
ad_select: bandwidth or stable?
Can we discuss it please?
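For the layer2 vs layer3+4 question, here is a rough Python model of the two transmit hash formulas as the kernel bonding documentation describes them. It is simplified (the real driver's byte order and final steps differ), and the MACs, IPs and ports are made-up example values; 6800 is just a port in Ceph's usual OSD range. It shows why layer2 puts all traffic between two given nodes on one link, while layer3+4 spreads the many TCP connections between the same two nodes across both links. A single TCP connection still always uses one link, and the hash only picks the outgoing link on the sending side; the switch makes its own choice for the return direction, which is why the policy matters on both ends.

```python
import ipaddress

# Simplified models of two bonding xmit_hash_policy formulas, following the
# kernel bonding documentation. Not byte-exact with the real driver.


def layer2_hash(src_mac: str, dst_mac: str, n_links: int, ethertype: int = 0x0800) -> int:
    """layer2: last byte of src MAC ^ last byte of dst MAC ^ packet type, mod links."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last ^ ethertype) % n_links


def layer34_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_links: int) -> int:
    """layer3+4 for unfragmented TCP/UDP over IPv4: ports, then IPs, then folding."""
    h = (src_port << 16) | dst_port          # both ports packed into one 32-bit word
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16                              # fold high bits down
    h ^= h >> 8
    return h % n_links


if __name__ == "__main__":
    # Hypothetical pair of OSD hosts, each with a 2-link bond.
    mac_a, mac_b = "52:54:00:aa:bb:01", "52:54:00:aa:bb:02"
    ip_a, ip_b = "10.0.0.1", "10.0.0.2"
    src_ports = range(40000, 40010)  # ten client-side ephemeral ports

    l2_links = {layer2_hash(mac_a, mac_b, 2) for _ in src_ports}
    l34_links = {layer34_hash(ip_a, ip_b, p, 6800, 2) for p in src_ports}
    print("layer2 uses links:", sorted(l2_links))     # layer2 uses links: [1]
    print("layer3+4 uses links:", sorted(l34_links))  # layer3+4 uses links: [0, 1]
```

The layer2 result cannot change with ports at all, since no port enters the formula; only a different MAC pair (i.e. a different peer) can land on the other link.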

On Mon, 28 Jun 2021 at 20:15, Marc 'risson' Schmitt <risson@xxxxxxxxxxxx>
wrote:

> Hi,
>
> On Sat, 26 Jun 2021 16:47:19 +0300
> mhnx <morphinwithyou@xxxxxxxxx> wrote:
> > I've changed ad_select to bandwidth and both NICs are in use now, but
> > the layer2 hash prevents using both NICs between two nodes (because
> > layer2 uses only the MAC).
>
> As I understand it, setting ad_select to bandwidth is only going to be
> useful if you have several link aggregates in the same bond, like when
> you are connected in LACP to multiple (non-stacked) switches.
>
> > People advise using layer2+3 for best performance, but it has no
> > effect on OSDs because the MACs and IPs are the same for all traffic
> > between two OSD nodes.
> > I've tried layer3+4 to split by ports instead of MAC, and it works. But
> > I don't know what the effect will be, and also my switch is layer2.
>
> We are setting layer3+4 on both our servers and our switches.
>
> Regards,
>
> --
> Marc 'risson' Schmitt
> CRI - EPITA
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>