Re: [Ceph-community] Why MON,MDS,MGR are on Public network?

Is there any measurement of how much bandwidth is taken by private traffic
vs. public/client traffic when they share the same network?
I currently have two 2x10G bonds, one for public and one for private; the intention
is to provide 2x10G of bandwidth to clients. I understand the overhead of running
more networks, but I think it's more critical to guarantee client bandwidth,
especially when there is more private traffic during maintenance, rebalancing, etc.

Thanks!
Tony
________________________________________
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Sent: November 29, 2021 02:14 PM
To: ceph-users@xxxxxxxx
Subject:  Re: [Ceph-community] Why MON,MDS,MGR are on Public network?



>> I don't trust the public network, and I am afraid the mons could go down
>> because of it. To be more secure and faster, I need to understand the
>> reason: 3- Why should Mon, Mds, Mgr be on the public network?

Remember that the clients need to reach the mons and any MDS the cluster has.  It is not unusual for a separate replication/private/backend network to not have a default route or otherwise be unreachable from non-OSD nodes.
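
For reference, the split is just two options in ceph.conf; a minimal sketch, with placeholder subnets:

    [global]
    public_network  = 192.0.2.0/24       # mons, MDS, mgr, and client traffic
    cluster_network = 198.51.100.0/24    # OSD replication/heartbeat only
    # If cluster_network is unset, everything rides the public network.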

> The idea to separate OSD<->OSD traffic probably comes from the fact
> that replication means data gets multiplied over the network, so if a
> client writes 1G data to a pool with replication=3, then two more
> copies of that 1G needs to be sent, and if you do that on the "public"
> network, you might starve it with replication (or repair/backfill)
> traffic.

Indeed, one of the rationales is to prevent client/mon traffic and OSD replication traffic from DoSing each other.
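
A back-of-envelope sketch of that multiplication, in plain Python with hypothetical numbers:

    # Bytes on the wire for one client write at replication factor r,
    # when everything shares a single network: one copy client -> primary,
    # plus (r - 1) copies primary -> replica OSDs.
    def wire_bytes(client_bytes, r=3):
        return client_bytes * r

    GIB = 1024 ** 3
    print(wire_bytes(1 * GIB) / GIB)  # a 1 GiB client write => ~3 GiB of traffic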

> Many run with only one network, using as fast a network as you can
> afford, but if two separate networks at moderate speed is cheaper than
> one super fast, it might be worth considering, otherwise just scale
> the one single network to your needs.

Notably, we sometimes see nodes with only two network ports.  One could run separate public/client and private/replication networks without redundancy, or use bonding / EQR for redundancy with no dedicated replication network.
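
For what it's worth, a minimal LACP bond sketch on Linux, in netplan syntax; the interface names and mode are assumptions, and the switch side must match:

    # /etc/netplan/01-bond0.yaml
    network:
      version: 2
      ethernets:
        eno1: {}
        eno2: {}
      bonds:
        bond0:
          interfaces: [eno1, eno2]
          parameters:
            mode: 802.3ad      # LACP
            lacp-rate: fast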

The two-network strategy dates from a time when 1 Gb/s networking was common and 10 Gb/s was cutting edge.  With today’s faster networks and Ceph’s many improvements to recovery/backfill, the tradeoffs are different than they were ten years ago.  Ceph is pretty good these days at detecting when an entire node is down, and with scrub randomization, reporter settings, and a wise mon_osd_down_out_subtree_limit value, thundering herds of backfill/recovery are much less of a problem than they used to be.
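
For example, those knobs can be set centrally from any node with an admin keyring; the values here are illustrative, not recommendations:

    ceph config set mon mon_osd_down_out_subtree_limit host
    ceph config set mon mon_osd_reporter_subtree_level host
    ceph config set mon mon_osd_min_down_reporters 2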

Switches, patch panels, and cross-connects take up RUs and cost OpEx.  Sometimes the RUs saved by not having two networks mean you can fit another node or two into each rack.

Having a replication network can result in certain flapping situations that are tricky to troubleshoot; that’s what finally led me to embrace the single-network architecture.  YMMV.

Also, when you have five network connections to a given node (dual public, dual private, BMC/net mgmt), it’s super easy during maintenance not to get them all plugged back in correctly, no matter how laboriously one labels the cables.  Admittedly this probably isn’t a gating factor, but it still happens ;)

— aad





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx