Forgot to mention: all nodes are currently running Luminous 12.2.8 on CentOS 7.5.
On 12/19/18 5:34 PM, Andras Pataki wrote:
Dear ceph users,
We have a large-ish Ceph cluster with about 3500 OSDs. We run 3 mons
on dedicated hosts; the mons typically use a few percent of a core
and generate about 50 Mbit/s of network traffic. They are connected
at 20 Gbit/s (bonded dual 10 Gbit) and run on 2x14-core servers.
We recently had to shut Ceph down completely for maintenance (which we
rarely do), and had significant difficulty starting it back up. The
symptoms included OSDs hanging on startup, being marked down, flapping,
and so on. After some investigation we found that the 20 Gbit/s network
interfaces of the monitors were completely saturated while the OSDs were
starting, with the monitor processes using about 3 cores (300% CPU). We
ended up having to start the OSDs very slowly so that the monitors could
keep up - it took about 4 hours to start 3500 OSDs (roughly one OSD
every 4 seconds).
We tried setting noout and nodown, but that didn't really help either.
A few questions we'd like to understand in order to move to a better
configuration:
1. How does the monitor traffic scale with the number of OSDs?
Presumably the traffic comes from distributing cluster maps as the
cluster changes while OSDs start. The cluster map is perhaps O(N) for N
OSDs, and each OSD needs an update on every cluster change, which would
make a single change O(N^2) in traffic. As OSDs start, the cluster
changes quite a lot (N times?), so would that make the startup traffic
O(N^3)? If so, that sounds pretty scary for scalability (a rough
back-of-envelope sketch is below, after the questions).
2. Would adding more monitors help here? Presumably each OSD gets its
maps from one monitor, so more monitors would share the traffic. Would
the inter-monitor communication/elections/etc. become problematic with
more monitors (5, 7 or even more)? Would more monitors be recommended?
If so, how many is practical?
3. Are there any config parameters useful for tuning the traffic
(perhaps sending monitor updates less frequently, or something along
those lines)?
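To make the concern in question 1 concrete, here is a rough back-of-envelope
sketch. It is not a measurement: the per-OSD map size and the assumption of
roughly one map epoch per OSD boot are guesses, and in practice incremental
maps and OSD-to-OSD map sharing should reduce the totals considerably.

    # Rough worst-case estimate of monitor traffic during a cold start.
    # All constants below are assumptions, not measured values.
    n_osds = 3500
    bytes_per_osd_in_map = 200  # assumed per-OSD share of a full osdmap

    map_size = n_osds * bytes_per_osd_in_map  # O(N): one full map, ~0.7 MB
    per_epoch = map_size * n_osds   # O(N^2): one epoch to every OSD, ~2.5 GB
    cold_start = per_epoch * n_osds # O(N^3): ~N epochs at startup, ~8.6 TB

    print("full map    ~%.1f MB" % (map_size / 1e6))
    print("one epoch   ~%.1f GB" % (per_epoch / 1e9))
    print("cold start  ~%.1f TB" % (cold_start / 1e12))

Even if the constants here are off by an order of magnitude, the shape of
the growth is what worries us.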
Any other advice on this topic would also be helpful.
Thanks,
Andras