Ceph monitors overloaded on large cluster restart

Dear ceph users,

We have a large-ish Ceph cluster with about 3500 OSDs.  We run 3 mons on dedicated hosts; the mons typically use a few percent of a core and generate about 50 Mbit/s of network traffic.  They are connected at 20 Gbit/s (bonded dual 10 Gbit) and run on 2x14-core servers.

We recently had to shut Ceph down completely for maintenance (which we rarely do), and had significant difficulties starting it back up.  The symptoms included OSDs hanging on startup, being marked down, flapping and all that bad stuff.  After some investigation we found that the 20 Gbit/s network interfaces of the monitors were completely saturated while the OSDs were starting, and the monitor processes were using about 3 cores (300% CPU).  We ended up having to start the OSDs very slowly to make sure that the monitors could keep up: it took about 4 hours to start 3500 OSDs (at a rate of about 4 seconds per OSD).  We tried setting noout and nodown, but that didn't really help either.  Below are a few questions that would be good to understand in order to move to a better configuration.
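As a quick sanity check on the startup numbers above, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration; the inputs are the figures quoted in this message.

```python
def serial_startup_hours(num_osds, seconds_per_osd):
    """Wall-clock hours needed to start OSDs one after another at a fixed pace."""
    return num_osds * seconds_per_osd / 3600.0

# 3500 OSDs at roughly 4 seconds each, as described above
hours = serial_startup_hours(3500, 4)
print(f"{hours:.1f} hours")
```

This comes out just under 4 hours, consistent with what we observed.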

1. How does the monitor traffic scale with the number of OSDs? Presumably the traffic comes from distributing cluster maps as the cluster changes on OSD starts.  The cluster map is perhaps O(N) in size for N OSDs, and each OSD needs an update on every cluster change, so a single change would cost O(N^2) traffic.  As OSDs start, the cluster changes quite a lot (N times?), so would that make the total startup traffic O(N^3)?  If so, that sounds pretty scary for scalability.
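To make the guess in question 1 concrete, here is a back-of-envelope model in Python. It is only a sketch of the reasoning, not a description of how the monitors actually behave. The assumptions are mine: OSDs start one at a time, each start triggers one map update sent to every OSD already up, and `bytes_per_osd_entry` is a hypothetical placeholder for the map size per OSD. The `full_maps` flag switches between sending the whole map on each change (the pessimistic O(N^3) case) and sending only a small delta per change (which would give O(N^2) instead).

```python
def startup_traffic_bytes(num_osds, bytes_per_osd_entry=100, full_maps=True):
    """Estimate total map bytes sent while num_osds OSDs start one by one.

    bytes_per_osd_entry is a hypothetical figure, not a measured Ceph value.
    """
    total = 0
    for started in range(1, num_osds + 1):
        # Size of one update: the whole map (grows with N) or a one-entry delta.
        if full_maps:
            update_size = num_osds * bytes_per_osd_entry
        else:
            update_size = bytes_per_osd_entry
        # Every OSD that is up at this point receives the update.
        total += started * update_size
    return total

# Doubling the cluster size shows the growth rate of each model.
for full in (True, False):
    ratio = startup_traffic_bytes(2000, full_maps=full) / startup_traffic_bytes(1000, full_maps=full)
    print(f"full_maps={full}: doubling N multiplies traffic by about {ratio:.1f}")
```

Under these assumptions, doubling N multiplies the traffic by about 8 with full maps and by about 4 with deltas, so which of the two the monitors actually send during startup matters a great deal for the answer.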

2. Would adding more monitors help here?  I.e. presumably each OSD gets its maps from one monitor, so they would share the traffic. Would the inter-monitor communication/elections/etc. be problematic for more monitors (5, 7 or even more)?  Would more monitors be recommended?  If so, how many is practical?

3. Are there any config parameters useful for tuning the traffic (perhaps send mon updates less frequently, or something along those lines)?

Any other advice on this topic would also be helpful.

Thanks,

Andras




