Dear ceph users,
We have a large-ish Ceph cluster with about 3500 OSDs. We run 3 mons on
dedicated hosts; the mons typically use a few percent of a core and
generate about 50 Mbit/s of network traffic. They are connected at
20 Gbit/s (bonded dual 10 Gbit) and run on 2x14-core servers.
We recently had to shut Ceph down completely for maintenance (which we
rarely do) and had significant difficulty starting it back up. The
symptoms included OSDs hanging on startup, being marked down, flapping,
and so on. After some investigation we found that the monitors'
20 Gbit/s network interfaces were completely saturated while the OSDs
were starting, and the monitor processes were using about 3 cores
(300% CPU). We ended up having to start the OSDs very slowly so that
the monitors could keep up - it took about 4 hours to start 3500 OSDs
(roughly one OSD every 4 seconds). We tried setting noout and nodown,
but that didn't really help either.
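For reference, what we ended up doing is roughly equivalent to the
following (a minimal sketch only - it assumes systemd-managed OSDs with
ceph-osd@<id> units on the local host, and the delay is just what
happened to work for us):

#!/usr/bin/env python3
# Minimal sketch of the staggered start we ended up doing.
# Assumes systemd-managed OSDs (ceph-osd@<id> units) on the local host;
# the OSD ids to start are passed on the command line, and the delay is
# tuned to whatever the monitors can keep up with.
import subprocess
import sys
import time

DELAY_SECONDS = 4  # roughly the rate that worked for us

for osd_id in sys.argv[1:]:
    subprocess.run(["systemctl", "start", f"ceph-osd@{osd_id}"], check=True)
    time.sleep(DELAY_SECONDS)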
Here are a few questions that it would be good to understand in order
to move to a better configuration:
1. How does the monitor traffic scale with the number of OSDs?
Presumably the traffic comes from distributing cluster maps as the
cluster state changes while the OSDs start. The cluster map is perhaps
O(N) in size for N OSDs, and each OSD needs an update on every cluster
change, so a single change would generate O(N^2) traffic. As the OSDs
start, the cluster changes quite a lot (N times?), so would that make
the startup traffic O(N^3)? If so, that sounds pretty scary for
scalability. (There is a rough back-of-envelope sketch of this
reasoning after question 3 below.)
2. Would adding more monitors help here? Presumably each OSD gets its
maps from a single monitor, so additional monitors would share the
traffic. Would the inter-monitor communication/elections/etc. become
problematic with more monitors (5, 7, or even more)? Would more
monitors be recommended, and if so, how many is practical?
3. Are there any config parameters useful for tuning this traffic
(perhaps sending mon updates less frequently, or something along those
lines)?
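To make the scaling worry in question 1 concrete, here is the
back-of-envelope I had in mind. The per-OSD map size and the assumption
of roughly one map epoch per starting OSD are pure guesses for
illustration, and I realize the mons can send incremental maps rather
than full ones, so this is more of a worst case than a measurement:

# Back-of-envelope for the O(N^3) worry in question 1.
# The per-OSD map size and "one map epoch per starting OSD" are guesses
# purely to illustrate the scaling, not measured values.

def startup_map_traffic(n_osds, map_bytes_per_osd=100):
    """Very rough estimate of total bytes the mons push during startup."""
    map_size = n_osds * map_bytes_per_osd   # full map is roughly O(N)
    per_epoch = n_osds * map_size           # every OSD gets the new map: O(N^2)
    epochs = n_osds                         # assume ~one map change per starting OSD
    return epochs * per_epoch               # total is then O(N^3)

for n in (500, 1000, 3500):
    print(f"{n:5d} OSDs -> ~{startup_map_traffic(n) / 1e9:.0f} GB total")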
Any other advice on this topic would also be helpful.
Thanks,
Andras